scylladb

Author	SHA1	Message	Date
Kefu Chai	2583a025fc	s3/test: collect log on exit the temporary directory holding the log file collecting the scylla subprocess's output is specified by the test itself, and it is `test_tempdir`. but unfortunately, cql-pytest/run.py is not aware of this. so `cleanup_all()` is not able to print out the logging messages at exit. please note that cql-pytest/run.py always collects the "log" file under the directory created using `pid_to_dir()`, where pid is that of the spawned subprocess. but `object_store/run` uses the main process's pid for its reusable tempdir. so, with this change, we also register a cleanup func to print out the logging messages when the test exits. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13647	2023-04-24 13:53:25 +03:00
Pavel Emelyanov	28a01c9e60	Merge 'test: object_store: fix various pylint warnings' from Kefu Chai when reading this source code, there are a handful of issues reported by my flycheck plugin. none of them is critical, but we are better off fixing them. Closes #13612 * github.com:scylladb/scylladb: test: object_store: specify timeout test: object_store: s/exit/sys.exit/ test: object_store: do not declare a global variable for read test: object_store: remove unused imports	2023-04-24 13:45:01 +03:00
Benny Halevy	87d9c4d7f8	sstables: filesystem_storage::change_state: simplify log message When moving to the base directory, the printout currently looks broken: ``` INFO 2023-04-16 09:15:58,631 [shard 0] sstable - Moving sstable .../data/ks/cf-4c1bb670dc3711ed96733daf102e4aab/upload/md-1-big-Data.db to in ".../data/ks/cf-4c1bb670dc3711ed96733daf102e4aab/" ``` Since `path` already contains `to`, the message can be just simplified and `to` need not be printed explicitly. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13525	2023-04-24 13:43:48 +03:00
Kefu Chai	4f21755c98	timeout_config: correct the misconfigured {truncate, other}_timeout this change fixes the regression introduced by `ebf5e138e8`, which * initialized `truncate_timeout_in_ms` with `counter_write_request_timeout_in_ms`, * returned `cas_timeout_in_ms` in place of `other_timeout_in_ms`. in this change, these two misconfigurations are fixed. Fixes #13633 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13639	2023-04-24 12:26:14 +03:00
Kefu Chai	2c91728d8a	auth: do not include unused header in `5a9b4c02e3`, the iostream based formatter was dropped, there is no need to include `<iostream>` or `<iosfwd>` in these source files anymore. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13643	2023-04-24 12:24:29 +03:00
Kefu Chai	642854f36f	test: s/os.P_NOWAIT/os.WNOHANG/ `os.P_NOWAIT` is supposed to be used in spawn calls, while `os.WNOHANG` is used in the options parameter passed to wait calls. fortunately, `P_NOWAIT` is defined as "1" in CPython, and `os.WNOHANG` is defined as "1" in the Linux kernel. that's why the existing implementation works. but we should not rely on this coincidence. so, in this change, `os.P_NOWAIT` is replaced with `os.WNOHANG` for correctness and for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13646	2023-04-24 11:42:34 +03:00
Kefu Chai	a573a89128	keys: print "non-utf8-key" when clustering_key is not UTF-8 before this change we do not check if the clustering_key to be formatted is UTF-8 encoded before printing it. but we do perform the validation when printing partition_keys. the clustering_key is no different from the partition_key when it comes to encoding; they are just different parts of a primary key. so let's validate the encoding of clustering_key as well, when formatting it. this change is a follow-up of `85b21ba049`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13641	2023-04-24 10:40:23 +03:00
Botond Dénes	864d27f9af	Merge 'clear_gently: handle null unique_ptr and optional values' from Benny Halevy This series adds handling of null std::unique_ptr to utils::clear_gently and handling of std::optional and seastar::optimized_optional (both engaged and disengaged cases). Also, unit tests were added to test the above cases. Fixes #13636 Closes #13638 * github.com:scylladb/scylladb: utils: clear_gently: add variants for optional values utils: clear_gently: do not clear null unique_ptr	2023-04-24 10:27:32 +03:00
Kefu Chai	c06b20431e	cdc: generation: use default-generated operator== now that C++20 generates operator== for us, there is no need to handcraft it. also, since C++17, the standard library offers a default implementation of operator== for `std::variant<>`, so there is no need to implement it ourselves. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13625	2023-04-24 10:13:28 +03:00
Botond Dénes	2d8d8043be	Merge 'Coroutinize system_keyspace::get_compaction_history' from Pavel Emelyanov Closes #13620 * github.com:scylladb/scylladb: system_keyspace: Fix indentation after previous patch system_keyspace: Coroutinize get_compaction_history()	2023-04-24 09:48:01 +03:00
Botond Dénes	9e757d9c6d	Merge 'De-globalize storage proxy' from Pavel Emelyanov All users of global proxy are gone (), proxy can be made fully main/cql_test_env local. () one test case still needs it, but can get it via cql_test_env Closes #13616 * github.com:scylladb/scylladb: code: Remove global proxy schema_change_test: Use proxy from cql_test_env test: Carry proxy reference on cql_test_env	2023-04-24 09:38:00 +03:00
Botond Dénes	1750bb34b7	Merge 'sstables, replica: add generation generator' from Kefu Chai this is the first step to the uuid-based generation identifier. the goal is to encapsulate the generation related logic in generator, so its consumers do not have to understand the difference between the int64_t based generation and UUID v1 based generation. this commit should not change the behavior of existing scylla. it just allows us to derive from `generation_generator` so we can have another generator which generates UUID based generation identifier. Closes #13073 * github.com:scylladb/scylladb: replica, test: create generation id using generator sstables: add generation_generator test: sstables: use generate_n for generating ids for testing	2023-04-24 09:31:08 +03:00
Botond Dénes	85abece927	Merge 'Restrict logging of current_backtrace to log_level' from Benny Halevy `seastar::current_backtrace()` can be quite heavy. When we pass it to a log message at a relatively detailed log_level (debug/trace), we pay the price of `current_backtrace` every time, but we rarely print the message. Closes #13527 * github.com:scylladb/scylladb: locator/topology: call seastar::current_backtrace only when log_level is enabled schema_tables: call seastar::current_backtrace only when log_level is enabled	2023-04-24 08:50:32 +03:00
Botond Dénes	7f04d8231d	Merge 'gms: define and use generation and version types' from Benny Halevy This series cleans up the generation and value types used in gms / gossiper. Currently we use a blend of int, int32_t, and int64_t around messaging. This change defines gms::generation_type and gms::version_type as int32_t and adds a check in non-release modes that the respective int64 values passed over messaging do not overflow 32 bits. Closes #12966 * github.com:scylladb/scylladb: gossiper: version_generator: add {debug_,}validate_gossip_generation gms: gossip_digest: use generation_type and version_type gms: heart_beat_state: use generation_type and version_type gms: versioned_value: use version_type gms: version_generator: define version_type and generation_type strong types utils: move generation-number to gms utils: add tagged_integer gms: versioned_value: make members private scylla-gdb: add get_gms_versioned_value gms: versioned_value: delete unused compare_to function gms: gossip_digest: delete unused compare_to function	2023-04-24 08:44:48 +03:00
Maxim Korolyov	002bdd7ae7	doc: add jaeger integration docs Closes #13490	2023-04-24 08:26:53 +03:00
Chang Chen Chien	c25a718008	docs: fix typo in using-scylla/local-secondary-indexes.rst Closes #13607	2023-04-24 06:56:19 +03:00
Benny Halevy	002865018f	utils: clear_gently: add variants for optional values Implement clear_gently for std::optional<T> and seastar::optimized_optional<T> and respective unit tests. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 21:34:02 +03:00
Benny Halevy	12877ad026	utils: clear_gently: do not clear null unique_ptr Otherwise the null pointer is dereferenced. Add a unit test reproducing the issue and testing this fix. Fixes #13636 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 21:33:11 +03:00
Pavel Emelyanov	5e201b9120	database: Remove compaction_manager.hh inclusion into database.hh The only reason why it's there (right next to compaction_fwd.hh) is because the database::table_truncate_state subclass needs the definition of compaction_manager::compaction_reenabler subclass. However, the former sub is not used outside of database.cc and can be defined in .cc. Keeping it outside of the header allows dropping the compaction_manager.hh from database.hh thus greatly reducing its fanout over the code (from ~180 indirect inclusions down to ~20). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13622	2023-04-23 16:27:11 +03:00
Benny Halevy	5520d3a8e3	gossiper: version_generator: add {debug_,}validate_gossip_generation Make sure that the int64_t generation we get over rpc fits in the int32_t generation_type we keep locally. Restrict this assertion to non-release builds. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:48:01 +03:00
Benny Halevy	5dc7b7811c	gms: gossip_digest: use generation_type and version_type Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:48:01 +03:00
Benny Halevy	4cdad8bc8b	gms: heart_beat_state: use generation_type and version_type Define default constructor as heart_beat_state(gms::generation_type(0)) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:48:01 +03:00
Benny Halevy	b638571cb0	gms: versioned_value: use version_type Adjust scylla-gdb.get_gms_version_value to get the versioned_value version as version_type (utils::tagged_integer). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:48:01 +03:00
Benny Halevy	2d20ee7d61	gms: version_generator: define version_type and generation_type strong types Derived from utils::tagged_integer, using different tags, the types are incompatible with each other and require explicit typecasting to- and from- their value type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:47:17 +03:00
Benny Halevy	d1817e9e1b	utils: move generation-number to gms Although get_generation_number implementation is completely generic, it is used exclusively to seed the gossip generation number. Following patches will define a strong gms::generation_id type and this function should return it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	f5f566bdd8	utils: add tagged_integer A generic template for defining strongly typed integer types. Use it here to replace raft::internal::tagged_uint64. Will be used for defining gms generation and version as strong and distinguishable types in following patches. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	c5d819ce60	gms: versioned_value: make members private and provide accessor functions to get them. 1. So they can't be modified by mistake, as the versioned value is immutable. A new value must have a higher version. 2. Before making the version a strong gms::version_type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	5aaec73612	scylla-gdb: add get_gms_versioned_value Prepare for next patch that makes gms::versioned_value members private, and provides methods by the same name as the current members. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	44a8db016a	gms: versioned_value: delete unused compare_to function Not only is it unused, it is also wrong since it doesn't compare the value, only its version. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Benny Halevy	59e771be5c	gms: gossip_digest: delete unused compare_to function Not only is it unused, it is also wrong since it doesn't compare the digest endpoint member. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Kefu Chai	c2488fc516	test: object_store: specify timeout just in case scylla does not behave as expected, so we can identify the issue and error out sooner instead of hanging until the whole test times out. this issue was identified by pylint, see https://pylint.readthedocs.io/en/latest/user_guide/messages/warning/missing-timeout.html Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-22 00:38:37 +08:00
Tomasz Grabiec	bd0b299322	Merge 'Manage CDC generations when bootstrapping nodes using Raft Group 0 topology coordinator' from Kamil Braun Introduce a new table `CDC_GENERATIONS_V3` (`system.cdc_generations_v3`). The table schema is a copy-paste of the `CDC_GENERATIONS_V2` schema. The difference is that V2 lives in `system_distributed_keyspace` and writes to it are distributed using regular `storage_proxy` replication mechanisms based on the token ring. The V3 table lives in `system_keyspace` and any mutations written to it will go through group 0. Extend the `TOPOLOGY` schema with new columns: - `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping node's `ring_slice`, it stores UUID of a newly introduced CDC generation which is used as partition key for the `CDC_GENERATIONS_V3` table to access this new generation's data. It's a regular column, meaning that every row (corresponding to a node) will have its own. - `current_cdc_generation_uuid` and `current_cdc_generation_timestamp` together form the ID of the newest CDC generation in the cluster. (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is when the CDC generation starts operating). Those are static columns since there's a single newest CDC generation. When the topology coordinator handles a request for a node to join, it calculates a new CDC generation using the bootstrapping node's tokens, translates it to mutation format, and inserts this mutation into the CDC_GENERATIONS_V3 table through group 0 at the same time we assign tokens to the node in Raft topology. The partition key for this data is stored in the bootstrapping node's `ring_slice`. After inserting new CDC generation data, we need to pick a timestamp for this generation and commit it, telling all nodes in the cluster to start using the generation for CDC log writes once their clocks cross that timestamp. We introduce a separate step to the bootstrap saga, before `write_both_read_old`, called `commit_cdc_generation`. 
In this step, the coordinator takes the `new_cdc_generation_data_uuid` stored in a bootstrapping node's `ring_slice` - which serves as the key to the table where the CDC generation data is stored - and combines it with a timestamp which it generates a bit into the future (as in old gossiper-based code, we use 2 * ring_delay, by default 1 minute). This gives us a CDC generation ID which we commit into the topology state as the `current_cdc_generation_id` while switching the saga to the next step, `write_both_read_old`. Once a new CDC generation is committed to the cluster by the topology coordinator, we also need to publish it to the user-facing description tables so CDC applications know which streams to read from. This uses regular distributed table writes underneath (tables living in the `system_distributed` keyspace) so it requires `token_metadata` to be nonempty. We need a hack for the case of bootstrapping the first node in the cluster - turning the tokens into normal tokens earlier in the procedure in `token_metadata`, but this is fine for the single-node case since no streaming is happening. When a node notices that a new CDC generation was introduced in `storage_service::topology_state_load`, it updates its internal data structures that are used when coordinating writes to CDC log tables. We include the current CDC generation data in topology snapshot transfers. Some fixes and refactors included. 
Closes #13385 * github.com:scylladb/scylladb: docs: cdc: describe generation changes using group 0 topology coordinator cdc: generation_service: add a FIXME cdc: generation_service: add legacy_ prefix for gossiper-based functions storage_service: include current CDC generation data in topology snapshots db: system_keyspace: introduce `query_mutations` with range/slice storage_service: hold group 0 apply mutex when reading topology snapshot service: raft_group0_client: introduce `hold_read_apply_mutex` storage_service: use CDC generations introduced by Raft topology raft topology: publish new CDC generation to the user description tables raft topology: commit a new CDC generation on node bootstrap raft topology: create new CDC generation data during node bootstrap service: topology_state_machine: make topology::find const db: system_keyspace: small refactor of `load_topology_state` cdc: generation: extract pure parts of `make_new_generation` outside db: system_keyspace: add storage for CDC generations managed by group 0 service: topology_state_machine: better error checking for state name (de)serialization service: raft: plumbing `cdc::generation_service&` cdc: generation: `get_cdc_generation_mutations`: take timestamp as parameter cdc: generation: make `topology_description_generator::get_sharding_info` a parameter sys_dist_ks: make `get_cdc_generation_mutations` public sys_dist_ks: move find_schema outside `get_cdc_generation_mutations` sys_dist_ks: move mutation size threshold calculation outside `get_cdc_generation_mutations` service/raft: group0_state_machine: signal topology state machine in `load_snapshot`	2023-04-21 18:11:27 +02:00
Kefu Chai	f85da1bd30	test: object_store: s/exit/sys.exit/ the former is expected to be used in an interactive session, not in an application. see also: https://docs.python.org/3/library/constants.html#constants-added-by-the-site-module and https://docs.python.org/3/library/sys.html#sys.exit Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 23:25:59 +08:00
Kefu Chai	c7b62fbf81	test: object_store: do not declare a global variable for read we only need to declare a variable with `global` when we need to write to it, but if we just want to read it, there is no need to declare it. because the way python looks up a variable when reading from it enables python to find the global variables (and apparently the functions!). but when we assign to a variable in python, the interpreter has to tell in which scope the variable lives. by default the local scope is used, and a new variable is added to `locals()`. but in this case, we just read from it. so no need to add the `global` statement. see also https://docs.python.org/3/reference/simple_stmts.html#global Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 23:25:59 +08:00
Kefu Chai	4989a59a0b	test: object_store: remove unused imports Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 23:25:59 +08:00
Pavel Emelyanov	2aabaada9e	system_keyspace: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 17:32:57 +03:00
Pavel Emelyanov	6290849f11	system_keyspace: Coroutinize get_compaction_history() In order not to copy the rvalue consumer arg -- instantly convert it into value. No other tricks. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 17:32:02 +03:00
Kefu Chai	576adbdbc5	replica, test: create generation id using generator reuse generation_generator for generating generation identifiers to reduce repetition. also, allow updating the generator's latest known generation id. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 22:02:30 +08:00
Kefu Chai	6e82aa42d5	sstables: add generation_generator to prepare for the uuid-based generation identifier, where we will generate a uuid-based generation identifier if the corresponding option is enabled, otherwise an integer based id. to reduce repetition, generation_generator is extracted out so it can be reused. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 21:51:13 +08:00
Anna Stuchlik	a68b976c91	doc: document `tombstone_gc` as not experimental The tombstone_gc was documented as experimental in version 5.0. It is no longer experimental in version 5.2. This commit updates the information about the option. Closes #13469	2023-04-21 14:43:25 +02:00
Botond Dénes	fcd7f6ac5f	Update tools/java submodule * tools/java c9be8583...eb3c43f8 (1): > Use EstimatedHistogram in metricPercentilesAsArray	2023-04-21 14:31:38 +03:00
Kefu Chai	a2aa133822	treewide: use std::lexicographical_compare_three_way the standard library offers `std::lexicographical_compare_three_way()`, and we never use the last two additional parameters which are not provided by `std::lexicographical_compare_three_way()`. there is no need to have the homebrew version of the trichotomic compare function. in this change, * all occurrences of `lexicographical_tri_compare()` are replaced with `std::lexicographical_compare_three_way()`. * `lexicographical_tri_compare()` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13615	2023-04-21 14:28:18 +03:00
Kefu Chai	51fc0bc698	sstables: use default generated operator== a C++20 compiler is able to generate defaulted operator== and operator!=. and the default generated operators behave exactly the same as the ones crafted by us. so let it do its job. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13614	2023-04-21 14:25:39 +03:00
Pavel Emelyanov	739455c3aa	code: Remove global proxy No code needs global proxy anymore. Keep on-stack values in main and cql_test_env and keep the pointer on debug:: namespace. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 14:18:59 +03:00
Pavel Emelyanov	f953fb2f52	schema_change_test: Use proxy from cql_test_env There's one place where a test case calls for storage proxy and currently does it via the global reference. Time to switch it to cql_test_env's one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 14:18:00 +03:00
Pavel Emelyanov	681a19f54c	test: Carry proxy reference on cql_test_env All sharded<> services are created by cql_test_env on the stack. The cql_test_env() is then used to keep references on some of them and to export them to test cases via its methods. Proxy is missing on that exportable list, but will be needed, so add one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 14:16:54 +03:00
Botond Dénes	10c1f1dc80	Merge 'db: system_keyspace: use microsecond resolution for group0_history range tombstone' from Kamil Braun in `make_group0_history_state_id_mutation`, when adding a new entry to the group 0 history table, if the parameter `gc_older_than` is engaged, we create a range tombstone in the mutation which deletes entries older than the new one by `gc_older_than`. In particular if `gc_older_than = 0`, we want to delete all older entries. There was a subtle bug there: we were using millisecond resolution when generating the tombstone, while the provided state IDs used microsecond resolution. On a super fast machine it could happen that we managed to perform two schema changes in a single millisecond; this happened sometimes in `group0_test.test_group0_history_clearing_old_entries` on our new CI/promotion machines, causing the test to fail because the tombstone didn't clear the entry corresponding to the previous schema change when performing the next schema change (since they happened in the same millisecond). Use microsecond resolution to fix that. The consecutive state IDs used in group 0 mutations are guaranteed to be strictly monotonic at microsecond resolution (see `generate_group0_state_id` in service/raft/raft_group0_client.cc). Fixes #13594 Closes #13604 * github.com:scylladb/scylladb: db: system_keyspace: use microsecond resolution for group0_history range tombstone utils: UUID_gen: accept decimicroseconds in min_time_UUID	2023-04-21 14:08:56 +03:00
Kamil Braun	55f43e532c	Merge 'get rid of gms/failure_detector' from Benny Halevy Move gms::arrival_window to api/failure_detector, which is its only user, and get rid of the rest, which is not used now that we use direct_failure_detector instead. TODO: integrate direct_failure_detector with failure_detector api. Closes #13576 * github.com:scylladb/scylladb: gms: get rid of unused failure_detector api: failure_detector: remove false dependency on failure_detector::arrival_window test: rest_api: add test_failure_detector	2023-04-21 11:47:44 +02:00
Kamil Braun	f7408130c9	Merge 'Fix topology management when raft-based topology is enabled' from Tomasz Grabiec Fixes a problem when raft-based topology is enabled, which loads topology from storage. It starts by clearing topology and then adding nodes one by one. Before this patch, this violated an internal invariant of the topology object which puts the local node as the first node. This would manifest by triggering an assert in topology::pop_node() which throws if popping the node at index 0 in order to keep the information about local node around. This is normally prevented by a check in topology::remove_node() which avoids calling pop_node() if removing the local node. But since there is no node which is marked as local, this check allows the first node to be popped. To fix the problem I lift the invariant that local node is always in _nodes. We still have information about local node in config. Instead of keeping it in _nodes, we recognize it as part of indexing. We also allow removing the local node like a regular node. The path which reloads topology works correctly after this, the local node will be recognized when (if) it is added to the topology. Fixes #13495 Closes #13498 * github.com:scylladb/scylladb: locator: topology: Fix move assignment locator: topology: Add printer tests: topology: Test that topology clearing preserves information about local node locator: topology: Recognize local node as part of indexing it locator: topology: Fix get_location(ep) for local node locator: topology: Fix typo locator: topology: Preserve config when cloning	2023-04-21 11:45:08 +02:00
Alejo Sanchez	ce87aedd30	test: topology smp test with custom cluster Instead of decommission of initial cluster, use custom cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13589	2023-04-21 10:43:54 +02:00
Kamil Braun	f9d8118c8d	db: system_keyspace: use microsecond resolution for group0_history range tombstone in `make_group0_history_state_id_mutation`, when adding a new entry to the group 0 history table, if the parameter `gc_older_than` is engaged, we create a range tombstone in the mutation which deletes entries older than the new one by `gc_older_than`. In particular if `gc_older_than = 0`, we want to delete all older entries. There was a subtle bug there: we were using millisecond resolution when generating the tombstone, while the provided state IDs used microsecond resolution. On a super fast machine it could happen that we managed to perform two schema changes in a single millisecond; this happened sometimes in `group0_test.test_group0_history_clearing_old_entries` on our new CI/promotion machines, causing the test to fail because the tombstone didn't clear the entry corresponding to the previous schema change when performing the next schema change (since they happened in the same millisecond). Use microsecond resolution to fix that. The consecutive state IDs used in group 0 mutations are guaranteed to be strictly monotonic at microsecond resolution (see `generate_group0_state_id` in service/raft/raft_group0_client.cc). Fixes #13594	2023-04-21 10:33:05 +02:00
Kamil Braun	218a056825	utils: UUID_gen: accept decimicroseconds in min_time_UUID The function now accepts higher-resolution duration types, such as microsecond resolution timestamps. Will be used by the next commit.	2023-04-21 10:33:02 +02:00
Kefu Chai	b0ef053552	test: sstables: use generate_n for generating ids for testing so we don't need to keep a `prev_gen` around, this also prepares for the coming change to use generation generator. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 15:45:16 +08:00
Kefu Chai	ca6ebbd1f0	cql3, db: sstable: specialize fmt::formatter<function_name> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `function_name` without the help of `operator<<`. the corresponding `operator<<()` is dropped in this change, as all its callers are now using fmtlib for formatting. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13608	2023-04-21 10:07:28 +03:00
Botond Dénes	d74f3598f4	Merge 'dht: specialize fmt::formatter<dht::token>' from Kefu Chai this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `dht::token` without the help of `operator<<`. the corresponding `operator<<()` is preserved in this change, as it has lots of users in this project, we will tackle them case-by-case in follow-up changes. also, the forward declaration of `operator<<(ostream&, const dht::token&)` in `dht/i_partitioner.hh` is removed, as it is not necessary. Refs https://github.com/scylladb/scylladb/issues/13245 Closes #13610 * github.com:scylladb/scylladb: dht: remove unnecessarily forward declaration dht: specialize fmt::formatter<dht::token>	2023-04-21 09:51:25 +03:00
Kefu Chai	c5fa1ac9f7	sstable: specialize fmt::formatter<component_type> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `component_type` without the help of `operator<<`. the corresponding `operator<<()` is dropped in this change, as all its callers are now using fmtlib for formatting. also, please note, to enable fmtlib to format `std::set<component_type>` in `test/boost/sstable_3_x_test.cc`, we need to include `<fmt/ranges.h>` in that source file. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13598	2023-04-21 09:49:24 +03:00
Kefu Chai	9215adee46	streaming: specialize fmt::formatter<stream_reason> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `stream_reason` without the help of `operator<<`. please note, because we still cannot use the generic formatter for std::unordered_map provided by fmtlib, in order to drop `operator<<` for `stream_reason` and to print `unordered_map<stream_reason>`, `fmt::join()` is used as a temporary solution. we will audit all `fmt::join()` calls after removing the homebrew formatter of `std::unordered_map`. the corresponding `operator<<()` is dropped in this change, as all its callers are now using fmtlib for formatting. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13609	2023-04-21 09:44:23 +03:00
Kefu Chai	ecb5380638	treewide: s/boost::lexical_cast<std::string>/fmt::to_string()/ this change replaces all occurrences of `boost::lexical_cast<std::string>` in the source tree with `fmt::to_string()`, for a couple of reasons: * `boost::lexical_cast<std::string>` is longer than `fmt::to_string()`, so the latter is easier to parse and read. * `boost::lexical_cast<std::string>` creates a stringstream under the hood, so it can use the `operator<<` to stringify the given object. but stringstream is known to be less performant than fmtlib. * we are migrating to fmtlib based formatting, see #13245. so using `fmt::to_string()` helps us to remove yet another dependency on `operator<<`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13611	2023-04-21 09:43:53 +03:00
Benny Halevy	3f1ac846d8	gms: get rid of unused failure_detector The legacy failure_detector is now unused and can be removed. TODO: integrate direct_failure_detector with failure_detector api. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-21 09:08:27 +03:00
Benny Halevy	d546b92685	api: failure_detector: remove false dependency on failure_detector::arrival_window Up until `0ef33b71ba` get_endpoint_phi_values retrieved arrival samples from gms::get_arrival_samples(). That function was removed since it returned a constant empty map. This patch returns empty results without relying on failure_detector::arrival_window, so the latter can be retired altogether. As Tomasz Grabiec <tgrabiec@scylladb.com> said: > I don't think the logic of arrival_window belongs to api, > it belongs to the failure detector. If there is no longers > a failure detector, there should be no arrival_window. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-21 09:08:25 +03:00
Benny Halevy	35de60670c	test: rest_api: add test_failure_detector Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-21 09:06:15 +03:00
Nadav Har'El	9c3907bb3c	test/cql-pytest: reproducers for incorrect AVG of "decimal" type This patch contains tests reproducing issue #13601 and the corresponding Cassandra issue CASSANDRA-18470. These issues are about what the AVG aggregation does for arbitrary-precision "decimal" numbers - the tests we add here show examples where the current behavior doesn't make sense: The problem is that "decimal" has arbitrary precision - so, should an average of 1/3 be returned as 0.3 or 0.33333333333333333? This is not specified, so Scylla (and Cassandra) decided to pick the result precision based on the input precision. In particular, the average of 1 and 2 is returned as 2 (zero digits after the decimal point, like in the inputs) instead of the expected 1.5. Arguably this isn't useful behavior. The test adds a second test which fails on Cassandra, but does pass on Scylla: Cassandra returns as the average of 1, 2, 2, 3 the integer 1 whereas the correct average is 2 (and Scylla returns it correctly). The reason why this bug is even worse on Cassandra is that Scylla's AVG only loses precision when dividing the sum and count, but Cassandra tries to maintain only the average, and loses precision at every step. Refs #13601 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13603	2023-04-21 08:32:30 +03:00
Kefu Chai	7b21bfd36e	mutation: specialize fmt::formatter<apply_resume> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `apply_resume` without the help of `operator<<`. the corresponding `operator<<()` is dropped in this change, as all its callers are now using fmtlib for formatting. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13584	2023-04-21 08:27:57 +03:00
Benny Halevy	77b70dbdb7	sstables: compressed_file_data_source_impl: get: throw malformed_sstable_exception on premature eof Currently, the reader might dereference a null pointer if the input stream reaches eof prematurely, and read_exactly returns an empty temporary_buffer. Detect this condition before dereferencing the buffer and throw sstables::malformed_sstable_exception. Fixes #13599 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13600	2023-04-21 07:56:58 +03:00
Botond Dénes	d828cfcb23	Merge 'db, cql3: functions: switch argument passing to std::span' from Avi Kivity Database functions currently receive their arguments as an std::vector. This is inflexible (for example, one cannot use small_vector to reduce allocations). This series adapts the function signature to accept parameters using std::span. Some changes in the keys interface are needed to support this. Lastly, one call site is migrated to small_vector. This is in support of changing selectors to use expressions. Closes #13581 * github.com:scylladb/scylladb: cql3: abstract_function_selector: use small_vector for argument buffer db, cql3: functions: pass function parameters as a span instead of a vector keys: change from_optional_exploded to accept a span instead of a vector	2023-04-21 06:49:07 +03:00
Kefu Chai	fe9f41bd84	dht: remove unnecessary forward declaration it turns out the declaration of `operator<<(ostream&, const dht::token&)` is unnecessary. so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 11:41:54 +08:00
Kefu Chai	53dedca8cd	dht: specialize fmt::formatter<dht::token> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `dht::token` without the help of `operator<<`. the corresponding `operator<<()` is preserved in this change, as it has lots of users in this project, we will tackle them case-by-case in follow-up changes. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-21 11:41:54 +08:00
Avi Kivity	0c64dd12b1	test: raft_server_test: fix string compare for clang 15 Clang 15 rejects string compares where the left-hand-side is a C string, so help it along by converting it ourselves. Closes #13582	2023-04-21 06:38:10 +03:00
Tomasz Grabiec	0ec700cd00	locator: topology: Fix move assignment Defaulted assignment doesn't update node::_topology.	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	6ed841b8d7	locator: topology: Add printer	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	3dfd49fe62	tests: topology: Test that topology clearing preserves information about local node	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	7d3384089a	locator: topology: Recognize local node as part of indexing it Fixes a problem when raft-based topology is enabled, which loads topology from storage. It starts by clearing topology and then adding nodes one by one. Before this patch, this violates the internal invariant of the topology object which puts the local node as the first node. This would manifest by triggering an assert in topology::pop_node() which throws if popping the node at index 0 in order to keep the information about local node around. This is normally prevented by a check in topology::remove_node() which avoids calling pop_node() if removing the local node. But since there is no node which is marked as local, this check allows the first node to be popped. To fix the problem I lift the invariant that local node is always in _nodes. We still have information about local node in config. Instead of keeping it in _nodes, we recognize it as part of indexing. We also allow removing the local node like a regular node. The path which reloads topology works correctly after this, the local node will be recognized when (if) it is added to the topology. Fixes #13495	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	eb9d6df8bf	locator: topology: Fix get_location(ep) for local node topology config may designate a different node than get_broadcast_address() as local node. In particular, some tests don't designate any node as the local node, which leads to logic errors where current get_location(ep) for ep which happens to have the address 127.0.0.1 returns location of the first node in _nodes rather than ep. Fix by looking up in _nodes first and fall back to local node if it's equal to configured local node (if any).	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	0a675291dd	locator: topology: Fix typo	2023-04-20 23:39:18 +02:00
Tomasz Grabiec	0b1dfb2683	locator: topology: Preserve config when cloning Config is separate from state of the topology (nodes it contains). Preserving the config will make it easier in later patches to maintain invariants for cloned instances.	2023-04-20 23:39:18 +02:00
Botond Dénes	1426c623eb	Merge 'Tune up S3 unit tests environment usage (and a bit more)' from Pavel Emelyanov The tests in question are using MINIO_SERVER_ADDRESS environment variable to export minio server address from pylib to test cases. Also they use hard-coded public bucket name. Both play badly with AWS S3, the former due to MINIO_... in its name and the latter because public bucket name can be any. So this PR puts address and public bucket name into S3_..._FOR_TEST environment variables and fixes output stream closure on failure while at it. Detached from #13493 Closes #13546 * github.com:scylladb/scylladb: s3/test: Rename MINIO_SERVER_ADDRESS environment variable s3/test: Keep public bucket name in environment s3/test: Fix upload stream closure test/lib: Add getenv_safe() helper	2023-04-20 18:01:12 +03:00
Kamil Braun	88aff50e8b	docs: cdc: describe generation changes using group 0 topology coordinator Update the `Generation switching` section: most of the existing description landed in `Gossiper-based topology changes` subsection, and a new subsection was added to describe Raft group 0 based topology changes. Marked as WIP - we expect further development in this area soon. The existing gossiper-based description was also updated a bit.	2023-04-20 16:36:41 +02:00
Kamil Braun	1688001585	cdc: generation_service: add a FIXME	2023-04-20 16:36:41 +02:00
Kamil Braun	d13a0b1930	cdc: generation_service: add legacy_ prefix for gossiper-based functions Most of the code in the service exists to handle gossiper-based topology changes. Name the functions appropriately and add a note in the comments.	2023-04-20 16:36:41 +02:00
Kamil Braun	8afb15700b	storage_service: include current CDC generation data in topology snapshots Note that we don't need to include earlier CDC generations, just the current (i.e. latest) one. We might observe a problem when nodes are being bootstrapped in quick succession - I left a FIXME describing the problem and possible solutions.	2023-04-20 16:36:41 +02:00
Kamil Braun	3d96bc5dba	db: system_keyspace: introduce `query_mutations` with range/slice There is a `query_mutations` function which loads the entire contents of a given table into memory. There was no function for e.g. loading just a single partition in the form of mutations. Introduce one.	2023-04-20 16:36:41 +02:00
Kamil Braun	3b26135227	storage_service: hold group 0 apply mutex when reading topology snapshot This is a bugfix: we need to hold the mutex when loading topology data from tables, otherwise they might be concurrently modified by `group0_state_machine::apply` and the snapshot that we send won't make any sense. Also specify in comments that the lock must be held during `topology_transition`, `topology_state_load`, `merge_topology_snapshot`.	2023-04-20 16:36:41 +02:00
Kamil Braun	f081de7cc5	service: raft_group0_client: introduce `hold_read_apply_mutex` We'll use it in `storage_service` topology snapshot request handler.	2023-04-20 16:36:41 +02:00
Kamil Braun	4c99b4004b	storage_service: use CDC generations introduced by Raft topology When a node notices that a new CDC generation was introduced in `storage_service::topology_state_load`, it updates its internal data structures that are used when coordinating writes to CDC log tables.	2023-04-20 16:36:41 +02:00
Kamil Braun	5f2b297f99	raft topology: publish new CDC generation to the user description tables Once a new CDC generation is committed to the cluster by the topology coordinator, we also need to publish it to the user-facing description tables so CDC applications know which streams to read from. This uses regular distributed table writes underneath (tables living in the `system_distributed` keyspace) so it requires `token_metadata` to be nonempty. We need a hack for the case of bootstrapping the first node in the cluster - turning the tokens into normal tokens earlier in the procedure in `token_metadata`, but this is fine for the single-node case since no streaming is happening.	2023-04-20 16:36:41 +02:00
Kamil Braun	58baf998c1	raft topology: commit a new CDC generation on node bootstrap After inserting new CDC generation data (see previous commit), we need to pick a timestamp for this generation and commit it, telling all nodes in the cluster to start using the generation for CDC log writes once their clocks cross that timestamp. We introduce a separate step to the bootstrap saga, before `write_both_read_old`, called `commit_cdc_generation`. In this step, the coordinator takes the `new_cdc_generation_data_uuid` stored in a bootstrapping node's `ring_slice` - which serves as the key to the table where the CDC generation data is stored - and combines it with a timestamp which it generates a bit into the future (as in old gossiper-based code, we use 2 * ring_delay, by default 1 minute). This gives us a CDC generation ID which we commit into the topology state as the `current_cdc_generation_id` while switching the saga to the next step, `write_both_read_old`. `system_keyspace::load_topology_state` is extended to load `current_cdc_generation_id`. For now, nodes don't react to `current_cdc_generation_id`. In later commit we'll extend `storage_service::topology_state_load` to start using the current CDC generation for CDC log table writes. The solution with specifying a timestamp into the future is the same as it is for gossip-based topology changes and it has the same consistency problem - if some node is temporarily partitioned away from the quorum, it might not learn about the new CDC generation before its clock crosses the generation's timestamp, causing it to temporarily send writes to the wrong CDC streams (until it learns about the new timestamp). I left a FIXME which describes an alternative solution which wasn't viable for gossiper-based topology changes, but it is viable when we have a fault-tolerant topology coordinator.	2023-04-20 16:36:41 +02:00
Kamil Braun	5942237a79	raft topology: create new CDC generation data during node bootstrap Calculate a new CDC generation using the bootstrapping node's tokens, translate it to mutation format, and insert this mutation to the CDC_GENERATIONS_V3 table through group 0 at the same time we assign tokens to the node in Raft topology. The partition key for this data is stored in the bootstrapping node's `ring_slice`. The data is inserted, but it's not used for anything yet, we'll do it in later commits. Two FIXMEs are left for follow-ups: - in `get_sharding_info` we shouldn't have to use the token owner's IP, but get the host ID directly from token metadata (#12279), - splitting the CDC generation data write into multiple commands. The comment elaborates.	2023-04-20 16:35:37 +02:00
Pavel Emelyanov	30b6f34a0b	s3/client: Explicitly set _upload_id empty when completing The upload_sink::_upload_id remains empty until upload starts, remains non-empty while it proceeds, then becomes empty again after it completes. The upload_started() method checks that and on .close() a started upload is aborted. The final switch to empty is done by std::move()ing the upload id into completion request, but it's better to use std::exchange() to emphasize the fact that the _upload_id becomes empty at that point for a reason. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13570	2023-04-20 17:32:08 +03:00
Kamil Braun	4e7628fa16	service: topology_state_machine: make topology::find const	2023-04-20 16:16:36 +02:00
Kamil Braun	22094f1509	db: system_keyspace: small refactor of `load_topology_state` The variables necessary for constructing a `ring_slice` are now living in a local block of code. This makes it easier to see which data is part of the `ring_slice` and will make it easier to add more data to `ring_slice` in following commits. Also add some more sanity checking.	2023-04-20 15:40:23 +02:00
Avi Kivity	1cd6d59578	Merge 'Remove global proxy usage from view_info::select_statement()' from Pavel Emelyanov The method needs proxy to get data_dictionary::database from to pass down to select_statement::prepare(). And a legacy bit that can come with data_dictionary::database as well. Fortunately, all the call traces that end up at select_statement() start inside table:: methods that have view_update_generator, or at view_builder::consumer that has reference to view_builder. Both services can share the database reference. However, the call traces in question pass through several code layers, so the PR adds data_dictionary::database to those layers one by one. Closes #13591 * github.com:scylladb/scylladb: view_info: Drop calls to get_local_storage_proxy() view_info: Add data_dictionary argument to select_statement() view_info: Add data_dictionary argument to partition_slice() method view_filter_checking_visitor: Construct with data_dictionary view: Carry data_dictionary arg through standalone helpers view_updates: Carry data_dictionary argument through methods view_update_builder: Construct with data dictionary table: Push view_update_generator arg to affected_views() view: Add database getters to v._update_generator and v._builder	2023-04-20 16:40:06 +03:00
Kamil Braun	3abe0f0ad6	cdc: generation: extract pure parts of `make_new_generation` outside `cdc::generation_service::make_new_cdc_generation` would create a new CDC generation and insert it into the `CDC_GENERATIONS_V2` table these days. For Raft-based topology changes we'll do the data insertion somewhere else - in topology coordinator code. So extract the parts for calculating the CDC generation to free-standing functions (these are almost pure calculations, modulo accessing RNG).	2023-04-20 15:38:59 +02:00
Kamil Braun	2233d8f54d	db: system_keyspace: add storage for CDC generations managed by group 0 The `CDC_GENERATIONS_V3` table schema is a copy-paste of the `CDC_GENERATIONS_V2` schema. The difference is that V2 lives in `system_distributed_keyspace` and writes to it are distributed using regular `storage_proxy` replication mechanisms based on the token ring. The V3 table lives in `system_keyspace` and any mutations written to it will go through group 0. Also extend the `TOPOLOGY` schema with new columns: - `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping node's `ring_slice`, it stores UUID of a newly introduced CDC generation which is used as partition key for the `CDC_GENERATIONS_V3` table to access this new generation's data. It's a regular column, meaning that every row (corresponding to a node) will have its own. - `current_cdc_generation_uuid` and `current_cdc_generation_timestamp` together form the ID of the newest CDC generation in the cluster. (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is when the CDC generation starts operating). Those are static columns since there's a single newest CDC generation.	2023-04-20 15:38:58 +02:00
Kamil Braun	07382d634a	service: topology_state_machine: better error checking for state name (de)serialization For example: ``` std::ostream& operator<<(std::ostream& os, ring_slice::replication_state s) { os << replication_state_to_name_map[s]; return os; } ``` this would print an empty string if the state was missing from `replication_state_to_name_map` (because `operator[]` default-constructs a value if it's missing). Use `find` instead and make it an error if the state is missing. Also turn `throw std::runtime_error` into `on_internal_error` in deserialization functions because failure to deserialize a state name is an internal error, not user error.	2023-04-20 15:38:37 +02:00
Kamil Braun	59b692e799	service: raft: plumbing `cdc::generation_service&` Pass a reference to the service into places. It shall be used later, by the group 0 state machine and topology coordinator.	2023-04-20 15:38:37 +02:00
Kamil Braun	1e9cf3badd	cdc: generation: `get_cdc_generation_mutations`: take timestamp as parameter The function would generate a mutation timestamp for itself, take it as parameter instead. We'll use timestamps provided by Group 0 APIs when creating CDC generations during Group 0- based topology changes.	2023-04-20 15:38:37 +02:00
Kamil Braun	85f4f1830b	cdc: generation: make `topology_description_generator::get_sharding_info` a parameter The function used to obtain the sharding info for a given node (its number of shards and ignore_msb_bits) was using gossiper application states. We want to reuse `topology_description_generator` to build CDC generations when doing Raft Group 0-based topology changes, so make `get_sharding_info` a parameter.	2023-04-20 15:38:37 +02:00
Kamil Braun	3e863d0e58	sys_dist_ks: make `get_cdc_generation_mutations` public It was a `static` function inside system_distributed_keyspace. Later it will be used for another table living in system_keyspace, so move it outside, to the CDC generations module, and make it accessible from other places.	2023-04-20 15:38:37 +02:00
Kamil Braun	ed133db709	sys_dist_ks: move find_schema outside `get_cdc_generation_mutations` The function will be reused for a different table.	2023-04-20 15:38:37 +02:00
Kamil Braun	0e84662910	sys_dist_ks: move mutation size threshold calculation outside `get_cdc_generation_mutations` The function turns a `cdc::topology_description` into a vector of mutations. It decides when to push_back a new mutation (instead of extending an existing one) based on certain parameters. This calculation is specific to where we insert the mutation later. Move the calculation outside, to the function which does the insertion. `get_cdc_generation_mutations` will be used outside this function later.	2023-04-20 15:38:37 +02:00
Kamil Braun	52366f33e5	service/raft: group0_state_machine: signal topology state machine in `load_snapshot` The `_topology_state_machine.event` condition variable should be signalled whenever the topology state is updated, including on snapshot load.	2023-04-20 15:38:37 +02:00
Avi Kivity	43a0b40082	Merge 'Remove global proxy usage from API handlers' from Pavel Emelyanov There are a few places in the API handlers that call global proxy for their needs. Most of those places are easy to patch, because proxy is either at http_ctx thing right inside the handler code. Also there's a handler code in view_builder that needs proxy too, but it really needs topology, not proxy, and can get it elsewhere (the handler is coroutinized while at it) Closes #13593 * github.com:scylladb/scylladb: view: Get topology via database tokens view: Indentation fix after previous patch view: Coroutinize view_builder::view_build_statuses() api: Use ctx.sp in storage service handler api,main: Unset storage_proxy API on stop api: Use ctx.sp in set_storage_proxy() routes	2023-04-20 16:31:31 +03:00
Botond Dénes	66ee73641e	test/cql-pytest/nodetool.py: no_autocompaction_context: use the correct API This `with` context is supposed to disable, then re-enable autocompaction for the given keyspaces, but it used the wrong API for it, it used the column_family/autocompaction API, which operates on column families, not keyspaces. This oversight led to a silent failure because the code didn't check the result of the request. Both are fixed in this patch: * switch to use `storage_service/auto_compaction/{keyspace}` endpoint * check the result of the API calls and report errors as exceptions Fixes: #13553 Closes #13568	2023-04-20 16:21:16 +03:00
Kamil Braun	8d7b5f1710	Merge 'test/pylib: topology fix asyncio fixture and fix logger' from Alecco Remove unnecessary asyncio marker and re-introduce top level logger instance. Closes #13561 * github.com:scylladb/scylladb: test/pylib: add missing logger test/pylib: remove unnecessary asyncio marker	2023-04-20 14:23:05 +02:00
Alejo Sanchez	11561a73cb	test/pylib: ManagerClient helpers to wait for... server to see other servers after start/restart When starting/restarting a server, provide a way to wait for the server to see at least n other servers. Also leave the implementation methods available for manual use and update previous tests, one to wait for a specific server to be seen, and one to wait for a specific server to not be seen (down). Fixes #13147 Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13438	2023-04-20 14:22:31 +02:00
Avi Kivity	342cdb2a63	Update tools/jmx submodule (split Depends line) * tools/jmx 15fd4ca...fdd0474 (1): > dist/debian: split Depends into multiple lines	2023-04-20 15:11:33 +03:00
Pavel Emelyanov	bda2aea5be	view: Get topology via database tokens The view_builder::view_build_statuses() needs topology to walk its nodes. Now it gets one from global proxy via its token metadata, but database also has tokens and view_builder has reference to database. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:18:14 +03:00
Pavel Emelyanov	403463d7eb	view: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:18:14 +03:00
Pavel Emelyanov	257814f443	view: Coroutinize view_builder::view_build_statuses() Easier to patch it this way further. Indentation is deliberately left broken until next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:17:59 +03:00
Pavel Emelyanov	ece731301c	api: Use ctx.sp in storage service handler Similarly to previous patch, but from another routes group. The storage service API calls mainly use storage service, but one place needs proxy to call recalculate_schema_version() with Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:14:52 +03:00
Pavel Emelyanov	21136058bd	api,main: Unset storage_proxy API on stop So that the routes referencing and using ctx.sp don't step on a proxy that's going to be removed (not now, but some time later) from under them on shutdown. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:14:04 +03:00
Pavel Emelyanov	8d490d20dc	api: Use ctx.sp in set_storage_proxy() routes It's already used in many other places, few methods still stick to global proxy usage. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:12:49 +03:00
Alejo Sanchez	2c1ba377bf	test/pylib: add missing logger The logger instance was removed in a previous commit but it is used in the wrapper helper. Add it back. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-04-20 10:36:02 +02:00
Alejo Sanchez	05338a6cd7	test/pylib: remove unnecessary asyncio marker Remove missing asyncio marker for fixture as this is only needed for tests. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-04-20 10:36:02 +02:00
Pavel Emelyanov	edcce7d8dd	view_info: Drop calls to get_local_storage_proxy() In both cases the proxy is called to get data_dictionary from. Now it's available as the call argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	3e4fb7cad6	view_info: Add data_dictionary argument to select_statement() This method needs data_dictionary to work. Fortunately, all callers of it already have the dictionary at hand and can just pass it as argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	4375835cdd	view_info: Add data_dictionary argument to partition_slice() method The caller is calculate_affected_clustering_ranges() with dictionary arg, the method needs dictionary to call view_info::select_statement() later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	0aff55cdb2	view_filter_checking_visitor: Construct with data_dictionary The visitor is wait-free helper for matches_view_filter() that has dictionary as its argument. Later the visitor will pass the dictionary to view_info::select_statement(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	837fde84b1	view: Carry data_dictionary arg through standalone helpers There's a bunch of functions in view.{hh|cc} that don't belong to any class and perform view-related calculations for view updates. Lots of them eventually call view_info::select_statement() which will later need the dictionary. By now all those methods' callers have data dictionary at hand and can share it via argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	1301a99ba3	view_updates: Carry data_dictionary argument through methods The goal is to have the dictionary at places that later wrap calls to view_info::select_statement(). This graph of calls starts at the only public view_updates::generate_update() method which, in turn, is called from view_update_builder that already has data dictionary at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	9d3d533561	view_update_builder: Construct with data dictionary The caller is table with view-update-generator at hand (it calls mutate_MV on). Builder here is used as a temporary object that destroys once the caller coroutine co_return-s, so keeping the database obtained from the view-update-generator is safe. Later the v.u.b. object will propagate its data dictionary down the callstacks. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:38 +03:00
Pavel Emelyanov	4a16ab3bd4	table: Push view_update_generator arg to affected_views() Caller already has it to call mutate_MV() on. The method in question will need the generator in one of the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 10:42:31 +03:00
Pavel Emelyanov	7ddcd0c918	view: Add database getters to v._update_generator and v._builder Both services carry database which will be used by auxiliary objects like view_updates, view_update_builder, consumer, etc in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 10:41:16 +03:00
Warren Krewenki	73eaebe338	Remove visible :orphan: The text `:orphan:` was showing up in the scylla.yaml documentation with no context. Closes #13524	2023-04-20 08:24:48 +03:00
Avi Kivity	9fb5443f87	cql3: abstract_function_selector: use small_vector for argument buffer abstract_function_selector uses a preallocated vector to store the arguments to aggregate functions, to prevent an allocation for every row. Use small_vector to prevent an allocation per query, if the number of arguments happens to be small. This isn't expected to make a significant performance difference.	2023-04-19 20:42:25 +03:00
Avi Kivity	3e0aacc8b5	db, cql3: functions: pass function parameters as a span instead of a vector Spans are more flexible and can be constructed from any contiguous container (such as small_vector), or a subrange of such a container. This can save allocations, so change the signature to accept a span. Spans cannot be constructed from std::initializer_list, so one such call site is changed to construct a span directly from the single argument.	2023-04-19 20:38:55 +03:00
Avi Kivity	9072763a52	keys: change from_optional_exploded to accept a span instead of a vector A span is more generic than a vector, and can be constructed from any contiguous container (like small_vector), or a subset of a container. To support this, helpers in compound.hh need to use make_iterator_range, since a span doesn't fit the container concept (since spans don't own their contents). This is needed to make a similar change to function evaluation, as the token function passes its parameters to from_optional_exploded().	2023-04-19 20:18:50 +03:00
Avi Kivity	6ca1b14488	Update tools/jmx submodule (drop java 8 on debian) * tools/jmx 3316f7a...15fd4ca (1): > dist/debian: drop dependencies on jdk-8	2023-04-19 19:51:03 +03:00
Botond Dénes	0c430c01e9	Merge 'cql: allow SUM() aggregations which result in a NaN' from Nadav Har'El This short PR fixes a bug in SUM() aggregation where if the data contains +Inf and -Inf the returned sum should be NaN but we returned an error instead. This is a recent regression uncovered by a dtest (see issue #13551), but in the first patch we add additional tests in the cql-pytest framework which reproduce this bug and explore various other areas (wrongly) implicated by the failing dtest. Fixes #13551 Closes #13564 * github.com:scylladb/scylladb: cql3: allow SUM() aggregation to result in a NaN test/cql-pytest: add tests for data casts and inf in sums	2023-04-19 13:50:23 +03:00
Pavel Emelyanov	a77ca69360	s3/test: Rename MINIO_SERVER_ADDRESS environment variable Using it, the pylib minio code exports the minio address for tests. This creates unneeded WTFs when running the test over AWS S3, so it's better to rename the variable not to mention MINIO at all. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-19 12:51:12 +03:00
Pavel Emelyanov	12c4e7d605	s3/test: Keep public bucket name in environment Local test.py runs minio with the public 'testbucket' bucket and all test cases know that. This series adds an ability to run tests over real S3 so the bucket name should be configurable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-19 12:51:12 +03:00
Pavel Emelyanov	91674da982	s3/test: Fix upload stream closure If multipart upload fails for some reason the output stream remains not closed and the respective assertion masquerades the original failure. Fix that by closing the stream in all cases. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-19 12:51:12 +03:00
Pavel Emelyanov	b239e0d368	test/lib: Add getenv_safe() helper The helper is like ::getenv() but checks if the variable exists and throws descriptive exception. So instead of fatal error: in "...": std::logic_error: basic_string: construction from null is not valid one could get something like fatal error: in "...": std::logic_error: Environment variable ... not set Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-19 12:49:26 +03:00
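The idea behind getenv_safe() can be sketched in Python. This is only an analogy, not the C++ helper added to test/lib: the function name is reused for illustration, and the exception type and message text are assumptions modeled on the error quoted in the commit message.

```python
import os


def getenv_safe(name: str) -> str:
    """Return the value of an environment variable or fail descriptively.

    Hypothetical Python analogue of the helper described above.
    """
    value = os.environ.get(name)
    if value is None:
        # Name the missing variable up front instead of failing later
        # with an unrelated error about a null value.
        raise RuntimeError(f"Environment variable {name} not set")
    return value
```

A caller gets either a usable string or an error that names the variable it forgot to export.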
Botond Dénes	ad065aaa62	Update tools/jmx submodule * tools/jmx e9bfaabd...3316f7a9 (2): > select-java: avoid exec multiple paths > select-java: extract function out	2023-04-19 11:18:19 +03:00
Nadav Har'El	81e0f5b581	cql3: allow SUM() aggregation to result in a NaN When floating-point data contains +Inf and -Inf, the sum is NaN. Our SUM() aggregation calculated this sum correctly, but then instead of returning it, complained that the sum overflowed by narrowing. This was a false positive: The sum() finalizer wanted to test that no precision was lost when casting the accumulator to the result type, so checked that the result before and after the cast are the same. But specifically for NaN, it is never equal to anything - not even to itself. This check is wrong for floating point, but moreover - isn't even necessary when the two types (accumulator type and result type) are identical so in this patch we skip it in this case. Note that in the current code, a different accumulator and result type is only used in the case of integer types; When accumulating floating point sums, the same type is used, so the broken check will be avoided. The test for this issue starts to pass with this patch, so the xfail tag is removed. Fixes #13551 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-04-19 09:31:41 +03:00
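The false positive described above can be reproduced in a few lines of Python. This is a sketch of the reasoning, not the Scylla finalizer: the function names are hypothetical, and Python's float stands in for both the accumulator type and the result type.

```python
import math


def finalize_sum_naive(acc: float) -> float:
    """Round-trip check that misfires on NaN (hypothetical sketch)."""
    result = float(acc)  # "cast" to the result type; a no-op here
    if result != acc:  # NaN is never equal to anything, itself included
        raise OverflowError("sum overflowed by narrowing")
    return result


def finalize_sum_fixed(acc, result_type=float):
    """Skip the check when accumulator and result types are identical."""
    if type(acc) is result_type:
        return acc
    result = result_type(acc)
    if result != acc:
        raise OverflowError("sum overflowed by narrowing")
    return result


# +Inf plus -Inf is NaN under IEEE 754 arithmetic.
total = math.inf + (-math.inf)
```

With `total` as input, the naive version raises even though no narrowing took place, while the fixed version returns the NaN unchanged.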
Nadav Har'El	5b792dde68	Merge 'Extend aws_sigv4 code to suite S3 client needs' from Pavel Emelyanov The AWS signature-generating code was moved from alternator some time ago as is. Now it's clear in which places it should be extended to work for the S3 client as well. The enhancements are - Support UNSIGNED-PAYLOAD to omit calculating checksums for request body - Include full URL path into the signature, not just hard-coded "/" string - Don't check datestamp expiration if not asked for This is a part of #13493 Closes #13535 * github.com:scylladb/scylladb: utils/aws: Brush up the aws_sigv4.hh header utils/aws: Export timepoint formatter utils/aws: Omit datestamp expiration checks when not needed utils/aws: Add canonical-uri argument utils/aws: Support unsigned-payload signatures	2023-04-18 16:33:52 +03:00
Pavel Emelyanov	9628d07adb	Put storage_service.hh on a diet By removing unneeded headers inclusions. At the cost of few more forward declarations and a couple of extra includes in other .cc files. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13552	2023-04-18 14:53:17 +03:00
Nadav Har'El	78555ba7f1	test/cql-pytest: add tests for data casts and inf in sums This patch adds tests to reproduce issue #13551. The issue, discovered by a dtest (cql_cast_test.py), claimed that either cast() or sum(cast()) from varint type broke. So we add two tests in cql-pytest: 1. A new test file, test_cast_data.py, for testing data casts (a CAST (...) as ... in a SELECT), starting with testing casts from varint to other types. The test uncovers a lot of interesting cases (it is heavily commented to explain these cases) but nothing there is wrong and all tests pass on Scylla. 2. An xfailing test for sum() aggregate of +Inf and -Inf. It turns out that this caused #13551. In Cassandra and older Scylla, the sum returned a NaN. In Scylla today, it generates a misleading error message. As usual, the tests were run on both Cassandra (4.1.1) and Scylla. Refs #13551. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-04-18 13:38:42 +03:00
Anna Stuchlik	3d25edf539	doc: remove the sequential repair option from docs Fixes https://github.com/scylladb/scylladb/issues/12132 The sequential repair mode is not supported. This commit removes the incorrect information from the documentation. Closes #13544	2023-04-18 09:45:48 +03:00
Tomasz Grabiec	a8f8f9f0ea	Merge 'raft topology: store `shard_count` and `ignore_msb` in topology' from Kamil Braun Add new columns to the `system.topology` table: `shard_count` and `ignore_msb`. When a node bootstraps or restarts and observes that the values stored in `topology` are different than the local values, it updates them. This is done in the `update_topology_with_local_metadata` function (the 'metadata' here being the two values). Additional flag persisted in `system.scylla_local` is used to safely avoid performing read barriers when the values didn't change on node restart. A comment in `update_topology_with_local_metadata` explains why this flag is needed. An example use case where `shard_count` and `ignore_msb` are needed is creating CDC generations. Fixes: #13508 Closes #13521 * github.com:scylladb/scylladb: raft topology: update `release_version` in topology on restart raft topology: store `shard_count` and `ignore_msb` in topology	2023-04-18 01:18:50 +02:00
Anna Stuchlik	da7a75fe7e	doc: remove in-memory tables from OSS docs Related: https://github.com/scylladb/scylladb/issues/13119 This commit removes the information about in-memory tables from the Open Source documentation, as it is an Enterprise-only feature. Closes #13496	2023-04-17 16:00:09 +03:00
Botond Dénes	de67978211	Update tools/jmx submodule * tools/jmx 826da61d...e9bfaabd (1): > metrics: revert 'metrics: EstimatedHistogram::getValues() returns bucketOffsets'	2023-04-17 15:42:11 +03:00
Avi Kivity	7724223134	Merge 'utils: big_decimal: optimize big_decimal::compare() and use <=> operator' from Kefu Chai in this series, we use <=> operator to replace `big_decimal::compare()` for better readability. also, we trade the chained ternary expression with a more verbose if-else statement for better performance and readability. Closes #13478 * github.com:scylladb/scylladb: utils: big_decimal: replace compare() with <=> operator utils: big_decimal: optimize big_decimal::compare()	2023-04-17 14:33:53 +03:00
Avi Kivity	7a42927a3d	treewide: stop using 'using namespace std' in namespace scope Such namespace-wide imports can create conflicts between names that are the same in seastar and std, such as {std,seastar}::future and {std,seastar}::format, since we also have 'using namespace seastar'. Replace the namespace imports with explicit qualification, or with specific name imports. Closes #13528	2023-04-17 14:08:37 +03:00
Botond Dénes	38c14a556a	Merge 'A couple of s3/client fixes found when testing over AWS S3' from Pavel Emelyanov This is a part of PR #13493 that contains found fixes for the client code itself. The original PR has some questions to resolve, so it's worth merging the fixes separately. Closes #13534 * github.com:scylladb/scylladb: s3/client: Add comments about multipart upload completion message s3/client: Fix succeeded/failed part upload final checking s3/client: Fix parts to start from 1	2023-04-17 13:33:12 +03:00
Botond Dénes	b8e47569e6	Merge 'doc: extend the information about the recommended RF on the Tracing page' from Anna Stuchlik Fixes https://github.com/scylladb/scylla-doc-issues/issues/823. This PR extends the note on the Tracing page to explain what is meant by setting the RF to ALL and adds a link for reference. Closes #12418 * github.com:scylladb/scylladb: docs: add an explanation to recommendation in the Note box doc: extend the information about the recommended RF on the Tracing page	2023-04-17 13:28:19 +03:00
Anna Stuchlik	2d2d92cf18	docs: add an explanation to recommendation in the Note box	2023-04-17 11:39:06 +02:00
Kamil Braun	a4159cc281	raft topology: update `release_version` in topology on restart Check on node start if local value of `release_version` changed. If it did, update it in `system.topology` like we do with `shard_count` and `ignore_msb`.	2023-04-17 10:52:05 +02:00
Kamil Braun	f9051dccaa	raft topology: store `shard_count` and `ignore_msb` in topology Add new columns to the `system.topology` table: `shard_count` and `ignore_msb`. When a node bootstraps or restarts and observes that the values stored in `topology` are different than the local values, it updates them. This is done in the `update_topology_with_local_metadata` function (the 'metadata' here being the two values). Additional flag persisted in `system.scylla_local` is used to safely avoid performing read barriers when the values didn't change on node restart. A comment in `update_topology_with_local_metadata` explains why this flag is needed. An example use case where `shard_count` and `ignore_msb` are needed is creating CDC generations. Fixes: #13508	2023-04-17 10:45:30 +02:00
Pavel Emelyanov	d09d6adbf4	utils/aws: Brush up the aws_sigv4.hh header Add lost pragma-once directive. Remove the hashers.hh inclusion. It was carried in when the whole code was detached from alternator (`f5de0582c8`), but this header is not needed in the header, only in the .cc file which uses sha256_hasher. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:16:45 +03:00
Pavel Emelyanov	792490e095	utils/aws: Export timepoint formatter The format of timestamp for AWS requests is defined in documentation, there's already the code that prepares it in this form. This patch exports this method so that S3 client could use it in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:14:45 +03:00
Pavel Emelyanov	706b60a0b0	utils/aws: Omit datestamp expiration checks when not needed The signing code is used in two ways -- by alternator to verify the arrived signed request and by S3 client to prepare the signed request. In the former case date expiration check is performed, but for the latter this is not required, because date stamp is most likely now (or close to it). So this patch makes the orig_datestamp argument optional meaning that expiration checks can be omitted. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:14:45 +03:00
Pavel Emelyanov	c5ccef078a	utils/aws: Add canonical-uri argument Current signing code hard-codes the "/" as the URL, likely this just works for alternator. For S3 client the URL would include bucket and object name and should thus become the argument, not constant. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:14:45 +03:00
Pavel Emelyanov	8eabe9c4ef	utils/aws: Support unsigned-payload signatures For S3 signing the whole request payload can be too resource consuming. Fortunately, payload signing is only enforced if used with plain http, but with real S3 we're going to use signed requests over https only (see next patch why). That said, the patch turns body-content into an optional reference (i.e. -- a pointer) so that the signing code could inject the UNSIGNED-PAYLOAD mark instead of the payload signature and omit heavy payload signing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:14:45 +03:00
Pavel Emelyanov	7c7a3416c5	s3/client: Add comments about multipart upload completion message The message length is pre-calculated in advance to provide correct content-length request header. This math is not obvious and deserves a comment. Also, the final message preparation code is also implicitly checking if any part failed to upload. There's a comment in the upload_sink's upload_part() method about it, but the finalization place deserves one too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:08:34 +03:00
Pavel Emelyanov	3f86bed600	s3/client: Fix succeeded/failed part upload final checking When all parts upload complete the final message is prepared and sent out to the server. The preparation code is also responsible for checking if all parts uploaded OK by checking the part etag to be non-empty. In that check a misprint crept in -- the whole list is checked to be empty, not the individual etag itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 11:08:15 +03:00
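The misprint described above is easy to illustrate. The following Python sketch is not the s3/client code; the function names are hypothetical, and an empty string stands in for the etag of a part that failed to upload.

```python
def all_parts_uploaded_buggy(etags: list[str]) -> bool:
    """Checks the list itself, so one failed part goes unnoticed."""
    return len(etags) != 0


def all_parts_uploaded(etags: list[str]) -> bool:
    """Checks every individual etag, as the fix intends."""
    return all(etag != "" for etag in etags)


# The second part failed: its etag is empty.
etags = ["etag-1", "", "etag-3"]
```

With this input the buggy check reports success because the list is non-empty, while the per-etag check reports the failure.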
Botond Dénes	6c889213bf	Merge 'Topology add node exception safety' from Benny Halevy Currently if index_node throws when trying to add an already indexed node, pop_node might unindex the existing node instead of the new one. Instead, with this change, unindex_node looks up the node by its pointer and removes it from the index map only if it's found there, so as to clean up safely after index_node throws (at any stage). Add a unit test to verify that. In addition, added a unit test to reproduce #13502 and test the fix. Closes #13512 * github.com:scylladb/scylladb: test: locator_topology: add test_update_node topology: add_node, unindex_node: make exception safe	2023-04-17 11:02:15 +03:00
Pavel Emelyanov	79379760e6	s3/client: Fix parts to start from 1 Docs say, that part numbers should start from 1, while the code follows the tradition and starts from 0. Minio is conveniently incompatible in this sense so test had been passing so far. On real S3 part number 0 ends up with failed request. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-17 10:43:12 +03:00
Botond Dénes	4c37dc5507	Merge 'keys: specialize fmt::formatter<partition_key> and friends' from Kefu Chai this is a part of a series to migrate from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print following classes without the help of `operator<<`. - partition_key_view - partition_key - partition_key::with_schema_wrapper - key_with_schema - clustering_key_prefix - clustering_key_prefix::with_schema_wrapper the corresponding `operator<<()` are dropped in this change, as all their callers now use fmtlib for formatting. the helper of `print_key()` is removed, as its only caller is `operator<<(std::ostream&, const clustering_key_prefix::with_schema_wrapper&)`. the reason why all these operators are replaced in one go is that we have a template function of `key_to_str()` in `db/large_data_handler.cc`. this template function is actually the caller of operator<< of `partition_key::with_schema_wrapper` and `clustering_key_prefix::with_schema_wrapper`. so, in order to drop either of these two operator<<, we need to remove both of them, so that we can switch over to `fmt::to_string()` in this template function. Refs scylladb#13245 Closes #13513 * github.com:scylladb/scylladb: keys: consolidate the formatter for partition_keys keys: specialize fmt::formatter<partition_key> and friends	2023-04-17 10:27:31 +03:00
Benny Halevy	58129fad92	locator/topology: call seastar::current_backtrace only when log_level is enabled `seastar::current_backtrace()` can be quite heavy. When we pass it to a log message in relatively detailed log_level (debug/trace), we pay the price of `current_backtrace` every time, but we rarely print the message. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-16 14:22:06 +03:00
Benny Halevy	490a0ae89b	schema_tables: call seastar::current_backtrace only when log_level is enabled `seastar::current_backtrace()` can be quite heavy. When we pass it to a log message in relatively detailed log_level (debug/trace), we pay the price of `current_backtrace` every time, but we rarely print the message. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-16 14:22:06 +03:00
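The pattern in the two commits above (compute an expensive log argument only when the level is enabled) can be sketched with Python's standard logging module. This is an analogy to the seastar logger, not its API; the logger name and the function are hypothetical.

```python
import logging
import traceback

logger = logging.getLogger("topology_sketch")


def log_with_backtrace(message: str) -> bool:
    """Log a message with a stack trace, building the trace lazily.

    Returns True if the trace was actually computed and logged.
    """
    if not logger.isEnabledFor(logging.DEBUG):
        # Skip the costly stack walk when nobody will see the output.
        return False
    backtrace = "".join(traceback.format_stack())
    logger.debug("%s\n%s", message, backtrace)
    return True
```

The guard trades one cheap level check per call for avoiding a stack walk on every call whose output would be discarded.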
Kefu Chai	6bb32efac0	utils: big_decimal: replace compare() with <=> operator now that we are using C++20, it'd be more convenient if we can use the <=> operator for comparing. the compiler creates the 6 other operators for us if the <=> operator is defined. so the code is more compacted. in this change, `big_decimal::compare()` is replaced with `operator<=>`, and its caller is updated accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-15 12:52:30 +08:00
Kefu Chai	e991e6087e	utils: big_decimal: optimize big_decimal::compare() before this change in the worst case, the underlying `number::compare()` gets called twice. as it is used by Boost::multiprecision to implement the comparing operators of `number`. but since we can have the result in one go, there is no need to perform the comparison multiple times. so, in this change, we just call `number::compare()` explicitly, and use it to implement `compare()`. this should save a call of `number::compare()`. also, the chained ternary expression is replaced using if-else statement for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-15 12:52:30 +08:00
Pavel Emelyanov	c501163f95	Merge 'reader_permit: give better names to active* states' from Botond Dénes The names of these states have been the source of confusion ever since they were introduced. Give them names which better reflect their true meaning and give less room for misinterpretation. The changes are: * active/unused -> active * active/used -> active/need_cpu * active/blocked -> active/await Hopefully the new names do a better job at conveying what these states really mean: * active - a regular admitted permit, which is active (as opposed to an inactive permit). * active/need_cpu - an active permit which was marked as needing CPU for the read to make progress. This permit prevents admission of new permits while it is in this state. * active/await - a former active/need_cpu permit, which has to wait on I/O or a remote shard. While in this state, it doesn't block the admission of new permits (pending other criteria such as resource availability). Closes #13482 * github.com:scylladb/scylladb: docs/dev/reader-concurrency-semaphore.md: expand on how the semaphore works reader_permit: give better names to active* states	2023-04-14 20:39:05 +03:00
Pavel Emelyanov	4e7f4b9303	Merge 'scripts/open-coredump.sh: allow user to plug in scylla package' from Botond Dénes Lately we have observed that some builds are missing the package_url in the build metadata. This is usually caused by changes in how build metadata is stored on the servers and the s3 reloc server failing to dig them out of the metadata files. A user can usually still obtain the package url but currently there is no way to plug in user-obtained scylla package into the script's workflow. This PR fixes this by allowing the user to provide the package as `$ARTIFACT_DIR/scylla.package` (in unpacked form). Closes #13519 * github.com:scylladb/scylladb: scripts/open-coredump.sh: allow bypassing the package downloading scripts/open-coredump.sh: check presence of mandatory field in build json object scripts/open-coredump.sh: more consistent error messaging	2023-04-14 20:35:06 +03:00
Benny Halevy	e18eb71fa3	test: locator_topology: add test_update_node Reproduces issue fixed in PR #13502 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-14 17:51:07 +03:00
Benny Halevy	e29994b2aa	topology: add_node, unindex_node: make exception safe Currently if index_node throws when trying to add an already indexed node, pop_node might unindex the existing node instead of the new one. Instead, with this change, unindex_node looks up the node by its pointer and removes it from the index map only if it's found there, so as to clean up safely after index_node throws (at any stage). Add a unit test to verify that. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-14 17:51:05 +03:00
Tomasz Grabiec	952b455310	Merge ' tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes scylla-sstable currently has two ways to obtain the schema: * via a `schema.cql` file. * load schema definition from memory (only works for system tables). This meant that for most cases it was necessary to export the schema into a CQL format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates from. Yet in many cases the sstable is inspected on the same host where it originates from. In these cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file. This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool checks the usual locations of the scylla data dir, the scylla configuration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a schema.cql is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override. If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong. A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes. This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. 
I don't think this will break anybody's workflow as this tool is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change. Example: ``` $ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db {"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}} ``` As seen above, subdirectories like quarantine, staging etc are also supported. Fixes: https://github.com/scylladb/scylladb/issues/10126 Closes #13448 * github.com:scylladb/scylladb: test/cql-pytest: test_tools.py: add tests for schema loading test/cql-pytest: add no_autocompaction_context docs: scylla-sstable.rst: remove accidentally added copy-pasta docs: scylla-sstable.rst: remove paragraph with schema limitations docs: scylla-sstable.rst: update schema section test/cql-pytest: nodetool.py: add flush_keyspace() tools/scylla-sstable: reform schema loading mechanism tools/schema_loader: add load_schema_from_schema_tables() db/schema_tables: expose types schema	2023-04-14 16:46:26 +02:00
Botond Dénes	edc75f51ff	docs/dev/reader-concurrency-semaphore.md: expand on how the semaphore works Greatly expand on the details of how the semaphore works. Organize the content into thematic chapters to improve navigation. Improve formatting while at it.	2023-04-14 08:51:24 -04:00
Botond Dénes	943ae7fc69	reader_permit: give better names to active* states The names of these states have been the source of confusion ever since they were introduced. Give them names which better reflect their true meaning and give less room for misinterpretation. The changes are: * active/unused -> active * active/used -> active/need_cpu * active/blocked -> active/await Hopefully the new names do a better job at conveying what these states really mean: * active - a regular admitted permit, which is active (as opposed to an inactive permit). * active/need_cpu - an active permit which was marked as needing CPU for the read to make progress. This permit prevents admission of new permits while it is in this state. * active/await - a former active/need_cpu permit, which has to wait on I/O or a remote shard. While in this state, it doesn't block the admission of new permits (pending other criteria such as resource availability).	2023-04-14 08:40:46 -04:00
Botond Dénes	cae79ef2c3	scripts/open-coredump.sh: allow bypassing the package downloading By allowing the user to plug a manually downloaded package. Consequently the "package_url" field of the build metadata is checked only if there is no user-provided extracted package. This allows working around builds for which the metadata server returns no "package_url", by allowing the user to locate and download the package themselves, providing it to the script by simply extracting it as $ARTIFACT_DIR/scylla.package.	2023-04-14 07:48:21 -04:00
Kamil Braun	200123624f	Merge 'test: reproducers for store mutation with schema change and host down' from Alecco Reproducers for https://github.com/scylladb/scylladb/issues/10770. (Already fixed in `15ebd59071`) Includes necessary improvements and fixes to `pylib`. Closes #12699 * github.com:scylladb/scylladb: test/pytest: reproducers for store mutation... test: pylib: Add a way to create cql connections with particular coordinators test/pylib: get gossiper alive endpoints test/topology: default replication factor 3 test/pylib: configurable replication factor	2023-04-14 13:47:51 +02:00
Botond Dénes	45fbdbe5f7	scripts/open-coredump.sh: check presence of mandatory field in build json object Mandatory fields missing in the build json object lead to obscure, unrelated error messages down the road. Avoid this by checking that all required fields are present and print an error message if any is missing.	2023-04-14 07:33:46 -04:00
Botond Dénes	4df5ec4080	scripts/open-coredump.sh: more consistent error messaging Start all error messages with "error: ..." and log them to stderr.	2023-04-14 07:24:14 -04:00
Botond Dénes	38d6635afd	Update tools/java submodule * tools/java eddef023...c9be8583 (1): > README.md: drop cqlsh from README.md	2023-04-14 11:53:16 +03:00
Botond Dénes	7586491e1e	Update tools/jmx/ submodule * tools/jmx/ 57c16938...826da61d (4): > install.sh: do not create /usr/scylla/jmx in nonroot mode > install.sh: remove "echo done" > reloc-pkg: rename symlinks/scylla-jmx to select-java > install.sh: select java executable at runtime	2023-04-14 11:47:54 +03:00
Kefu Chai	c580e30ec7	cql3: expr: return more accurate error message for invalidated token() args before this change, we just print out the addresses of the elements in `column_defs`, if the arguments passed to `token()` function are not valid. this is not quite helpful from the user's perspective. as user would be more interested in the values. also, we could print more accurate error message for different error. in this change, following Cassandra 4.1's behavior, three cases are identified, and corresponding errors are returned respectively: * duplicated partition keys * wrong order of partition key * missing keys where, if the partition key order is wrong, instead of printing the keys specified by user, the correct order is printed in the error message for helping user to correct the `token()` function. for better performance, the checks are performed only if the keys do not match, based on the assumption that the error handling path is not likely to be executed. tests are added accordingly. they were also tested with Cassandra 4.1.1. Fixes #13468 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13470	2023-04-14 11:46:18 +03:00
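The three error cases listed above, and the fast path that skips the checks when the arguments already match, can be sketched as follows. This is a hypothetical Python illustration, not the cql3 implementation: the function name and all error texts are invented and do not reproduce the messages emitted by Scylla or Cassandra.

```python
def check_token_args(partition_key: list[str], args: list[str]) -> None:
    """Validate token() arguments against the partition key columns."""
    if args == partition_key:
        return  # fast path: the detailed checks run only on mismatch

    if len(set(args)) != len(args):
        raise ValueError("token() contains duplicated partition key columns")

    missing = [col for col in partition_key if col not in args]
    if missing:
        raise ValueError(f"token() is missing partition key columns: {missing}")

    if sorted(args) == sorted(partition_key):
        # Report the correct order rather than echoing the user's input.
        expected = ", ".join(partition_key)
        raise ValueError(f"wrong column order, expected token({expected})")

    # Fallback for cases outside the three named above (e.g. extra columns).
    raise ValueError("token() arguments do not match the partition key")
```

For a partition key `["a", "b"]`, calling the function with `["b", "a"]` raises an error that spells out `token(a, b)`.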
Botond Dénes	4eb1bb460a	Update tools/python3 submodule * tools/python3 d2f57dd9...30b8fc21 (1): > create-relocatable-package.py: fix timestamp of executable files	2023-04-14 11:39:17 +03:00
Raphael S. Carvalho	47b2a0a1f6	data_directory: Describe storage options of a keyspace Description of storage options is important for S3, as one needs to know if underlying storage is either local or remote, and if the latter, details about it. This relies on server-side desc statement. $ ./bin/cqlsh.py -e "describe keyspace1;" CREATE KEYSPACE keyspace1 WITH replication = { ... } AND storage = {'type': 'S3', 'bucket': 'sstables', 'endpoint': '127.0.0.1:9000'} AND durable_writes = true; Fixes #13507. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13510	2023-04-14 11:34:35 +03:00
Benny Halevy	054667d5b6	storage_service: node_ops_ctl: send_to_all: print correct set of nodes in nodes_down error message nodes_failed are printed by mistake, instead of nodes_down Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13509	2023-04-14 11:31:20 +03:00
Botond Dénes	289ff821c9	Merge 'Remove global proxy usage from view builder's value_getter' from Pavel Emelyanov There's a legacy safety check in view code that needs to find a base table from its schema ID. To do it it calls for global storage proxy instance. The comment says that this code can be removed once computes_column feature is known by everyone. I'm not sure if that's the case, so here's more complicated yet less incompatible way to stop using global proxy instance. Closes #13504 * github.com:scylladb/scylladb: view: Remove unused view_ptr reference view: Carry backing-secondary-index bit via view builder view: Keep backing-seconday-index bool on value_getter table: Add const index manager sgetter	2023-04-14 11:23:23 +03:00
Kefu Chai	60ff230d54	create-relocatable-package.py: use f-string in `dcce0c96a9`, we should have used f-string for printing the return code of gzip subprocess. but the "f" prefix was missed. so, in this change, it is added. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13500	2023-04-14 08:29:33 +03:00
Raphael S. Carvalho	a47bac931c	Move TWCS option from table into TWCS itself enable_optimized_twcs_queries is specific to TWCS, therefore it belongs to TWCS, not replica::table. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13489	2023-04-14 08:28:16 +03:00
Anna Stuchlik	989a75b2f7	doc: update the metrics between 5.2 and 2023.1 Related: https://github.com/scylladb/scylla-enterprise/issues/2794 This commit adds the information about the metric changes in version 2023.1 compared to version 5.2. This commit is part of the 5.2-to-2023.1 upgrade guide and must be backported to branch-5.2. Closes #13506	2023-04-14 08:23:53 +03:00
Kefu Chai	85b21ba049	keys: consolidate the formatter for partition_keys since there are two places formatting `with_schema_wrapper`, it'd be desirable if we can consolidate them. so, in this change, the formatting code is extracted into a helper, so we only have a single place for formatting the `with_schema_wrapper`s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-14 13:21:30 +08:00
Kefu Chai	3738fcbe05	keys: specialize fmt::formatter<partition_key> and friends this is a part of a series to migrate from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print following classes without the help of `operator<<`. - partition_key_view - partition_key - partition_key::with_schema_wrapper - key_with_schema - clustering_key_prefix - clustering_key_prefix::with_schema_wrapper the corresponding `operator<<()` are dropped in this change, as all their callers now use fmtlib for formatting. the helper of `print_key()` is removed, as its only caller is `operator<<(std::ostream&, const clustering_key_prefix::with_schema_wrapper&)`. the reason why all these operators are replaced in one go is that we have a template function of `key_to_str()` in `db/large_data_handler.cc`. this template function is actually the caller of operator<< of `partition_key::with_schema_wrapper` and `clustering_key_prefix::with_schema_wrapper`. so, in order to drop either of these two operator<<, we need to remove both of them, so that we can switch over to `fmt::to_string()` in this template function. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-14 13:21:30 +08:00
Botond Dénes	1da02706dd	Merge 'Discard SSTable bloom filter on load-and-stream' from Raphael "Raph" Carvalho Load-and-stream reads the entire content from SSTables, therefore it can afford to discard the bloom filter that might otherwise consume a significant amount of memory. Bloom filters are only needed by compaction and other replica::table operations that might want to check the presence of keys in the SSTable files, like single-partition reads. It's not uncommon to see Data:Filter ratio of less than 100:1, meaning that for ~300G of data, filters will take ~3G. In addition to saving memory footprint, it also reduces operation time as load-and-stream no longer have to read, parse and build the filters from disk into memory. Closes #13486 * github.com:scylladb/scylladb: sstable_loader: Discard SSTable bloom filter on load-and-stream sstables: Allow SSTable loading to discard bloom filter sstables: Allow sstable_directory user to feed custom sstable open config sstables: Move sstable_open_info into open_info.hh	2023-04-14 06:18:54 +03:00
Alejo Sanchez	9597822214	test/pytest: reproducers for store mutation... with schema change and host down Reproducers for a failure during lwt operation due to missing of a column mapping in schema history table. Issue #10770	2023-04-13 21:23:03 +02:00
Tomasz Grabiec	041ee3ffdd	test: pylib: Add a way to create cql connections with particular coordinators Usage: await manager.driver_connect(server=servers[0]) manager.cql.execute(f"...", execution_profile='whitelist')	2023-04-13 21:23:03 +02:00
Alejo Sanchez	62a945ccd5	test/pylib: get gossiper alive endpoints Helper to get list of gossiper alive endpoints from REST API. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-04-13 21:23:03 +02:00
Alejo Sanchez	08d754e13f	test/topology: default replication factor 3 For most tests there will be nodes down, increase replication factor to 3 to avoid having problems for partitions belonging to down nodes. Use replication factor 1 for raft upgrade tests.	2023-04-13 21:23:02 +02:00
Alejo Sanchez	3508a4e41e	test/pylib: configurable replication factor Make replication factor configurable for the RandomTables helper. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-04-13 21:23:02 +02:00
Benny Halevy	b71f229fc2	topology: node: update_node: do not override internal changed flag by state option Currently, opt_st overrides the internal `changed` flag by setting it with the opt_st changed status. Instead, it should use `|=` to keep it true if it is already so. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13502	2023-04-13 17:46:59 +02:00
Raphael S. Carvalho	fe6df3d270	sstable_loader: Discard SSTable bloom filter on load-and-stream Load-and-stream reads the entire content from SSTables, therefore it can afford to discard the bloom filter that might otherwise consume a significant amount of memory. Bloom filters are only needed by compaction and other replica::table operations that might want to check the presence of keys in the SSTable files, like single-partition reads. It's not uncommon to see Data:Filter ratio of less than 100:1, meaning that for ~300G of data, filters will take ~3G. In addition to saving memory footprint, it also reduces operation time as load-and-stream no longer have to read, parse and build the filters from disk into memory. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-13 11:34:22 -03:00
Raphael S. Carvalho	17261369ea	sstables: Allow SSTable loading to discard bloom filter If bloom filter is not loaded, it means that an always-present filter is used, which translates into the SSTable being opened on every single read. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-13 11:34:22 -03:00
Raphael S. Carvalho	1427a5ce98	sstables: Allow sstable_directory user to feed custom sstable open config This will be used by load-and-stream to load SSTables in its own customized way. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-13 11:34:16 -03:00
Raphael S. Carvalho	86516f4cef	sstables: Move sstable_open_info into open_info.hh So sstable_directory can access its definition without having to include sstables.hh. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-13 11:31:14 -03:00
Pavel Emelyanov	097cea11b2	view: Remove unused view_ptr reference After previous patch the value_getter::_view becomes unused and can be dropped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-13 16:51:27 +03:00
Pavel Emelyanov	821c8b19a6	view: Carry backing-secondary-index bit via view builder When view builder constructs it populates itself with view updates. Later the updates may instantiate the value_getter-s which, in turn, would need to check if the view is backing secondary index. Good news is that when view builder constructs it has all the information at hand needed to evaluate this "backing" bit. It's then propagated down to value_getter via corresponding view_updates. The getter's _view field becomes unused after this change and is (void)-ed to make this patch compile. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-13 16:48:36 +03:00
Pavel Emelyanov	e8b5022343	view: Keep backing-secondary-index bool on value_getter The getter needs to check if the view is backing a secondary index. Currently it's done inside the handle_computed_column() method, but it's more convenient if this bit is known during construction, so move it there. There are no places that can change this property between the time the view_getter is created and the time the method in question is called. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-13 16:45:59 +03:00
Pavel Emelyanov	0d9da46428	table: Add const index manager getter To be used by the next patch, which will call this helper inside a non-mutable lambda Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-13 16:45:16 +03:00
Botond Dénes	bd57471e54	reader_concurrency_semaphore: don't evict inactive readers needlessly Inactive readers should only be evicted to free up resources for waiting readers. Evicting them when waiters are not admitted for any other reason than resources is wasteful and leads to extra load later on when these evicted readers have to be recreated and requeued. This patch changes the logic on both the registering path and the admission path to not evict inactive readers unless there are readers actually waiting on resources. A unit-test is also added, reproducing the overly-aggressive eviction and checking that it doesn't happen anymore. Fixes: #11803 Closes #13286	2023-04-13 15:20:18 +03:00
Pavel Emelyanov	b1501d4261	s3/client: Don't use designated initialization of sys stat struct It makes the compiler complain about mis-ordered initialization of st_nlink vs st_mode on different arches. Current code (st_nlink before st_mode) compiled fine on x86, but fails on ARM which wants st_mode to come before st_nlink. Changing the order would, apparently, break x86 build with similar message. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13499	2023-04-13 15:13:56 +03:00
Kefu Chai	87170bf07a	build: cmake: add more tests this change should add the remaining tests under boost/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13494	2023-04-13 14:57:00 +03:00
Botond Dénes	e103ef3bcb	Update seastar submodule * seastar 1204efbc...ed7a0f54 (46): > gate: s/intgernal/internal/ > reactor: set reactor::_stopping to true on all shards > condition-variable: replace the coroutine wakeup task with a promise > tutorial: explain the buffer_size_t param of generator coroutine > log: call log_level_map explicitly in constructor > future: de-variadicate make_ready_future() and similar helpers > timer-set,scollectd: remove unnecessary ";" > util/conversion: remove inclusion guards > foreign_ptr: destroy: use run_in_background > abort_source, abortable_fifo: use is_nothrow_invocable_r_v<> > alien: add type constraint for alien::run_on and alien::submit_to > alien: add noexcept specifier for lambda passed to run_on() > test: alien_test: test alien::run_on() also > test: alien_test: throw if unexpected things happens > future: make API level 6 mandatory > api-level: update IDE fallback > core/on_internal_error: always log error with backtrace > future: make API level 5 mandatory > websocket: fix frame parsing. > websocket: fix frame assembling. > when_all: drop code for API_LEVEL < 4 > future: drop internal call_then_impl > future: when_all_succeed(): make API level 4 mandatory > reactor: trade comment for type constraints > sstring: s/is_invocable_r/is_invocable_r_v/ > doc: compatibility: document API levels 5 and 6 > demos: file_demo: pass a string_view to_open_file_dma() > TLS: Add issuer/subject info to verification error message > test: fstream_test: drop unnecessary API_LEVEL check > manual_clock: advance: use run_in_background to expire_timers > reactor: add run_in_backround and close > websocket: shutdown input first. > websocket: use gate to guard background tasks. > websocket: remove trailing spaces. > websocket_demo: ignore sleep_aborted exception. > websocket_demo: fix coredump. 
> fstream: drop API level 2 (make_file_output_stream() returning non-future) > core/sstring: do not use ostream_formatter > metrics: use fmt::to_string() when creating a label > backtrace: fix size calculation in dl_iterate_phdr > Downgrade expected stall detector warning to info > fix: Add missing inline code blocks > spawn_test: fix /bin/cat stuck in reading input. > reactor: pass fd opened in blocking mode to spawned process > reactor: skip sigaction if handler has been registered before. > reactor: allow registering handler multiple times for a signal.	2023-04-13 14:28:30 +03:00
Kefu Chai	29ca0009a2	dist/debian: do not Depend on ${shlibs:Depends} the substvar of `${shlibs:Depends}` is set by dh_shlibdeps, which inspects the ELF images being packaged to figure out the shared library dependencies for packages. but since `f3c3b9183c`, we just override the `override_dh_shlibdeps` target in debian/rules with no-op. as we take care of the shared library dependencies by vendoring the runtime dependencies by ourselves using the relocatable package. so this variable is never set. that's why `dpkg-gencontrol` complains when processing `debian/control` and trying to materialize the substvars. in this change, the occurrences of `${shlibs:Depends}` are removed to silence the warnings from `dpkg-gencontrol`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13457	2023-04-13 08:34:05 +03:00
Raphael S. Carvalho	9760149e8d	compaction: Don't bump compaction shares during major execution Commit `49892a0`, back in 2018, bumps the compaction shares by 200 to guarantee a minimum base line. However, after commit `e3f561d`, major compaction runs in maintenance group meaning that bumping shares became completely irrelevant and only causes regular compaction to be unnecessarily more aggressive. Fixes #13487. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13488	2023-04-13 08:20:25 +03:00
Botond Dénes	50ee4033a9	Update tools/jmx submodule * tools/jmx 602329c9...57c16938 (1): > install.sh: replace tab with spaces	2023-04-12 13:28:23 +03:00
Botond Dénes	5d0c0ae0c4	Merge 'token_metadata: use topology nodes for endpoint_to_host_id map' from Benny Halevy Currently, token_metadata_impl maintains a "shadow" endpoint to host_id map on top of the maps in topology. This series first reimplements the functions that currently use this map to use topology instead. Then the important users of `get_endpoint_to_host_id_map_for_reading`, node_ops_ctl and view_builder, are converted to use a new `topology::for_each_node` function to process all nodes in topology directly, without going through `get_endpoint_to_host_id_map_for_reading`. Closes #13476 * github.com:scylladb/scylladb: view_builder: view_build_statuses: use topology::for_each_node storage_service: node_ops_ctl: refresh_sync_nodes: use topology::for_each_node topology: add for_each_node token_metadata: get endpoint to node map from topology	2023-04-12 10:33:02 +03:00
Botond Dénes	1440efa042	test/cql-pytest: test_tools.py: add tests for schema loading A set of comprehensive tests covering all the supported ways of providing the schema to scylla-sstable, either explicitly or implicitly (auto-detect).	2023-04-12 03:14:43 -04:00
Botond Dénes	76a7d3448f	test/cql-pytest: add no_autocompaction_context	2023-04-12 03:14:43 -04:00
Botond Dénes	b7a4304b69	docs: scylla-sstable.rst: remove accidentally added copy-pasta	2023-04-12 03:14:43 -04:00
Botond Dénes	1673f10f7a	docs: scylla-sstable.rst: remove paragraph with schema limitations The above file contained a paragraph explaining the limitations of `scylla-sstable.rst` w.r.t. automatically finding the schema. This no longer applies so remove it.	2023-04-12 03:14:43 -04:00
Botond Dénes	9f9beef8fd	docs: scylla-sstable.rst: update schema section With the recent changes to the ways schema can be provided to the tool.	2023-04-12 03:14:43 -04:00
Botond Dénes	222f624757	test/cql-pytest: nodetool.py: add flush_keyspace() It would have been better if `flush()` could have been called with a keyspace and optional table param, but changing it now is too much churn, so we add a dedicated method to flush a keyspace instead.	2023-04-12 03:14:43 -04:00
Botond Dénes	ffec1e5415	tools/scylla-sstable: reform schema loading mechanism So far, schema had to be provided via a schema.cql file, a file which contains the CQL definition of the table. This is flexible but annoying at the same time. Often, the sstables the tool operates on are located in their table directory in a scylla data directory, where the schema tables are also available. To mitigate this, an alternative method to load the schema from memory was added which works for system tables. In this commit we extend this to work for all kinds of tables: by auto-detecting where the scylla data directory is, and loading the schema tables from disk.	2023-04-12 03:14:43 -04:00
Botond Dénes	fd4c2f2077	tools/schema_loader: add load_schema_from_schema_tables() Allows loading the schema for the designated keyspace and table, from the system table sstables located on disk. The sstable files are opened read-only.	2023-04-12 03:14:43 -04:00
Botond Dénes	63b266a988	db/schema_tables: expose types schema	2023-04-12 02:43:53 -04:00
Botond Dénes	0c51f72ad6	Merge 'utils, mutation: replace operator<<(..) with fmt formatter' from Kefu Chai this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `tombstone` and `shadowable_tombstone` without the help of fmt::ostream. and their `operator<<(ostream,..)` are dropped, as there are no users of them anymore. Refs #13245 Closes #13474 * github.com:scylladb/scylladb: mutation: specialize fmt::formatter<tombstone> and fmt::formatter<shadowable_tombstone> utils: specialize fmt::formatter<optional<>>	2023-04-12 09:32:56 +03:00
Kefu Chai	ff202723c6	utils: big_decimal: specialize fmt::formatter<big_decimal> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `big_decimal` without the help of `operator<<`. this operator is dropped in this change, as all its callers now use fmtlib for formatting. we might need to use fmtlib to implement `big_decimal::to_string()`, and use `fmt::to_string()` instead, but let's leave it for a follow-up change. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13479	2023-04-12 09:20:50 +03:00
Botond Dénes	f82287a9af	Update tools/jmx/ submodule * tools/jmx/ b7ae52bc...602329c9 (1): > metrics: EstimatedHistogram::getValues() returns bucketOffsets	2023-04-12 09:17:57 +03:00
Botond Dénes	525b21042f	Merge 'Rewrite sstables keyspace compaction task' from Aleksandra Martyniuk Task manager task implementations of classes that cover rewrite sstables keyspace compaction, which can be started through the /storage_service/keyspace_compaction/ api. Top level task covers the whole compaction and creates child tasks on each shard. Closes #12714 * github.com:scylladb/scylladb: test: extend test_compaction_task.py to test rewrite sstables compaction compaction: create task manager's task for rewrite sstables keyspace compaction on one shard compaction: create task manager's task for rewrite sstables keyspace compaction compaction: create rewrite_sstables_compaction_task_impl	2023-04-12 08:38:59 +03:00
Aleksandra Martyniuk	25cfffc3ae	compaction: rename local_offstrategy_keyspace_compaction_task_impl to shard_offstrategy_keyspace_compaction_task_impl Closes #13475	2023-04-12 08:38:25 +03:00
Kefu Chai	1cb95b8cff	mutation: specialize fmt::formatter<tombstone> and fmt::formatter<shadowable_tombstone> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `tombstone` and `shadowable_tombstone` without the help of `operator<<`. in this change, only `operator<<(ostream&, const shadowable_tombstone&)` is dropped, and all its callers now use fmtlib for formatting the instances of `shadowable_tombstone`. `operator<<(ostream&, const tombstone&)` is preserved, as it is still used by Boost::test for printing the operands in case the comparing tests fail. please note, before this change we were using a concrete string for indent. after this change, some of the places are changed to using fmtlib for indent. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-12 10:57:03 +08:00
Kefu Chai	c980bd54ad	utils: specialize fmt::formatter<optional<>> this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `optional<T>` without the help of `operator<<()`. this change also enables us to ditch more `operator<<()`s in future. as we are relying on `operator<<(ostream&, const optional<T>&)` for printing instances of `optional<T>`, and `operator<<(ostream&, const optional<T>&)` in turn uses the `operator<<(ostream&, const T&)`. so, the new specialization of `fmt::formatter<optional<>>` will remove yet another caller of these operators. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-12 10:57:03 +08:00
Benny Halevy	535b71eba3	view_builder: view_build_statuses: use topology::for_each_node Instead of tmptr->get_endpoint_to_host_id_map_for_reading. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-11 18:14:51 +03:00
Benny Halevy	d89fb02d24	storage_service: node_ops_ctl: refresh_sync_nodes: use topology::for_each_node Instead of tmptr->get_endpoint_to_host_id_map_for_reading. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-11 18:14:47 +03:00
Kefu Chai	59579d5876	utils: fragment_range: specialize fmt::formatter<FragmentedView> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print classes fulfilling the requirement of the `FragmentedView` concept without the help of the template function `to_hex()`. this function is dropped in this change, as all its callers now use fmtlib for formatting. the helper of `fragment_to_hex()` is dropped as well, as its only caller is `to_hex()`. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13471	2023-04-11 16:09:38 +03:00
Benny Halevy	7b76369ffc	topology: add for_each_node To eventually replace token_metadata::get_endpoint_to_host_id_map_for_reading Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-11 15:55:39 +03:00
Benny Halevy	e635aa30d6	token_metadata: get endpoint to node map from topology Don't maintain a "shadow" endpoint_to_host_id_map in token_metadata_impl. Instead, get the nodes_by_endpoint map from topology and use it to build the endpoint_to_host_id_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-11 15:48:30 +03:00
Botond Dénes	f1bbf705f9	Merge 'Cleanup sstables in resharding and other compaction types' from Benny Halevy This series extends sstable cleanup to resharding and other (offstrategy, major, and regular) compaction types so to: * cleanup uploaded sstables (#11933) * cleanup staging sstables after they are moved back to the main directory and become eligible for compaction (#9559) When perform_cleanup is called, all sstables are scanned, and those that require cleanup are marked as such, and are added for tracking to table_state::cleanup_sstable_set. They are removed from that set once released by compaction. Along with that sstables set, we keep the owned_ranges_ptr used by cleanup in the table_state to allow other compaction types (offstrategy, major, or regular) to cleanup those sstables that are marked as require_cleanup and that were skipped by cleanup compaction for either being in the maintenance set (requiring offstrategy compaction) or in staging. Resharding is using a more straightforward mechanism of passing the owned token ranges when resharding uploaded sstables and using it to detect sstable that require cleanup, now done as piggybacked on resharding compaction. 
Closes #12422 * github.com:scylladb/scylladb: table: discard_sstables: update_sstable_cleanup_state when deleting sstables compaction_manager: compact_sstables: retrieve owned ranges if required sstables: add a printer for shared_sstable compaction_manager: keep owned_ranges_ptr in compaction_state compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup compaction: refactor compaction_state out of compaction_manager compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state compaction_manager: refactor get_candidates compaction_manager: get_candidates: mark as const table, compaction_manager: add requires_cleanup sstable_set: add for_each_sstable_until distributed_loader: reshard: update sstable cleanup state table, compaction_manager: add update_sstable_cleanup_state compaction_manager: needs_cleanup: delete unused schema param compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges distributed_loader: reshard: consider sstables for cleanup distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard distributed_loader: reshard: add optional owned_ranges_ptr param distributed_loader: reshard: get a ref to table_state distributed_loader: reshard: capture creator by ref distributed_loader: reshard: reserve num_jobs buckets compaction: move owned ranges filtering to base class compaction: move owned_ranges into descriptor	2023-04-11 14:52:29 +03:00
Botond Dénes	38c98b370f	Update tools/jmx/ submodule * tools/jmx/ 48e16998...b7ae52bc (1): > install.sh: do not fail if jre-11 is not installed	2023-04-11 14:51:31 +03:00
Kefu Chai	dcce0c96a9	create-relocatable-package.py: error out if pigz fails before this change, we don't error out even if pigz fails. but there is chance that pigz fails to create the gzip'ed relocatable tarball either due to environmental issues or some other problems, and we are not aware of this until packaging scripts like `reloc/build_rpm.sh` tries to ungzip this corrupted gzip file. in this change, if pigz's status code is not 0, the status code is printed, and create-relocatable-package.py will return 1. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13459	2023-04-11 14:29:25 +03:00
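The behaviour described in the commit above (print the compressor's status code and return 1 when it is non-zero) can be sketched roughly as follows. This is a minimal illustration, not the code from create-relocatable-package.py; the helper name `run_compressor` is hypothetical.

```python
import subprocess
import sys


def run_compressor(cmd):
    """Run cmd; return 0 on success, or print its status and return 1."""
    proc = subprocess.run(cmd)
    if proc.returncode != 0:
        # Surface the failure here, instead of letting a later packaging
        # step trip over a corrupted archive.
        print(f"{cmd[0]} failed with status {proc.returncode}", file=sys.stderr)
        return 1
    return 0
```

A caller would typically pass the result to `sys.exit()` so that the failure reaches the build system.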
Aleksandra Martyniuk	e170fa1c99	test: extend test_compaction_task.py to test rewrite sstables compaction	2023-04-11 13:07:22 +02:00
Aleksandra Martyniuk	a93f044efa	compaction: create task manager's task for rewrite sstables keyspace compaction on one shard Implementation of task_manager's task that covers rewrite sstables keyspace compaction on one shard.	2023-04-11 13:07:17 +02:00
Botond Dénes	a8e59d9fb2	Merge 'Metrics relabel from file' from Amnon Heiman This series adds an option to read the relabel config from file. Most of Scylla's metrics are reported per-shard, sometimes they are also reported per scheduling group or per table. With modern hardware, this can quickly grow to a large number of metrics that overload Scylla and the collecting server. One of the main issues around metrics reduction is that many of the metrics are only helpful in certain situations. For example, Scylla monitoring only looks at a subset of the metrics. So in large deployments it would be helpful to scrape only those. An option to do that would be to mark all dashboard-related metrics with a label value, and then Prometheus will request only metrics with that label value. There are two main limitations to scraping by label values: 1. some of the metrics we want to report are in seastar, so we'll need to label them somehow (we cannot just add random labels to seastar metrics) 2. things change, new metrics are introduced and we may want them, it's not practical to re-compile and wait for a new release whenever we want to change a label just for monitoring. It will be best to have the option to add metrics freely and choose at runtime what to report. This series makes use of the Seastar API to perform metrics manipulation dynamically. It includes adding, removing, and changing labels, and also enabling and disabling metrics, and enabling and disabling the skip_when_empty option. After this series the configuration could be used with: ```--relabel-config-file conf.yaml``` The general logic and format follows Prometheus metrics_relabel_config configuration. 
Where the configuration file looks like: ``` $ cat conf.yaml relabel_configs: - source_labels: [shard] action: drop target_label: shard regex: (2) - source_labels: [shard] action: replace target_label: level replacement: $1 regex: (.3) ``` Closes #12687 github.com:scylladb/scylladb: main: Load metrics relabel config from a file if it exists Add relabel from file support.	2023-04-11 12:47:09 +03:00
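To illustrate how the two rules in the conf.yaml above would act on a metric's labels, here is a minimal sketch that assumes Prometheus-style semantics: source label values joined with `;`, a fully anchored regex, and `$1` expanding to the first capture group. The `relabel` function and `RULES` list are hypothetical and are not Seastar's or Scylla's implementation, whose behaviour may differ in detail.

```python
import re

# The two rules from conf.yaml, written as plain dicts (assumed shape).
# The "drop" rule's target_label is omitted because drop does not use it.
RULES = [
    {"source_labels": ["shard"], "action": "drop", "regex": "(2)"},
    {"source_labels": ["shard"], "action": "replace",
     "target_label": "level", "replacement": "$1", "regex": "(.3)"},
]


def relabel(labels, rules=RULES):
    """Return the relabelled label dict, or None if the metric is dropped."""
    out = dict(labels)
    for rule in rules:
        # Join the source label values and match the whole string.
        value = ";".join(out.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule["regex"], value)
        if match is None:
            continue
        if rule["action"] == "drop":
            return None
        if rule["action"] == "replace":
            # Translate "$1" into Python's "\1" group reference.
            template = re.sub(r"\$(\d+)", r"\\\1", rule["replacement"])
            out[rule["target_label"]] = match.expand(template)
    return out
```

Under these assumptions, a metric with `shard="2"` is dropped, while one with `shard="13"` gains a `level="13"` label and every other shard passes through unchanged.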
Aleksandra Martyniuk	c4098df4ec	compaction: create task manager's task for rewrite sstables keyspace compaction Implementation of task_manager's task covering rewrite sstables keyspace compaction that can be started through storage_service api.	2023-04-11 11:04:21 +02:00
Aleksandra Martyniuk	814254adfd	compaction: create rewrite_sstables_compaction_task_impl rewrite_sstables_compaction_task_impl serves as a base class of all concrete rewrite sstables compaction task classes.	2023-04-11 11:03:09 +02:00
Botond Dénes	dba1d36aa6	Merge 'alternator: fix isolation of concurrent modifications to tags' from Nadav Har'El Alternator's implementation of TagResource, UntagResource and UpdateTimeToLive (the latter uses tags to store the TTL configuration) was unsafe for concurrent modifications - some of these modifications may be lost. This short series fixes the bug, and also adds (in the last patch) a test that reproduces the bug and verifies that it's fixed. The cause of the incorrect isolation was that we separately read the old tags and wrote the modified tags. In this series we introduce a new function, `modify_tags()` which can do both under one lock, so concurrent tag operations are serialized and therefore isolated as expected. Fixes #6389. Closes #13150 * github.com:scylladb/scylladb: test/alternator: test concurrent TagResource / UntagResource db/tags: drop unsafe update_tags() utility function alternator: isolate concurrent modification to tags db/tags: add safe modify_tags() utility functions migration_manager: expose access to storage_proxy	2023-04-11 11:17:23 +03:00
Anna Stuchlik	2921059ebb	doc: add a disclaimer about unsupported upgrade Fixes https://github.com/scylladb/scylla-enterprise/issues/2805 This commit adds the disclaimer that an upgrade by replacing the cluster nodes with nodes with a different release is not supported. Closes #13445	2023-04-11 10:47:39 +03:00
Kefu Chai	86b66a9875	build: cmake: drop test_table.CC this change mirrors the corresponding change in `configure.py` in `4b5b6a9010` . Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13461	2023-04-11 09:42:58 +03:00
Nadav Har'El	79114c5030	cql-pytest: translate Cassandra's tests for DELETE operations This is a translation of Cassandra's CQL unit test source file validation/operations/DeleteTest.java into our cql-pytest framework. There are 51 tests, and they did not reproduce any previously-unknown bug, but did provide additional reproducers for three known issues: Refs #4244 Add support for mixing token, multi- and single-column restrictions Refs #12474 DELETE prints misleading error message suggesting ALLOW FILTERING would work Refs #13250 one-element multi-column restriction should be handled like a single-column restriction Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13436	2023-04-11 09:10:11 +03:00
Botond Dénes	355583066e	Merge 'Reduce memory footprint of SSTable index summary' from Raphael "Raph" Carvalho SSTable summary is one of the components fully loaded into memory that may have a significant footprint. This series reduces the summary footprint by reducing the amount of token information that we need to keep in memory for each summary entry. Of course, the benefit of this size optimization is proportional to the amount of summary entries, which in turn is proportional to the number of partitions in a SSTable. Therefore we can say that this optimization will benefit the most tables which have tons of small-sized partitions, which will result in big summaries. Results: ``` BEFORE [1000000 pkeys] data size: 4035888890, summary -> memory footprint: 5843232, entries: 88158 [10000000 pkeys] data size: 40368888890, summary -> memory footprint: 55787128, entries: 844925 AFTER [1000000 pkeys] data size: 4035888890, summary -> memory footprint: 4351536, entries: 88158 [10000000 pkeys] data size: 40368888890, summary -> memory footprint: 42211984, entries: 844925 ``` That shows a 25% reduction in footprint, for both 1 and 10 million pkeys. Closes #13447 * github.com:scylladb/scylladb: sstables: Store raw token into summary entries sstables: Don't store token data into summary's memory pool	2023-04-11 08:29:11 +03:00
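The reduction quoted in the merge message above can be checked directly from the before/after figures it lists. A small sketch, where the helper name `reduction` is illustrative only:

```python
# Summary memory footprint in bytes, copied from the results above,
# keyed by the number of partition keys.
before = {1_000_000: 5_843_232, 10_000_000: 55_787_128}
after = {1_000_000: 4_351_536, 10_000_000: 42_211_984}


def reduction(old, new):
    """Fractional saving when going from old to new."""
    return (old - new) / old


for pkeys in before:
    pct = 100 * reduction(before[pkeys], after[pkeys])
    print(f"{pkeys} pkeys: summary footprint reduced by {pct:.1f}%")
```

Both cases come out close to the quoted 25%, and the entry counts are unchanged, so the saving comes from smaller entries rather than fewer of them.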
Botond Dénes	05b381bfa2	Merge 'Simple S3 storage for sstables' from Pavel Emelyanov The PR adds sstables storage backend that keeps all component files as S3 objects and system.sstables_registry ownership table that keeps track of what sstables objects belong to local node and their names. When a keyspace is configured with 'STORAGE = { 'type': 'S3' }' the respective class table object eventually gets the storage_options instance pointing to the target S3 endpoint and bucket. All the sstables created for that table attach the S3 storage implementation that maintains components' files as S3 objects. Writing to and reading from components is handled by the S3 client facilities from utils/. Changing the sstable state, which is -- moving between normal, staging and quarantine states -- is not yet implemented, but would eventually happen by updating entries in the sstables registry. To keep track of which node owns which objects, to provide bucket-wide uniqueness of object names and to maintain sstable state the storage driver keeps records in the system.sstables_registry ownership table. The table maps sstable location and generation to the object format, version, status-state () and (!) unique identifier (some time soon this identifier is supposed to be replaced with UUID sstables generations). The component object name is thus s3://bucket/uuid/component_basename. The registry is also used on boot. The distributed loader picks up sstables from all the tables found in schema and for S3-backed keyspaces it lists entries in the registry to a) identify those and b) get their unique S3-side identifiers to open by name. () About sstable's status and state. The state field is the part of today's sstable path on disk -- staging, quarantine, normal (root table data dir), etc. Since S3 doesn't have the renaming facility, moving sstable between those states is only possible by updating the entry in the registry. 
This is not yet implemented in this set (#13017). The status field tracks the sstable's transition through creation and deletion. It starts with the 'creating' status, which corresponds to today's TemporaryTOC file. After being created and written, the sstable moves into the 'sealed' state, which corresponds to today's normal sstable with the TOC file. To delete an sstable atomically, it first moves into the 'removing' state, which is equivalent to being in the deletion log for an on-disk sstable. Once removed from the bucket, the entry is removed from the registry. To play with: 1. Start minio (installed by install-dependencies.sh) ``` export MINIO_ROOT_USER=${root_user} export MINIO_ROOT_PASSWORD=${root_pass} mkdir -p ${root_directory} minio server ${root_directory} ``` 2. Configure minio CLI, create anonymous bucket ``` mc config host rm local mc config host add local http://127.0.0.1:9000 ${root_user} ${root_pass} mc mb local/sstables mc anonymous set public local/sstables ``` 3. Start Scylla with object-storage feature enabled ``` scylla ... --experimental-features=keyspace-storage-options --workdir ${as_usual}``` 4. Create KS with S3 storage ``` create keyspace ... storage = { 'type': 'S3', 'endpoint': '127.0.0.1:9000', 'bucket': 'sstables' };``` The S3 client has a logger named "s3"; it's useful to turn it on with `trace` verbosity. 
Closes #12523 * github.com:scylladb/scylladb: test: Add object-storage test distributed_loader: Print storage type when populating sstable_directory: Add ownership table components lister sstable_directory: Make components_lister and API sstable_directory: Create components lister based on storage options sstables: Add S3 storage implementation system_keyspace: Add ownership table system_keyspace: Plug to user sstables manager too sstable: Make storage instance based on storage options sstable_directory: Keep storage_options aboard sstable: Virtualize the helper that gets on-disk stats for sstable sstable, storage: Virtualize data sink making for small components sstable, storage: Virtualize data sink making for Data and Index sstable/writer: Shuffle writer::init_file_writers() sstable: Make storage an API utils: Add S3 readable file impl for random reads utils: Add S3 data sink for multipart upload utils: Add S3 client with basic ops cql-pytest: Add option to run scylla over stable directory test.py: Equip it with minio server sstables: Detach write_toc() helper	2023-04-11 08:17:25 +03:00
Benny Halevy	96660b2ef7	table: discard_sstables: update_sstable_cleanup_state when deleting sstables We need to remove the deleted sstables from update_sstable_cleanup_state otherwise their data and index files will remain opened and their storage space won't be reclaimed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:37:56 +03:00
Benny Halevy	4db961ecac	compaction_manager: compact_sstables: retrieve owned ranges if required If any of the sstables to-be-compacted requires cleanup, retrieve the owned_ranges_ptr from the table_state. With that, staging sstables will eventually be cleaned up via regular compaction. Refs #9559 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:36:10 +03:00
Benny Halevy	9105f9800c	sstables: add a printer for shared_sstable Refactor the printing logic in compaction::formatted_sstables_list out to sstables::to_string(const shared_sstable&, bool include_origin) and operator<<(const shared_sstable) on top of it. So that we can easily print std::vector<shared_sstable> from compaction_manager in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:31:35 +03:00
Benny Halevy	d87925d9fc	compaction_manager: keep owned_ranges_ptr in compaction_state When perform_cleanup adds sstables to sstables_requiring_cleanup, also save the owned_ranges_ptr in the compaction_state so it could be used by other compaction types like regular, reshape, or major compaction. When the exhausted sstables are released, check if sstables_requiring_cleanup is empty, and if it is, clear also the owned_ranges_ptr. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:30:53 +03:00
Benny Halevy	c2bf0e0b72	compaction_manager: perform_cleanup: keep sstables in compaction_state::sstables_requiring_cleanup As a first step towards parallel cleanup by (regular) compaction and cleanup compaction, filter all sstables in perform_cleanup and keep the set of sstables in the compaction_state. Erase from that set when the sstables are unregistered from compaction. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:30:39 +03:00
Benny Halevy	b3192b9f16	compaction: refactor compaction_state out of compaction_manager To use it both from compaction_manager and compaction_descriptor in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:28:16 +03:00
Benny Halevy	73280c0a15	compaction: refactor compaction_fwd.hh out of compaction_descriptor.hh So it can be used in the next patch that will refactor compaction_state out of class compaction_manager. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:19:04 +03:00
Benny Halevy	690697961c	compaction_manager: compacting_sstable_registration: keep a ref to the compaction_state To be used for managing sstables requiring cleanup. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:18:02 +03:00
Benny Halevy	cac60a09ac	compaction_manager: refactor get_candidates Allow getting candidates for compaction from an arbitrary range of sstable, not only the in_strategy_sstables. To be used by perform_cleanup to mark all sstables that require cleanup, even if they can't be compacted at this time. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:16:57 +03:00
Benny Halevy	bbfe839a73	compaction_manager: get_candidates: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:16:12 +03:00
Benny Halevy	6ebafe74b9	table, compaction_manager: add requires_cleanup Returns true iff any of the sstables in the set requires cleanup. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:14:36 +03:00
Benny Halevy	d765686491	sstable_set: add for_each_sstable_until Calls a function on all sstables or until the function returns stop_iteration::yes. Change the sstable_set_impl interface to expose only for_each_sstable_until and let sstable_set::for_each_sstable use that, wrapping the void-returning function passed to it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:11:58 +03:00
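The relationship between the two iteration helpers in d765686491 can be sketched as follows. This is an illustrative Python model, not the C++ code from the patch: `StopIter`, `for_each_until` and `for_each` are hypothetical stand-ins for `stop_iteration`, `for_each_sstable_until` and `for_each_sstable`, and returning the stop flag to the caller is an assumption.

```python
from enum import Enum
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class StopIter(Enum):
    """Hypothetical stand-in for stop_iteration::yes / stop_iteration::no."""
    NO = 0
    YES = 1


def for_each_until(items: Iterable[T], func: Callable[[T], StopIter]) -> StopIter:
    """Call func on each item and stop early once func returns StopIter.YES."""
    for item in items:
        if func(item) is StopIter.YES:
            return StopIter.YES
    return StopIter.NO


def for_each(items: Iterable[T], func: Callable[[T], None]) -> None:
    """Visit every item by wrapping a void function so it never asks to stop."""
    def wrapper(item: T) -> StopIter:
        func(item)
        return StopIter.NO

    for_each_until(items, wrapper)
```

With this shape only the early-stopping variant has to be implemented per container; the visit-everything variant is derived from it.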
Benny Halevy	db7fa9f3be	distributed_loader: reshard: update sstable cleanup state Since the sstables are loaded from foreign open info we should mark them for cleanup if needed (and owned_ranges_ptr is provided). This will allow a later patch to enable filtering for cleanup only for sstable sets containing sstables that require cleanup. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:11:00 +03:00
Benny Halevy	d0690b64c1	table, compaction_manager: add update_sstable_cleanup_state update_sstable_cleanup_state calls needs_cleanup and inserts (or erases) the sstable into the respective compaction_state.sstables_requiring_cleanup set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:10:55 +03:00
Benny Halevy	1baca96de1	compaction_manager: needs_cleanup: delete unused schema param It isn't needed. The sstable already has a schema. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:03:53 +03:00
Benny Halevy	ac9f8486ba	compaction_manager: perform_cleanup: disallow empty sorted_owened_ranges I'm not sure why this was originally supported, maybe for upgrade sstables where we may want to rewrite the sstables without filtering any tokens, but perform_sstable_upgrade is now following a different code path and uses `rewrite_sstables` directly, without piggybacking on cleanup. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:03:03 +03:00
Benny Halevy	ecbd112979	distributed_loader: reshard: consider sstables for cleanup When called from `process_upload_dir` we pass a list of owned tokens to `reshard`. When they are available, run resharding, with implicit cleanup, also on unshared sstables that need cleanup. Fixes #11933 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 23:01:38 +03:00
Benny Halevy	3ccbb28f2a	distributed_loader: process_upload_dir: pass owned_ranges_ptr to reshard To facilitate implicit cleanup of sstables via resharding. Refs #11933 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 22:59:38 +03:00
Benny Halevy	aa4b18f8fb	distributed_loader: reshard: add optional owned_ranges_ptr param For passing owned_ranges_ptr from distributed_loader::process_upload_dir. Refs #11933 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 22:57:41 +03:00
Benny Halevy	f540af930b	distributed_loader: reshard: get a ref to table_state We don't reference the table itself, only as_table_state. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 22:57:11 +03:00
Benny Halevy	c6b7fcc26f	distributed_loader: reshard: capture creator by ref Now that reshard is a coroutine, creator is preserved in the coroutine frame until completion so we can simply capture it by reference now. Note that previously it was moved into the compaction descriptor, but the capture wasn't mutable so it was copied anyhow and this change doesn't introduce a regression. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 22:56:07 +03:00
Benny Halevy	7c9d16ff96	distributed_loader: reshard: reserve num_jobs buckets We know in advance how many buckets we need. We still need to emplace the first bucket upfront. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 22:55:35 +03:00
Benny Halevy	0c6ce5af74	compaction: move owned ranges filtering to base class Move the token filtering logic down from cleanup_compaction to regular_compaction and class compaction so it can be reused by other compaction types. Create a _owned_ranges_checker in class compaction when _owned_ranges is engaged, and use it in compaction::setup to filter partitions based on the owned ranges. Ref scylladb/scylladb#12998 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 22:55:09 +03:00
Benny Halevy	09df04c919	compaction: move owned_ranges into descriptor Move the owned_ranges_ptr, currently used only by cleanup and upgrade compactions, to the generic compaction descriptor so we apply cleanup in other compaction types. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-10 22:52:12 +03:00
Pavel Emelyanov	fd817e199c	Merge 'auth: replace operator<<(..) with fmt formatter' from Kefu Chai this is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `auth::auth_authentication_options` and `auth::resource_kind` without the help of fmt::ostream. and their `operator<<(ostream,..)` are dropped, as there are no users of them anymore. Refs #13245 Closes #13460 * github.com:scylladb/scylladb: auth: remove unused operator<<(.., resource_kind) auth: specialize fmt::formatter<resource_kind> auth: remove unused operator<<(.., authentication_option) auth: specialize fmt::formatter<authentication_option>	2023-04-10 17:05:09 +03:00
Pavel Emelyanov	21ef5bcc22	test: Add object-storage test The test does - starts scylla (over a stable directory) - creates S3-backed keyspace (minio is up and running by test.py already) - creates table in that keyspace and populates it with several rows - flushes the keyspace to make sstables hit the storage - checks that the ownership table is populated properly - restarts scylla - makes sure old entries exist Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:44:29 +03:00
Pavel Emelyanov	8b9e9671de	distributed_loader: Print storage type when populating On boot it's very useful to know which storage a table comes from, so add the respective info to existing log messages. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:44:29 +03:00
Pavel Emelyanov	f04c6cdf9a	sstable_directory: Add ownership table components lister When sstables are stored on object storage, they are "registered" in the system.sstables_registry ownership table. The sstable_directory is supposed to list sstables from this table, so here's the respective components lister. The lister is created by sstables_manager; by the time it is requested, the system keyspace is already plugged. The lister only handles "sealed" sstables. Dangling ones are still ignored, this is to be fixed later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:44:29 +03:00
Pavel Emelyanov	8bd9f7accf	sstable_directory: Make components_lister and API Now the lister is filesystem-specific. There will soon come another one for S3, so the sstable_directory should be prepared for that by making the lister an abstract class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:44:29 +03:00
Pavel Emelyanov	5f7f0117e1	sstable_directory: Create components lister based on storage options The directory's lister is storage-specific and should be created differently for different storage options. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:44:29 +03:00
Pavel Emelyanov	950ee0efe8	sstables: Add S3 storage implementation The driver puts all components into s3://bucket/uuid/component_name objects where 'bucket' is the keyspace options configuration parameter, and the 'uuid' is the value obtained from the ownership table. E.g. s3://test_bucket/d0a743b0-ad38-11ed-85b5-39b6b0998182/Data.db The life-time is straightforward. Until sealed, the sstable has 'creating' status in the table, then it's updated to be 'sealed'. Prior to removing the objects the status is set to 'deleting' thus allowing the distributed loader to pick up the dangling objects on re-load (not yet implemented). Finally, the entry is deleted from the table. It needs the PR #12648 not to generate empty ks/cf directories on the local filesystem. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:44:29 +03:00
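The status life-time described in 950ee0efe8 ('creating', then 'sealed', then 'deleting', then the entry disappears) can be modelled as a small state machine. The sketch below is a toy in-memory model under stated assumptions: `Registry`, its method names and the strict transition table are hypothetical and are not the system keyspace API; in particular, whether the real driver permits other transitions is not stated in the log.

```python
from typing import Dict, Tuple

# Assumed transition table: each status maps to the statuses it may move to.
ALLOWED = {
    "creating": {"sealed"},
    "sealed": {"deleting"},
    "deleting": set(),
}

Key = Tuple[str, int]  # (location, generation), the primary key of the table


class Registry:
    """Toy in-memory model of the ownership table (hypothetical API)."""

    def __init__(self) -> None:
        self._rows: Dict[Key, str] = {}

    def create(self, location: str, generation: int) -> None:
        # A new sstable starts in 'creating', like a TemporaryTOC on disk.
        self._rows[(location, generation)] = "creating"

    def update_status(self, location: str, generation: int, status: str) -> None:
        key = (location, generation)
        current = self._rows[key]
        if status not in ALLOWED[current]:
            raise ValueError(f"cannot move from {current!r} to {status!r}")
        self._rows[key] = status

    def delete_entry(self, location: str, generation: int) -> None:
        # The row is dropped only after the objects were marked for deletion.
        key = (location, generation)
        if self._rows[key] != "deleting":
            raise ValueError("entry must be in 'deleting' status first")
        del self._rows[key]

    def status(self, location: str, generation: int) -> str:
        return self._rows[(location, generation)]
```

Marking the row 'deleting' before touching the objects is what would let a loader find half-removed sstables after a crash, which the commit notes is not yet implemented.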
Pavel Emelyanov	08e9046d07	system_keyspace: Add ownership table The schema is CREATE TABLE system.sstables ( location text, generation bigint, format text, status text, uuid uuid, version text, PRIMARY KEY (location, generation) ) A sample entry looks like: location \| generation \| format \| status \| uuid \| version ---------------------------------------------------------------------+------------+--------+--------+--------------------------------------+--------- /data/object_storage_ks/test_table-d096a1e0ad3811ed85b539b6b0998182 \| 2 \| big \| sealed \| d0a743b0-ad38-11ed-85b5-39b6b0998182 \| me The uuid field points to the "folder" on the storage where the sstable components are. Like this: s3 `- test_bucket `- f7548f00-a64d-11ed-865a-0c1fbc116bb3 `- Data.db - Index.db - Filter.db - ... It's not very nice that the whole /var/lib/... path is in fact used as location, it needs the PR #12707 to fix this place. Also, the "status" part is not yet fully functional, it only supports three options: - creating -- the same as TemporaryTOC file exists on disk - sealed -- default state - deleting -- the analogy for the deletion log on disk The latter needs support from the distributed_loader, which is not yet there. In fact, distributed_loader also needs to be patched to actually select entries from this table on load. Also it needs the mentioned PR #12707 to support staging and quarantine sstables. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:44:28 +03:00
Pavel Emelyanov	e34b86dd61	system_keyspace: Plug to user sstables manager too The sharded<sys_ks> instances are plugged to large data handler and compaction manager to maintain the circular dependency between these components via the interposing database instance. Do the same for user sstables manager, because S3 driver will need to update the local ownership table. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	4bb885b759	sstable: Make storage instance based on storage options This patch adds storage options lw-ptr to sstables_manager::make_sstable and makes the storage instance creation depend on the options. For local it just creates the filesystem storage instance, for S3 -- throws, but next patch will fix that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	df026e2cb5	sstable_directory: Keep storage_options aboard The class in question will need to know the table's storage it will need to list sstables from. For that -- construct it with the storage options taken from table. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	c060f3a52f	sstable: Virtualize the helper that gets on-disk stats for sstable When opening an existing (or just sealed) sstable its components are stat()-ed to get the on-disk sizes and a bit more. Stat-ing a file by name on S3 is not (yet) implemented and doing it file-by-file can be quite terrible. So add a method to return sstable stats in a storage-specific manner. For S3 this can be implemented by getting the info from the ownership table (in the future). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	0ddd27cb29	sstable, storage: Virtualize data sink making for small components This time sstable needs to create a data sink for a component without having the file at hand. That's pretty much the same as in previous patch, but the method declaration differs slightly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	ac1e56c9d9	sstable, storage: Virtualize data sink making for Data and Index Add the make_data_or_index_sink() virtual method and its implementation for filesystem_storage. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	1d4fcce5dd	sstable/writer: Shuffle writer::init_file_writers() The method needs to create two data sinks -- for Data and for Index files -- and then wrap them with more stuff (compression, checksums, streams, etc.). With S3 backend using file-output-stream won't work, because S3 storage cannot provide writable file API (it has data_sink instead). This patch extracts file_data_sink creation so that it could be virtualized with storage API later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	525a261a4e	sstable: Make storage an API Currently sstable carries a filesystem_storage instance on board. Next patches will make it possible to use some other storage with different data accessing methods. This patch makes sstable carry abstract storage interface and make the existing filesystem_storage implement it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	033fa107f8	utils: Add S3 readable file impl for random reads Sometimes an sstable is used for random read, sometimes -- for streamed read using the input stream. For both cases the file API can be provided, because S3 API allows random reads of arbitrary lengths. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	a4a64149a6	utils: Add S3 data sink for multipart upload Putting a large object into S3 using plain PUT is a bad choice -- one needs to collect the whole object in memory, then send it as a content-length request with plain body. Multipart upload puts less stress on memory, but it has its limitation -- each part should be at least 5MB in size. For that reason using file API doesn't work -- file IO API operates with external memory buffers and the file impl would only have raw pointers to them. In order to collect a 5MB chunk in RAM the impl would have to copy the memory, which is not good. Unlike the file API, the data_sink API is more flexible, as it has temporary buffers at hand and can cache them in zero-copy manner. Having said that, the S3 data_sink implementation is like this: * put(buffer): move the buffer into local cache, once the local cache grows above 5MB send out the part * flush: send out whatever is in cache, then send upload completion request * close: check that the upload finished (in flush), abort the upload otherwise The user of the API may (actually should) wrap the sink with output_stream and use it as any other output_stream. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
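The put / flush / close contract listed in a4a64149a6 can be illustrated with a small model. This is a sketch under assumptions, not the Seastar-based C++ sink: `MultipartSink` and the three callbacks (`send_part`, `complete`, `abort`) are hypothetical names, and the part is sent once the cache reaches the minimum part size.

```python
from typing import Callable, List

MIN_PART_SIZE = 5 * 1024 * 1024  # assumed default, matching the minimum part size


class MultipartSink:
    """Buffers data and emits parts once enough bytes are cached (hypothetical)."""

    def __init__(self,
                 send_part: Callable[[List[bytes]], None],
                 complete: Callable[[], None],
                 abort: Callable[[], None],
                 min_part_size: int = MIN_PART_SIZE) -> None:
        self._send_part = send_part
        self._complete = complete
        self._abort = abort
        self._min_part_size = min_part_size
        self._cache: List[bytes] = []
        self._cached_bytes = 0
        self._finished = False

    def put(self, buf: bytes) -> None:
        # Keep a reference to the caller's buffer instead of copying it.
        self._cache.append(buf)
        self._cached_bytes += len(buf)
        if self._cached_bytes >= self._min_part_size:
            self._send_cached()

    def _send_cached(self) -> None:
        if self._cache:
            self._send_part(self._cache)
            self._cache = []
            self._cached_bytes = 0

    def flush(self) -> None:
        # Send whatever is left, then ask the server to assemble the object.
        self._send_cached()
        self._complete()
        self._finished = True

    def close(self) -> None:
        # If flush never completed the upload, abort it so no parts linger.
        if not self._finished:
            self._abort()
```

Passing the list of cached buffers to `send_part` unchanged is the Python analogue of the zero-copy caching the commit message describes; only the final part may be smaller than the threshold.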
Pavel Emelyanov	3745b5c715	utils: Add S3 client with basic ops Those include -- HEAD to get size, PUT to upload object in one go, GET to read the object as contiguous buffer and DELETE to drop one. The client uses http client from seastar and just implements the S3 protocol using it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	ced8a07d09	cql-pytest: Add option to run scylla over stable directory The facilities in run.py script allow launching scylla over temporary directory, waiting for it to come alive, killing, etc. The limitation of those is that the work-dir created for scylla is tightly coupled with its pid. The object-storage test in next patches will need to check that the sstables are preserved on scylla restart and this hard binding of workdir to pid won't work. This patch generalizes the scylla run/abort helpers to accept an external directory to work on and adds a call to restart scylla process over existing directory. And one small related change here -- log file is opened in O_APPEND mode so that restarted scylla process continues writing into the old file. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	6dbe41d277	test.py: Equip it with minio server When test.py starts it activates a minio server inside test-dir and configures an anonymous bucket for test cases to run on. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Pavel Emelyanov	93c8b4b46b	sstables: Detach write_toc() helper When sstable is opened it generates a certain content into TOC file. In filesystem storage this first gets into TemporaryTOC one. Future S3 driver will need the same to put into TOC object. Not to produce duplicate code detach the content generation into a helper. Next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:00 +03:00
Raphael S. Carvalho	01466be7b9	sstables: Store raw token into summary entries Scylla stores a dht::token into each summary entry, for convenience. But that costs us 16 bytes for each summary entry. That's because dht::token has a kind field in addition to data, both 64 bits. With 1kk partitions, each averaging 4k bytes, summary may end up with ~90k summary entries. So dht::token only will add ~1.5M to the memory footprint of summary. We know summary samples index keys, therefore all tokens in all summary entries cannot have any token kind other than 'key'. Therefore, we can save 8 bytes for each summary entry by storing a 64-bit raw token and converting it back into token whenever needed. Memory footprint of summary entries in a summary goes from sizeof(summary_entry) * entries.size(): 1771520 to sizeof(summary_entry) * entries.size(): 1417216 which is explained by the 8 bytes reduction per summary entry. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-10 10:26:04 -03:00
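The figures quoted in 01466be7b9 can be cross-checked with a little arithmetic. The snippet assumes, as the message states, that exactly 8 bytes are saved per summary entry; the entry count and the two per-entry sizes are derived here and do not appear in the log.

```python
BEFORE_BYTES = 1771520  # sizeof(summary_entry) * entries.size() before the patch
AFTER_BYTES = 1417216   # the same product after the patch
SAVED_PER_ENTRY = 8     # bytes dropped per entry (the 64-bit kind field)

# Total savings divided by the per-entry savings gives the number of entries.
entries = (BEFORE_BYTES - AFTER_BYTES) // SAVED_PER_ENTRY

# Dividing each total by that count recovers the implied per-entry size.
size_before = BEFORE_BYTES // entries
size_after = AFTER_BYTES // entries

print(entries, size_before, size_after)  # prints "44288 40 32"
```

Both totals divide evenly by the derived entry count, which is consistent with the 8-byte-per-entry explanation.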
Raphael S. Carvalho	6b5cd9ac7b	sstables: Don't store token data into summary's memory pool summary has a memory pool, which is implemented as a set of contiguous buffers of exponentially increasing size, with a max size of 128k. This pool served for both storing keys of summary entries and their respective tokens. The summary entry itself just stores a string_view, which points to the actual data in the memory pool. Since the series `31593e1451`, which removed token_view, summary_entry stores the actual token, not just the view. Therefore, memory is being wasted, as SSTable loader / writer is unnecessarily storing the token data into the pool. With 11k summary entries, the footprint drops from 756004 to 624932, about a 17% reduction. Of course, the reduction depends on factors like key size, where the key size can significantly outweigh this waste. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-10 09:59:11 -03:00
Tomasz Grabiec	64a87f4257	Merge 'Standardize node ops sync_nodes selection' from Benny Halevy Use token_metadata get_endpoint_to_host_id_map_for_reading to get all normal token owners for all node operations, rather than using gossip for some operation and token_metadata for others. Fixes #12862 Closes #13256 * github.com:scylladb/scylladb: storage_service: node ops: standardize sync_nodes selection storage_service: get_ignore_dead_nodes_for_replace: make static and rename to parse_node_list	2023-04-10 13:14:55 +02:00
Benny Halevy	cc42f00232	view: view_builder: start: demote sleep_aborted log error This is not really an error, so print it in debug log_level rather than error log_level. Fixes #13374 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13462	2023-04-09 22:49:06 +03:00
Nadav Har'El	d26bb8c12d	Merge 'tree: migrate from std::regex to boost::regex' from Botond Dénes Except for where usage of `std::regex` is required by 3rd party library interfaces. As demonstrated countless times, std::regex's practice of using recursion for pattern matching can result in stack overflow, especially on AARCH64. The most recent incident happened after merging https://github.com/scylladb/scylladb/pull/13075, which (indirectly) uses `sstables::make_entry_descriptor()` to test whether a certain path is a valid scylla table path in a trial-and-error manner. This resulted in stacks blowing up in AARCH64. To prevent this, use the already tried and tested method of switching from `std::regex` to `boost::regex`. Don't wait until each of the `std::regex` sites explode, replace them all preemptively. Refs: https://github.com/scylladb/scylladb/issues/13404 Closes #13452 * github.com:scylladb/scylladb: test: s/std::regex/boost::regex/ utils: s/std::regex/boost::regex/ db/commitlog: s/std::regex/boost::regex/ types: s/std::regex/boost::regex/ index: s/std::regex/boost::regex/ duration.cc: s/std::regex/boost::regex/ cql3: s/std::regex/boost::regex/ thrift: s/std::regex/boost::regex/ sstables: use s/std::regex/boost::regex/	2023-04-09 18:47:41 +03:00
Kefu Chai	7a05cc3a06	thrift: initialize _config first to avoid dangling reference in `c642ca9e73`, a reference to the parameter `config` passed to the `thrift_server`'s constructor is passed down to `create_handler_factory()`, which keeps it so it can create connection handler on demand. but unfortunately, - the `config` parameter is a temporary variable - the `config` parameter is moved away in the constructor after `create_handler_factory()` is called hence we have a dangling reference when the factory created by `create_handler_factory()` tries to dereference the reference when handling a new incoming connection. in this change, - the definitions of `_config` and `_handler_factory` member variables are transposed, so that the former is initialized first. - `_handler_factory` now keeps a reference to `_config`'s member variable, so that the weak reference it holds is always valid. Fixes #13455 Branches: none Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13456	2023-04-09 11:34:34 +03:00
Amnon Heiman	928727a57d	main: Load metrics relabel config from a file if it exists This patch reads the relabel config from a file if it exists. A problem with the file or metrics would stop Scylla from starting. This is on purpose, as it's a configuration problem that should be addressed. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2023-04-09 09:10:07 +03:00
Amnon Heiman	990545f616	Add relabel from file support. This patch adds a configuration with an optional file name for relabeling metrics. It also adds a function that accepts a file name and loads the relabel config from a file. An example for such a file: ``` $cat conf.yml relabel_configs: - source_labels: [shard] action: drop target_label: shard regex: (2) - source_labels: [shard] action: replace target_label: level replacement: $1 regex: (.*3) ``` update_relabel_config_from_file throws an exception on failure, it's up to the caller to decide what to do in such cases.	2023-04-09 09:10:02 +03:00
Kefu Chai	9d5fbe226e	auth: remove unused operator<<(.., resource_kind) since the only user of operator<<(..., resource_kind) is now `auth_resource_test`, let's just move it into this test. and there is no need to keep this operator in the header file where `resource_kind` is defined. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-07 20:32:28 +08:00
Kefu Chai	ca50a8d6c7	auth: specialize fmt::formatter<resource_kind> this is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `auth::resource_kind` without the help of fmt::ostream. its `operator<<(ostream,..)` is reimplemented using fmtlib accordingly to ease the review. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-07 18:59:13 +08:00
Kefu Chai	ca0ca92e68	auth: remove unused operator<<(.., authentication_option) since we already have fmt::formatter<authentication_option>, and there are no existing users of `operator<<(ostream&, authentication_option)`, let's just drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-07 18:15:35 +08:00
Kefu Chai	ba0f9036ec	auth: specialize fmt::formatter<authentication_option> this is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `auth::auth_authentication_options` without the help of fmt::ostream. its `operator<<(ostream,..)` is reimplemented using fmtlib accordingly to ease the review. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-07 18:15:25 +08:00
Botond Dénes	452cb1a712	test: s/std::regex/boost::regex/ The former is prone to producing stack-overflow as it uses recursion in its match implementation. The migration is entirely mechanical.	2023-04-06 09:51:32 -04:00
Botond Dénes	985e33a768	utils: s/std::regex/boost::regex/ The former is prone to producing stack-overflow as it uses recursion in its match implementation. The migration is entirely mechanical.	2023-04-06 09:51:28 -04:00
Botond Dénes	52e66e38e7	db/commitlog: s/std::regex/boost::regex/ The former is prone to producing stack-overflow as it uses recursion in its match implementation. The migration is entirely mechanical.	2023-04-06 09:51:24 -04:00
Botond Dénes	712889c99f	types: s/std::regex/boost::regex/ The former is prone to producing stack-overflow as it uses recursion in its match implementation. The migration is mechanical for the most part. escape() needs some special treatment, looks like boost::regex wants a double-escaped backslash.	2023-04-06 09:50:45 -04:00
Botond Dénes	cf188f40b9	index: s/std::regex/boost::regex/ The former is prone to producing stack-overflow as it uses recursion in its match implementation. The migration is entirely mechanical.	2023-04-06 09:50:41 -04:00
Botond Dénes	4a0188ea6a	duration.cc: s/std::regex/boost::regex/ The former is prone to producing stack-overflow as it uses recursion in its match implementation. The migration is entirely mechanical.	2023-04-06 09:50:37 -04:00
Botond Dénes	de402878e4	cql3: s/std::regex/boost::regex/ The former is prone to producing stack-overflow as it uses recursion in its match implementation. The migration is entirely mechanical.	2023-04-06 09:50:32 -04:00
Botond Dénes	c0b72f70d4	thrift: s/std::regex/boost::regex/ The former is prone to producing stack-overflow as it uses recursion in its match implementation. The migration is entirely mechanical.	2023-04-06 09:50:27 -04:00
Botond Dénes	ba031ad181	sstables: use s/std::regex/boost::regex/ The former is prone to producing stack-overflow as it uses recursion in its match implementation. The migration is entirely mechanical.	2023-04-06 09:50:12 -04:00
Botond Dénes	c65bd01174	Merge 'Debloat system_keyspace.hh (and a bit of .cc)' from Pavel Emelyanov The system_keyspace.hh now includes raft stuff, topology changes stuff, task_manager stuff, etc. It's going to include tablets.hh (but maybe not). Anything that deals with system keyspace, and includes system_keyspace.hh, would transitively pull these too. This header is becoming a central hub for all the features. This PR removes all the headers from system_keyspace.hh that correspond to other "subsystems" keeping only generic mutations/querying and seastar ones. Closes #13450 * github.com:scylladb/scylladb: system_keyspace.hh: Remove unneeded headers system_keyspace: Move topology_mutation_builder to storage_service system_keyspace: Move group0_upgrade_state conversions to group0 code	2023-04-06 16:39:20 +03:00
Kamil Braun	c2a2996c2b	docs: cleaning up after failed membership change After a failed topology operation, like bootstrap / decommission / removenode, the cluster might contain a garbage entry in either token ring or group 0. This entry can be cleaned-up by executing removenode on any other node, pointing to the node that failed to bootstrap or leave the cluster. Document this procedure, including a method of finding the host ID of a garbage entry. Add references in other documents. Fixes: #13122 Closes #13186	2023-04-06 13:48:37 +02:00
Botond Dénes	0a46a574e6	Merge 'Topology: introduce nodes' from Benny Halevy As a first step towards using host_id to identify nodes instead of ip addresses this series introduces a node abstraction, kept in topology, indexed by both host_id and endpoint. The revised interface also allows callers to handle cases where nodes are not found in the topology more gracefully by introducing `find_node()` functions that look up nodes by host_id or inet_address and also get a `must_exist` parameter that, if false (the default parameter value) would return nullptr if the node is not found. If true, `find_node` throws an internal error, since this indicates a violation of an internal assumption that the node must exist in the topology. Callers that may handle missing nodes, should use the more permissive flavor and handle the !find_node() case gracefully. Closes #11987 * github.com:scylladb/scylladb: topology: add node state topology: remove dead code locator: add class node topology: rename update_endpoint to add_or_update_endpoint topology: define get_{rack,datacenter} inline shared_token_metadata: mutate_token_metadata: replicate to all shards locator: endpoint_dc_rack: refactor default_location locator: endpoint_dc_rack: define default operator== test: storage_proxy_test: provide valid endpoint_dc_rack	2023-04-06 13:47:22 +03:00
Pavel Emelyanov	18333b4225	system_keyspace.hh: Remove unneeded headers Now this header can replace lots of used types with plain forward declarations Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-06 12:37:00 +03:00
Pavel Emelyanov	1af373cf0a	system_keyspace: Move topology_mutation_builder to storage_service The latter is the only user of the class. This keeps system keyspace code free from unrelated logic and from raft::server_id type. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-06 12:36:02 +03:00
Pavel Emelyanov	45de375126	system_keyspace: Move group0_upgrade_state conversions to group0 code In order to keep system keyspace free from group0 logic and from the service::group0_upgrade_state type Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-06 12:35:07 +03:00
Kefu Chai	0d4ffe1d69	scripts/refresh-submodules.sh: include all commits in summary before this change, we use `git submodule summary ${submodule}` for collecting the titles of commits in between current HEAD and origin/master. normally, this works just fine. but it fails to collect all commits if the origin/master happens to reference a merge commit. for instance, if we have a history like: 1. merge foo 2. bar 3. foo 4. baz <--- submodule is pointing here. `git submodule summary` would just print out the titles of commits 1 and 3. so, in this change, instead of relying on `git submodule summary`, we just collect the commits using `git log`. but we preserve the output format used by `git submodule summary` to be consistent with the previous commits bumping up the submodules. please note, in this change instead of matching the output of `git submodule summary`, we use `git merge-base --is-ancestor HEAD origin/master` to check if we are going to create a fast-forward change, which is less fragile. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13366	2023-04-06 11:27:14 +03:00
Botond Dénes	9a02315c6b	Merge 'Compaction reevaluation bug fixes' from Raphael "Raph" Carvalho A problem in compaction reevaluation can cause the SSTable set to be left uncompacted for an indefinite amount of time, potentially causing space and read amplification to be suboptimal. Two reevaluation problems are being fixed, one after off-strategy compaction ends, and another in the compaction manager, which intends to periodically reevaluate the need for compaction. Fixes https://github.com/scylladb/scylladb/issues/13429. Fixes https://github.com/scylladb/scylladb/issues/13430. Closes #13431 * github.com:scylladb/scylladb: compaction: Make compaction reevaluation actually periodic replica: Reevaluate regular compaction on off-strategy completion	2023-04-05 13:51:21 +03:00
Tomasz Grabiec	9802bb6564	Merge 'Remove explicit flush() from sstable component writer' from Pavel Emelyanov Writing into sstable component output stream should be done with care. In particular -- flushing can happen only once right before closing the stream. Flushing the stream in between several writes is not going to work, because file stream would step on unaligned IO and S3 upload stream would send completion message to the server and would lose any subsequent write. Most of the file_writer users already obey that and flush the writer once right before closing it. The do_write_simple() is extra careful about exceptions handling, but it's an overkill (see first patch). It's better to make file_writer API explicitly lack the ability to flush itself by flushing the stream when closing the writer. Closes #13338 * github.com:scylladb/scylladb: sstables: Move writer flush into close (and remove it) sstables: Relax exception handling in do_write_simple	2023-04-05 12:09:31 +02:00
Tomasz Grabiec	bbabf07f69	Merge 'test/boost/multishard_mutation_query: use random schema' from Botond Dénes This test currently uses `test/lib/test_table.hh` to generate data for its test cases. This data generation facility is used by no other tests. Worse, it is redundant as we already have a random data generator with fixed schema, in `test/lib/mutation_source_test.hh`. So in this series, we migrate the test cases in said test file to random schema and its random data generation facilities. These are used by several other test cases and using random schema allows us to cover a wider (quasi-infinite) number of possibilities. After migrating all tests away from it, `test/lib/test_table.hh` is removed. This series also reduces the runtime of `fuzzy_test` drastically. It should now run in a few minutes or even in seconds (depending on the machine). Fixes: #12944 Closes #12574 * github.com:scylladb/scylladb: test/lib: rm test_table.hh test/boos/multishard_mutation_query_test: migrate other tests to random schema test/boost/multishard_mutation_query_test: use ks keyspace test/boost/multishard_mutation_query_test: improve test pager test/boost/multishard_mutation_query_test: refactor fuzzy_test test/boost: add multishard_mutation_query_test more memory types/user: add get_name() accessor test/lib/random_schema: add create_with_cql() test/lib/random_schema: fix udt handling test/lib/random_schema: type_generator(): also generate frozen types test/lib/random_schema: type_generator(): make static column generation conditional test/lib/random_schema: type_generator(): don't generate duration_type for keys test/lib/random_schema: generate_random_mutations(): add overload with seed test/lib/random_schema: generate_random_mutations(): respect range tombstone count param test/lib/random_schema: generate_random_mutations(): add yields test/lib/random_schema: generate_random_mutations(): fix indentation test/lib/random_schema: generate_random_mutations(): coroutinize method 
test/lib/random_schema: generate_random_mutations(): expand comment	2023-04-05 10:32:58 +02:00
Michał Chojnowski	df0905357e	mutation_partition_v2: add sentinel to the tracker after adding it to the tree Every tracker insertion has to have a corresponding removal or eviction, (otherwise the number of rows in the tracker will be misaccounted). If we add the row to the tracker before adding it to the tree, and the tree insertion fails (with bad_alloc), this contract will be violated. Fix that. Note: the problem is currently irrelevant because an exception during sentinel insertion will abort the program anyway. Closes #13336	2023-04-05 09:52:44 +02:00
Raphael S. Carvalho	457c772c9c	replica: Make compaction_group responsible for deleting off-strategy compaction input Compaction group is responsible for deleting SSTables of "in-strategy" compactions, i.e. regular, major, cleanup, etc. Both in-strategy and off-strategy compaction have their completion handled using the same compaction group interface, which is compaction_group::table_state::on_compaction_completion(..., sstables::offstrategy offstrategy) So it's important to bring symmetry there, by moving the responsibility of deleting off-strategy input, from manager to group. Another important advantage is that off-strategy deletion is now throttled and gated, allowing for better control, e.g. table waiting for deletion on shutdown. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13432	2023-04-05 08:37:48 +03:00
Botond Dénes	f7421aab2c	Merge 'cmake: sync with `configure.py` (16/n)' from Kefu Chai this is the 15th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals: - to enable developer to use CMake for building scylla. so they can use tools (CLion for instance) with CMake integration for better developer experience - to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules. also, i just found that the scylla executable built with cmake building system segfault in master HEAD. like ``` AddressSanitizer:DEADLYSIGNAL ================================================================= ==3974496==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0x7ffd48549f70 sp 0x7ffd48549728 T0) ==3974496==Hint: pc points to the zero page. ==3974496==The signal is caused by a READ memory access. ==3974496==Hint: address points to the zero page. #0 0x0 (<unknown module>) #1 0x14e785a5 in wasmtime_runtime::traphandlers::unix::trap_handler::h1f510afc2968497f /home/kefu/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/wasmtime-runtime-5.0.1/src/traphandlers/unix.rs:159:9 #2 0x7f3462e5eb9f (/lib64/libc.so.6+0x3db9f) (BuildId: 6107835fa7d4725691b2b7f6aaee7abe09f493b2) AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: SEGV (<unknown module>) ==3974496==ABORTING Aborting on shard 0. Backtrace: 0xd16c38a 0x13c5aab0 0x13b9821e 0x13c2fdc7 /lib64/libc.so.6+0x3db9f /lib64/libc.so.6+0x8eb93 /lib64/libc.so.6+0x3daed /lib64/libc.so.6+0x2687e 0xd1e5f8a 0xd1e3d34 0xd1ca059 0xd1c5e29 0xd1c5605 0x14e785a5 /lib64/libc.so.6+0x3db9f ``` decoded: ``` __interceptor_backtrace at ??:? 
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /home/kefu/dev/scylladb/seastar/include/seastar/util/backtrace.hh:60 seastar::backtrace_buffer::append_backtrace() at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:778 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:808 seastar::print_with_backtrace(char const, bool) at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:820 (inlined by) seastar::sigabrt_action() at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:3882 (inlined by) operator() at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:3858 (inlined by) __invoke at /home/kefu/dev/scylladb/seastar/src/core/reactor.cc:3854 /lib64/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=6107835fa7d4725691b2b7f6aaee7abe09f493b2, for GNU/Linux 3.2.0, not stripped __GI___sigaction at :? __pthread_kill_implementation at ??:? __GI_raise at :? __GI_abort at :? __sanitizer::Abort() at ??:? __sanitizer::Die() at ??:? __asan::ScopedInErrorReport::~ScopedInErrorReport() at ??:? __asan::ReportDeadlySignal(__sanitizer::SignalContext const&) at ??:? __asan::AsanOnDeadlySignal(int, void, void) at ??:? wasmtime_runtime::traphandlers::unix::trap_handler at /home/kefu/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-7a04d2510079875b/wasmtime-runtime-5.0.1/src/traphandlers/unix.rs:159 __GI___sigaction at :? ``` this led me to this change. but unfortunately, this changeset does not address the segfault. will continue the investigation in my free cycles. 
Closes #13434 github.com:scylladb/scylladb: build: cmake: include cxx.h with relative path build: cmake: set stack frame limits build: cmake: pass -fvisibility=hidden to compiler build: cmake: use -O0 on aarch64, otherwise -Og	2023-04-05 06:57:23 +03:00
Yaron Kaikov	c80ab78741	doc: update supported os for 2022.1 ubuntu22.04 is already supported on both `5.0` and `2022.1` updating the table Closes #13340	2023-04-05 06:43:58 +03:00
Pavel Emelyanov	f5de0582c8	alternator,util: Move aws4-hmac-sha256 signature generator to util S3 client cannot perform anonymous multipart uploads into any real S3 buckets regardless of their configuration. Since multipart upload is an essential part of the sstables backend, we need to implement the authorisation support for the client early. (side note): with minio anonymous multipart upload works, with aws s3 anonymous PUT and DELETE can be configured, it's exactly the combination of aws + multipart upload that does need authorization. Fortunately, the signature generation and signature checking code is symmetrical and we have the checking option already in alternator :) So what this patch does is just move the alternator::get_signature() helper into utils/. A sad side effect of that is that all tests now need to link with gnutls :( which is used to compute the hash value itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13428	2023-04-04 18:24:48 +03:00
Nadav Har'El	aeabfcb93f	Merge 'Revert scylla sstable schema improvements' from Botond Dénes This PR reverts the scylla sstable schema loading improvements as they fail in CI every other run. I am already working on fixes for these but I am not sure I understand all the failures so it is best to revert and re-post the series later. Fixes: #13404 Fixes: #13410 Closes #13419 * github.com:scylladb/scylladb: Revert "Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes" Revert "tools/schema_loader: don't require results from optional schema tables"	2023-04-04 18:22:14 +03:00
Anna Stuchlik	447ce58da5	doc: update Raft doc for versions 5.2 and 2023.1 Fixes https://github.com/scylladb/scylladb/issues/13345 Fixes https://github.com/scylladb/scylladb/issues/13421 This commit updates the Raft documentation page to be up to date in versions 5.2 and 2023.1. - Irrelevant information about previous releases is removed. - Some information is clarified. - Mentions of version 5.2 are either removed (if possible) or version 2023.1 is added. Closes #13426	2023-04-04 15:15:56 +02:00
Raphael S. Carvalho	156ac0a67a	compaction: Make compaction reevaluation actually periodic The manager intended to periodically reevaluate compaction need for each registered table. But it's not working as intended. The reevaluation is one-off. This means that compaction was not kicking in later for a table, with little to no write activity, that had data expiring 1 hour from now. Also make sure that reevaluation happens within the compaction scheduling group. Fixes #13430. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-04 09:16:19 -03:00
Raphael S. Carvalho	2652b41606	replica: Reevaluate regular compaction on off-strategy completion When off-strategy compaction completes, regular compaction is not triggered. If off-strategy output causes the table's SSTable set to not conform to the strategy goal, it means that read and space amplification will be suboptimal until the next compaction kicks in, which can take an indefinite amount of time (e.g. when active memtable is flushed). Let's reevaluate compaction on main SSTable set when off-strategy ends. Fixes #13429. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-04 09:16:16 -03:00
Kefu Chai	dceb364c5c	build: cmake: include cxx.h with relative path before this change, the wasm binding source files include the cxxbridge header file of `cxx.h` with its full path. to better mirror the behavior of configure.py, let's just include this header file with relative path. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-04 15:33:20 +08:00
Kefu Chai	ecd5bf98d9	build: cmake: set stack frame limits * transpose include(mode.common) and include (mode.${build_mode}), so the former can reference the value defined by the latter. * set stack_usage_threshold for supported build modes. please note, this compiler option (-Wstack-usage=<bytes>) is only supported by GCC so far. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-04 15:33:20 +08:00
Kefu Chai	6cc8800c85	build: cmake: pass -fvisibility=hidden to compiler this mirrors the behavior of `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-04 15:33:20 +08:00
Kefu Chai	066e9567ee	build: cmake: use -O0 on aarch64, otherwise -Og this addresses an oversight in `b234c839e4`, which is supposed to mirror the behavior of `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-04 15:33:20 +08:00
Anna Stuchlik	595325c11b	doc: add upgrade guide from 5.2 to 2023.1 Related: https://github.com/scylladb/scylla-enterprise/issues/2770 This commit adds the upgrade guide from ScyllaDB Open Source 5.2 to ScyllaDB Enterprise 2023.1. This commit does not cover metric updates (the metrics file has no content, which needs to be added in another PR). As this is an upgrade guide, this commit must be merged to master and backported to branch-5.2 and branch-2023.1 in scylla-enterprise.git. Closes #13294	2023-04-04 08:24:00 +03:00
Botond Dénes	8167f11a23	Merge 'Move compaction manager tasks out of compaction manager' from Aleksandra Martyniuk Task manager compaction tasks that cover compaction group compaction need access to compaction_manager::tasks. To avoid circular dependency and be able to rely on forward declaration, task needs to be moved out of compaction manager. To avoid naming confusion compaction_manager::task is renamed. Closes #13226 * github.com:scylladb/scylladb: compaction: use compaction namespace in compaction_manager.cc compaction: rename compaction::task compaction: move compaction_manager::task out of compaction manager compaction: move sstable_task definition to source file	2023-04-03 15:40:42 +03:00
Botond Dénes	54c0a387a2	Revert "Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes" This reverts commit `32fff17e19`, reversing changes made to `164afe14ad`. This series proved to be problematic, the new test introduced by it failing quite often. Revert it until the problems are tracked down and fixed.	2023-04-03 13:54:00 +03:00
Botond Dénes	04b1219694	Revert "tools/schema_loader: don't require results from optional schema tables" This reverts commit `c15f53f971`. Said commit is based on a commit which we want to revert because its unit test is flaky.	2023-04-03 13:53:06 +03:00
Petr Gusev	09636b20f3	scylla_cluster.py: optimize node logs reading There are two occasions in scylla_cluster where we read the node logs, and in both of them we read the entire file into memory. This is not efficient and may cause an OOM. In the first case we need the last line of the log file, so we seek to the end and move backwards looking for a newline character. In the second case we look through the log file to find the expected_error. The readlines() method returns a Python list object, which means it reads the entire file into memory. It's sufficient to just remove it since iterating over the file instance already yields lines lazily one by one. This is a follow-up for #13134. Closes #13399	2023-04-03 12:28:08 +02:00
Marcin Maliszkiewicz	99f8d7dcbe	db: view: use deferred_close for closing staging_sstable_reader When consume_in_thread throws the reader should still be closed. Related https://github.com/scylladb/scylla-enterprise/issues/2661 Closes #13398 Refs: scylladb/scylla-enterprise#2661 Fixes: #13413	2023-04-03 09:02:55 +03:00
Botond Dénes	ca062d1fba	Merge ' mutation: replace operator<<(..) with fmt formatter' from Kefu Chai this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `position_in_partition` and `partition_region` without using ostream<<. also, this change removes `operator<<(ostream, const position_in_partition_view&)` , `operator<<(ostream, const partition_region&)` along with their callers. Refs #13245 Closes #13391 * github.com:scylladb/scylladb: mutation: drop operator<< for position_in_partition and friends partition_snapshot_row_cursor: do not use operator<< when printing position mutation: specialize fmt::formatter<position_in_partition> mutation: specialize fmt::formatter<partition_region>	2023-04-03 08:34:55 +03:00
Kefu Chai	6c37829224	wasm: add noexcept specifier for alien::run_on() as alien::run_on() requires the function to be noexcept, let's make this explicit. also, this paves the road to the type constraint added to `alien::run_on()`. the type constraint will enforce this requirement on the function passed to `alien::run_on()`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13375	2023-04-03 08:19:00 +03:00
Botond Dénes	36e53d571c	Merge 'Treewide use-after-move bug fixes' from Raphael "Raph" Carvalho That's courtesy of `153813d3b8`, which annotates Seastar smart pointer classes with Clang's consumed attributes, to help Clang to statically spot use-after-move bugs. Closes #13386 * github.com:scylladb/scylladb: replica: Fix use-after-move in table::make_streaming_reader index/built_indexes_virtual_reader.hh: Fix use-after-move db/view/build_progress_virtual_reader: Fix use-after-move sstables: Fix use-after-move when making reader in reverse mode	2023-04-03 06:57:54 +03:00
Benny Halevy	c17df1759e	topology: add node state Add a simple node state model with: `joining`, `normal`, `leaving`, and `left` states to help manage nodes during replace with the same IP address. Later on, this could also help prevent nodes that were decommissioned, removed, or replaced from rejoining the cluster. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:18:31 +03:00
Benny Halevy	027f188a97	topology: remove dead code Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:13:04 +03:00
Benny Halevy	f3d5df5448	locator: add class node And keep per node information (idx, host_id, endpoint, dc_rack, is_pending) in node objects, indexed by topology on several indices like: idx, host_id, endpoint, current/pending, per dc, per dc/rack. The node index is a shorthand identifier for the node. node* and index are valid while the respective topology instance is valid. To be used, the caller must hold on to the topology / token_metadata object (e.g. via a token_metadata_ptr or effective_replication_map) Refs #6403 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> topology: add node idx Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:13:02 +03:00
Benny Halevy	006e02410f	topology: rename update_endpoint to add_or_update_endpoint To reflect what it does. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:08:03 +03:00
Benny Halevy	df1c92649e	topology: define get_{rack,datacenter} inline Define get_location() that gets the location for the local node, and use either this entry point or get_location(inet_address) to get the respective dc or rack. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:07:49 +03:00
Benny Halevy	fd1a2591b5	shared_token_metadata: mutate_token_metadata: replicate to all shards storage_service::replicate_to_all_cores has a sophisticated way to mutate the token_metadata and effective_replication_map on shard 0 and clone those to all other shards, applying the changes only when mutate and clone succeeded on all shards, so we don't end up with only some of the shards having the mutated copy if an error happened mid-way (and then we would need to roll back the change for exception safety). shared_token_metadata::mutate_token_metadata is currently only called from a unit test that needs to mutate the token metadata only on shard 0, but a following patch will require doing that on all shards. This change adds this capability by enforcing the call to be on shard 0, mutating the token_metadata into a temporary pending copy and cloning it on all other shards. Only then, when all shards succeeded, set the modified token_metadata on all shards. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:07:17 +03:00
Benny Halevy	9cce01a12c	locator: endpoint_dc_rack: refactor default_location Refactor the thread_local default_location out of topology::get_location so it can be used elsewhere. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:06:53 +03:00
Benny Halevy	5ba5371631	locator: endpoint_dc_rack: define default operator== and get rid of the ad-hoc implementation in network_topology_strategy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:06:52 +03:00
Benny Halevy	5874a0d0ca	test: storage_proxy_test: provide valid endpoint_dc_rack Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 19:13:05 +03:00
Benny Halevy	ca61d88764	storage_service: node ops: standardize sync_nodes selection Use token_metadata get_endpoint_to_host_id_map_for_reading to get all normal token owners for all node operations, rather than using gossip for some operation and token_metadata for others. Fixes #12862 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 09:17:07 +03:00
Raphael S. Carvalho	d2d151ae5b	Fix use-after-move when initializing row cache with dummy entry Courtesy of clang-tidy: row_cache.cc:1191:28: warning: 'entry' used after it was moved [bugprone-use-after-move] _partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema}); ^ row_cache.cc:1191:60: note: move occurred here _partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{_schema}); ^ row_cache.cc:1191:28: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated _partitions.insert(entry.position().token().raw(), std::move(entry), dht::ring_position_comparator{*_schema}); The use-after-move is UB, as whether it happens depends on evaluation order. We haven't hit it yet as clang is left-to-right. Fixes #13400. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13401	2023-03-31 19:46:53 +03:00
Botond Dénes	c15f53f971	tools/schema_loader: don't require results from optional schema tables When loading a schema from disk, only the `tables` and `columns` tables are required to have an entry to the loaded schema. All the others are optional. Yet the schema loader expects all the tables to have a corresponding entry, which leads to errors when trying to load a schema which doesn't. Relax the loader to only require existing entries in the two mandatory tables and not the others. Closes #13393	2023-03-31 16:35:42 +02:00
Kefu Chai	c24a9600af	docs: dev: correct a typo s/By expending/By expanding/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13392	2023-03-31 17:19:08 +03:00
Raphael S. Carvalho	04932a66d3	replica: Fix use-after-move in table::make_streaming_reader Variant used by streaming/stream_transfer_task.cc: , reader(cf.make_streaming_reader(cf.schema(), std::move(permit_), prs)) as full slice is retrieved after schema is moved (clang evaluates left-to-right), the stream transfer task can be potentially working on a stale slice for a particular set of partitions. static report: In file included from replica/dirty_memory_manager.cc:6: replica/database.hh:706:83: error: invalid invocation of method 'operator->' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed] return make_streaming_reader(std::move(schema), std::move(permit), range, schema->full_slice()); Fixes #13397. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-31 08:44:46 -03:00
Raphael S. Carvalho	f8df3c72d4	index/built_indexes_virtual_reader.hh: Fix use-after-move static report: ./index/built_indexes_virtual_reader.hh:228:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed] _db.find_column_family(s->ks_name(), system_keyspace::v3::BUILT_VIEWS), Fixes #13396. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-31 08:41:44 -03:00
Raphael S. Carvalho	1ecba373d6	db/view/build_progress_virtual_reader: Fix use-after-move use-after-free in ctor, which potentially leads to a failure when locating table from moved schema object. static report In file included from db/system_keyspace.cc:51: ./db/view/build_progress_virtual_reader.hh:202:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed] _db.find_column_family(s->ks_name(), system_keyspace::v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS), Fixes #13395. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-31 08:40:30 -03:00
Raphael S. Carvalho	213eaab246	sstables: Fix use-after-move when making reader in reverse mode static report: sstables/mx/reader.cc:1705:58: error: invalid invocation of method 'operator' on object 'schema' while it is in the 'consumed' state [-Werror,-Wconsumed] legacy_reverse_slice_to_native_reverse_slice(schema, slice.get()), pc, std::move(trace_state), fwd, fwd_mr, monitor); Fixes #13394. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-31 08:39:11 -03:00
Kefu Chai	6e956c5358	mutation: drop operator<< for position_in_partition and friends now that all their callers are removed, let's just drop these operators. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-31 19:03:14 +08:00
Kefu Chai	76dde9fd50	partition_snapshot_row_cursor: do not use operator<< when printing position in order to prepare for dropping the `operator<<()` for `position_in_partition_view`, let's use fmtlib to print `position()`. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-31 19:03:14 +08:00
Kefu Chai	4ec4859179	mutation: specialize fmt::formatter<position_in_partition> this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print - position_in_partition - position_in_partition_view - position_in_partition_view::printer without the help of fmt::ostream. their `operator<<(ostream,..)` are reimplemented using fmtlib accordingly to ease the review. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-31 19:03:14 +08:00
Kefu Chai	500eeeb12c	mutation: specialize fmt::formatter<partition_region> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `partition_region` with the help of fmt::ostream. to help with the review process, the corresponding `to_string()` is dropped, and its callers now switch over to `fmt::to_string()` in this change as well. using `fmt::to_string()` helps with consolidating all places to use fmtlib for printing/formatting. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-31 19:03:14 +08:00
Tomasz Grabiec	99cb948eac	direct_failure_detector: Avoid throwing exceptions in the success path sleep_abortable() is aborted on success, which causes sleep_aborted exception to be thrown. This causes scylla to throw every 100ms for each pinged node. Throwing may reduce performance if it happens often. Also, it spams the logs if --logger-log-level exception=trace is enabled. Avoid this by swallowing the exception on cancellation. Fixes #13278. Closes #13279	2023-03-31 12:40:43 +02:00
Alejo Sanchez	81b40c10de	test/pylib: RandomTables.add_column with value column When adding extra columns in a test, make them value column. Name them with the "v_" prefix and use the value column number counter. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13271	2023-03-31 11:19:49 +02:00
Alejo Sanchez	e3b462507d	test/pylib: topology: support clusters of initial size 0 To allow tests with custom clusters, allow configuration of initial cluster size of 0. Add a proof-of-concept test to be removed later. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13342	2023-03-31 11:17:58 +02:00
Benny Halevy	56be654edc	storage_service: get_ignore_dead_nodes_for_replace: make static and rename to parse_node_list Let the caller pass the string to parse to the function rather than the function itself get to it via _db.local().get_config() so it could be used as a general purpose function. Make it static now that it doesn't require an instance. Rename to `parse_node_list` as that's what the function does. It doesn't care if the nodes are to be ignored or something else (e.g. removed), they only need to be in token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-31 10:20:17 +03:00
Kefu Chai	e107b31d23	test: sstable: remove unused class in sstable test generation_for_sharded_test is not used by any of these sstable tests, so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13388	2023-03-31 08:02:22 +03:00
Botond Dénes	f777916055	Merge 'Offstrategy keyspace compaction task' from Aleksandra Martyniuk Task manager task implementations of classes that cover offstrategy keyspace compaction which can be started through /storage_service/keyspace_compaction/ api. Top level task covers the whole compaction and creates child tasks on each shard. Closes #12713 * github.com:scylladb/scylladb: test: extend test_compaction_task.py to test offstrategy compaction compaction: create task manager's task for offstrategy keyspace compaction on one shard compaction: create task manager's task for offstrategy keyspace compaction compaction: create offstrategy_compaction_task_impl	2023-03-31 07:09:17 +03:00
Pavel Emelyanov	7d6ab5c84d	code: Remove some headers from query_processor.hh The forward_service.hh and raft_group0_client.hh can be replaced with forward declarations. Few other files need their previously indirectly included headers back. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13384	2023-03-31 07:08:41 +03:00
Tomasz Grabiec	4d6443e030	Merge 'Schema commitlog separate dir' from Gusev Petr The commitlog api originally implied that the commitlog_directory would contain files from a single commitlog instance. This is checked in segment_manager::list_descriptors, if it encounters a file with an unknown prefix, an exception occurs in `commitlog::descriptor::descriptor`, which is logged with the `WARN` level. A new schema commitlog was added recently, which shares the filesystem directory with the main commitlog. This causes warnings to be emitted on each boot. This patch solves the warnings problem by moving the schema commitlog to a separate directory. In addition, the user can employ the new `schema_commitlog_directory` parameter to move the schema commitlog to another disk drive. This is expected to be released in 5.3. As #13134 (raft tables->schema commitlog) is also scheduled for 5.3, and it already requires a clean rolling restart (no cl segments to replay), we don't need to specifically handle upgrade here. Fixes: #11867 Closes #13263 * github.com:scylladb/scylladb: commitlog: use separate directory for schema commitlog schema commitlog: fix commitlog_total_space_in_mb initialization	2023-03-30 23:48:58 +02:00
Petr Gusev	0152c000bb	commitlog: use separate directory for schema commitlog The commitlog api originally implied that the commitlog_directory would contain files from a single commitlog instance. This is checked in segment_manager::list_descriptors, if it encounters a file with an unknown prefix, an exception occurs in commitlog::descriptor::descriptor, which is logged with the WARN level. A new schema commitlog was added recently, which shares the filesystem directory with the main commitlog. This causes warnings to be emitted on each boot. This patch solves the warnings problem by moving the schema commitlog to a separate directory. In addition, the user can employ the new schema_commitlog_directory parameter to move the schema commitlog to another disk drive. By default, the schema commitlog directory is nested in the commitlog_directory. This can help avoid problems during an upgrade if the commitlog_directory in the custom scylla.yaml is located on a separate disk partition. This is expected to be released in 5.3. As #13134 (raft tables->schema commitlog) is also scheduled for 5.3, and it already requires a clean rolling restart (no cl segments to replay), we don't need to specifically handle upgrade here. Fixes: #11867	2023-03-30 21:55:50 +04:00
Petr Gusev	f31bd26971	schema commitlog: fix commitlog_total_space_in_mb initialization It seems there was a typo here, which caused commitlog_total_space_in_mb to always be zero and the schema commitlog to be effectively unlimited in size.	2023-03-30 21:55:50 +04:00
Botond Dénes	207dcbb8fa	Merge 'sstables: prepare for uuid-based generation_type' from Benny Halevy Preparing for #10459, this series defines sstables::generation_type::int_t as `int64_t` at the moment and uses that instead of naked `int64_t` variables so it can be changed in the future to hold e.g. a `std::variant<int64_t, sstables::generation_id>`. sstables::new_generation was defined to generate new, unique generations. Currently it is based on incrementing a counter, but it can be extended in the future to manufacture UUIDs. The unit tests are cleaned up in this series to minimize their dependency on numeric generations. Basically, they should be used for loading sstables with hard coded generation numbers stored under `test/resource/sstables`. For all the rest, the tests should use existing mechanisms and those introduced in this series such as generation_factory, sst_factory and smart make_sstable methods in sstable_test_env and table_for_tests to generate new sstables with a unique generation, and use the abstract sst->generation() method to get their generation if needed, without resorting to the actual value it may hold. Closes #12994 * github.com:scylladb/scylladb: everywhere: use sstables::generation_type test: sstable_test_env: use make_new_generation sstable_directory::components_lister::process: fixup indentation sstables: make highest_generation_seen return optional generation replica: table: add make_new_generation function replica: table: move sstable generation related functions out of line test: sstables: use generation_type::int_t sstables: generation_type: define int_t	2023-03-30 17:05:07 +03:00
Pavel Emelyanov	92318fdeae	Merge 'Initialize Wasm together with query_processor' from Wojciech Mitros The wasm engine is moved from replica::database to the query_processor. The wasm instance cache and compilation thread runner were already there, but now they're also initialized in the query_processor constructor. By moving the initialization to the constructor, we can now be certain that all wasm-related objects (wasm instance cache, compilation thread runner, and wasm engine, which was already passed in the constructor) are initialized when we try to use them because we have to use the query processor to access them anyway. The change is also motivated by the fact that we're planning to take Wasm UDFs out of experimental, after which they should stop getting special treatment. Closes #13311 * github.com:scylladb/scylladb: wasm: move wasm initialization to query_processor constructor wasm: return wasm instance cache as a reference instead of a pointer wasm: move wasm engine to query_processor	2023-03-30 14:30:23 +03:00
Nadav Har'El	59ab9aac44	Merge 'functions: reframe aggregate functions in terms of scalar functions' from Avi Kivity Currently, aggregate functions are implemented in a stateful manner. The accumulator is stored internally in an aggregate_function::aggregate, requiring each query to instantiate new instances (see aggregate_function_selector's constructor, and note how it's called from selector::new_instance()). This makes aggregates hard to use in expressions, since expressions are stateless (with state only provided to evaluate()). To facilitate migration towards stateless expressions, we define a stateless_aggregate_function (modeled after user-defined aggregates, which are already stateless). This new struct defines the aggregate in terms of three scalar functions: one to aggregate a new input into an accumulator (provided in the first parameter), one to finalize an accumulator into a result, and one to reduce two accumulators for parallelized aggregation. All existing native aggregate functions are converted to the new model, and the old interface is removed. This series does not yet convert selectors to expressions, but it does remove one of the obstacles. Performance evaluation: I created a table with a million ints on a single-node cluster, and ran the avg() function on them. I measured the number of instructions executed with `perf stat -p $(pgrep scylla) -e instructions` while the query was running. The query executed from cache, memtables were flushed beforehand. The instruction count per row increased from roughly 49k to roughly 52k, indicating 3k extra instructions per row. While 3k instructions to execute a function is huge, it is currently dwarfed by other overhead (and will be even less important in a cluster where CL>1 will cause non-coordinator code to run multiple times).
Closes #13105 * github.com:scylladb/scylladb: cql3/selection, forward_service: use stateless_aggregate_function directly db: functions: fold stateless_aggregate_function_adapter into aggregate_function cql3: functions: simplify accumulator_for template cql3: functions: base user-defined aggregates on stateless aggregates cql3: functions: drop native_aggregate_function cql3: functions: reimplement count(column) statelessly cql3: functions: reimplement avg() statelessly cql3: functions: reimplement sum() statelessly cql3: functions: change wide accumulator type to varint cql3: functions: unreverse types for min/max cql3: functions: rename make_{min,max}_dynamic_function cql3: functions: reimplement min/max statelessly cql3: functions: reimplement count(*) statelessly cql3: functions: simplify creating native functions even more cql3: functions: add helpers for automating marshalling for scalar functions types: fix big_decimal constructor from literal 0 cql3: functions: add helper class for internal scalar functions db: functions: add stateless aggregate functions db, cql3: move scalar_function from cql3/functions to db/functions	2023-03-30 13:58:47 +03:00
Aleksandra Martyniuk	306d44568f	test: extend test_compaction_task.py to test offstrategy compaction	2023-03-30 10:52:27 +02:00
Aleksandra Martyniuk	8afa54d4f6	compaction: create task manager's task for offstrategy keyspace compaction on one shard Implementation of task_manager's task that covers local offstrategy keyspace compaction.	2023-03-30 10:49:09 +02:00
Aleksandra Martyniuk	73860b7c9d	compaction: create task manager's task for offstrategy keyspace compaction Implementation of task_manager's task covering offstrategy keyspace compaction that can be started through storage_service api.	2023-03-30 10:44:56 +02:00
Aleksandra Martyniuk	e8ef8a51d5	compaction: create offstrategy_compaction_task_impl offstrategy_compaction_task_impl serves as a base class of all concrete offstrategy compaction task classes.	2023-03-30 10:28:17 +02:00
Nadav Har'El	32fff17e19	Merge 'tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes `scylla-sstable` currently has two ways to obtain the schema: * via a `schema.cql` file. * load schema definition from memory (only works for system tables). This meant that for most cases it was necessary to export the schema into a `CQL` format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates from. Yet in many cases the sstable is inspected on the same host where it originates from. In these cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file. This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool checks the usual locations of the scylla data dir, the scylla configuration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a `schema.cql` is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override. If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong. A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes. This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options.
I don't think this will break anybody's workflow as this tool is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change. Example: ``` $ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db {"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}} ``` As seen above, subdirectories like `quarantine`, `staging` etc are also supported. Fixes: https://github.com/scylladb/scylladb/issues/10126 Closes #13075 * github.com:scylladb/scylladb: docs/operating-scylla/admin-tools: scylla-sstable.rst: update schema section test/cql-pytest: test_tools.py: add test for schema loading test/cql-pytest: nodetool.py: add flush_keyspace() tools/scylla-sstable: reform schema loading mechanism tools/schema_loader: add load_schema_from_schema_tables() db/schema_tables: expose types schema	2023-03-30 09:35:59 +03:00
Pavel Emelyanov	886a1392a8	sstables: Move writer flush into close (and remove it) Writing into sstable component output stream should be done with care. In particular -- flushing can happen only once right before closing the stream. Flushing the stream in between several writes is not going to work, because file stream would step on unaligned IO and S3 upload stream would send completion message to the server and would lose any subsequent write. Having said that, it's better to remove the flush() ability from the component writer not to tempt the developers. refs: #13320 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-30 09:34:04 +03:00
Pavel Emelyanov	77169e2647	sstables: Relax exception handling in do_write_simple This effectively reverts `000514e7cc` (sstable: close file_writer if an exception in thrown) because it became obsoleted by `60873d2360` (sstable: file_writer: auto-close in destructor). The change is in fact idempotent. Before the patch writer was closed regardless of write/flush failing or not. After the patch writer will close itself in destructor for sure. Before the patch an exception from write/flush was caught, then close was called and regardless of close failed or not the former exception was re-thrown. After the patch an exception from write/flush will result in writer destruction that would ignore close exception (if any). Before the patch throwing close after successful write/flush re-threw the close exception. After the patch writer will be closed "by hand" and any exception will be reported. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-30 09:32:56 +03:00
Botond Dénes	164afe14ad	Merge 'compound_compat: replace operator<<(..) with fmt formatter ' from Kefu Chai this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `composite` and `composite_view` without using ostream<<. also, this change removes `operator<<(ostream, const composite&)` , `operator<<(ostream, const composite_view&)` along with their callers. Refs #13245 Closes #13360 * github.com:scylladb/scylladb: compound_compat: remove operator<<(ostream, composite) compound_compat: remove operator<<(ostream, composite_view) sstables: do not use operator<< to print composite_view compound_compat.hh: specialize fmt::formatter<composite> compound_compat.hh: specialize fmt::formatter<composite_view> compound_compat.hh: specialize fmt::formatter<component_view>	2023-03-30 08:47:17 +03:00
Botond Dénes	972b24a969	Merge 'Break the proxy -> database -> [views] -> proxy loop' from Pavel Emelyanov ... and drop usage of global storage proxy from several places of mutate_MV(). This is the last dependency loop around storage proxy left as well as the last user of the global storage proxy. The trouble is that while proxy naturally depends on database, the database SUDDENLY requires proxy to push view updates from the guts of database::do_apply(). Similar loop existed in a form of database -> { large_data_handler, compaction manager } -> system keyspace -> database and it was cut in `917fdb9e53` (Cut database-system_keyspace circular dependency) by introducing a soft dependency link from l. d. handler / compaction manager to system keyspace. The similar solution is proposed here. The database instance gets a soft dependency (shared_ptr) to view_update_generator instance. On start the link is nullptr and pushing view updates is not possible until view_update_generator starts and plugs itself to the database. The plugging happens naturally, because v.u.generator needs proxy as explicit dependency and, thus, can reach database via proxy. This (seems to) work because tables that need view updates don't start being mutated until late enough, as late as v.u.generator starts. As a nice side effect this allows removing a bunch of global storage proxy usages from mutate_MV() which opens a pretty short way towards de-globalizing proxy (after it only qctx, tracing and schema registry will be left).
Closes #13367 * github.com:scylladb/scylladb: view: Drop global storage_proxy usage from mutate_MV() view: Make mutate_MV() method of view_update_generator table: Carry v.u.generator down to populate_views() table: Carry v.u.generator down to do_push_view_replica_updates() view: Keep v.u.generator shared pointer on view_builder::consumer view: Capture v.u.generator on view_updating_consumer lambda view: Plug view update generator to database view: Add view_builder -> view_update_generator dependency view: Add view_update_generator -> sharded<storage_proxy> dependency	2023-03-30 08:29:29 +03:00
Takuya ASADA	160c184d0b	scylla_kernel_check: suppress verbose iotune messages Stop printing verbose iotune messages during the check, just print the error message. Fixes #13373. Closes #13362	2023-03-30 07:30:07 +03:00
Pavel Emelyanov	9a66174a94	Merge 'config: make query timeouts live update-able' from Kefu Chai in this change, the following query timeout config options are marked live update-able: - range_request_timeout_in_ms - read_request_timeout_in_ms - counter_write_request_timeout_in_ms - cas_contention_timeout_in_ms - truncate_request_timeout_in_ms - write_request_timeout_in_ms - request_timeout_in_ms as per https://github.com/scylladb/scylladb/issues/10172, > Many users would like to set the driver timers based on server timers. > For example: expire a read timeout before or after the server read time > out. with this change, we are able to set the timeouts on the fly. these timeout options specify how long the coordinator waits for the completion of different kinds of operations. but these options are cached by the servers consuming them, so in this series, helpers are added to update the cached values when the options get modified. also, since the observers are not copyable, sharded_parameter is used to initialize the config when creating these sharded servers. Fixes #12232 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12531 * github.com:scylladb/scylladb: timeout_config: remove unused make_timeout_config() client_state: split the param list of ctor into multi lines redis,thrift,transport: make timeout_config live-updateable config: mark query timeouts live update-able transport: mark cql_server::timeout_config() const auth: remove unused forward declaration redis: drop unused member function transport: drop unused member function thrift: keep a reference of timeout_config in handler_factory redis,thrift,transport: initialize _config with std::move(config) redis,thrift,transport: pass config via sharded_parameter utils: config_file: add a space after `=`	2023-03-29 19:38:26 +03:00
Kefu Chai	4670ba90e5	scripts: remove git-archive-all since we don't build the rpm/deb packages from source tarball anymore, instead we build the rpm/deb packages from precompiled relocatable package. there is no need to keep git-archive-all in the repo. in this change, the git-archive-all script and its license file are removed. they were added for building rpm packages from source tarball in `f87add31a7`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13372	2023-03-29 18:59:23 +03:00
Avi Kivity	472b155d76	Merge 'Allow each compaction group to have its own compaction strategy state' from Raphael "Raph" Carvalho This is important for multiple compaction groups, as they cannot share state that must span a single SSTable set. The solution is about: 1) Decoupling compaction strategy from its state; making compaction_strategy a pure stateless entity 2) Each compaction group storing its own compaction strategy state 3) Compaction group feeds its state into compaction strategy whenever needed Closes #13351 * github.com:scylladb/scylladb: compaction: TWCS: wire up compaction_strategy_state compaction: LCS: wire up compaction_strategy_state compaction: Expose compaction_strategy_state through table_state replica: Add compaction_strategy_state to compaction group compaction: Introduce compaction_strategy_state compaction: add table_state param to compaction_strategy::notify_completion() compaction: LCS: extract state into a separate struct compaction: TWCS: prepare for stateless strategy compaction: TWCS: extract state into a separate struct compaction: add const-qualifier to a few compaction_strategy methods	2023-03-29 18:57:11 +03:00
Pavel Emelyanov	cc262d814b	view: Drop global storage_proxy usage from mutate_MV() Now the mutate_MV is the method of v.u.generator which has reference to the sharded<storage_proxy>. Few helper static wrappers are patched to get the needed proxy or database reference from the mutate_MV call. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 18:48:14 +03:00
Pavel Emelyanov	7cabdc54a6	view: Make mutate_MV() method of view_update_generator Nowadays it's a static helper, but internally it depends on storage proxy, so it grabs its global instance. Making it a method of view update generator makes it possible to use the proxy dependency from the generator. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 18:48:14 +03:00
Pavel Emelyanov	e78e64a920	table: Carry v.u.generator down to populate_views() The method is called by view_builder::consumer when building a view and the consumer already has stable dependency reference on the view updates generator. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 18:48:13 +03:00
Botond Dénes	bae62f899d	mutation/mutation_compactor: consume_partition_end(): reset _stop The purpose of `_stop` is to remember whether the consumption of the last partition was interrupted or it was consumed fully. In the former case, the compactor allows retrieving the compaction state for the given partition, so that its compaction can be resumed at a later point in time. Currently, `_stop` is set to `stop_iteration::yes` whenever the return value of any of the `consume()` methods is also `stop_iteration::yes`. Meaning, if the consuming of the partition is interrupted, this is remembered in `_stop`. However, a partition whose consumption was interrupted is not always continued later. Sometimes consumption of a partition is interrupted because the partition is not interesting and the downstream consumer wants to stop it. In these cases the compactor should not return an engaged optional from `detach_state()`, because there is no state to detach, the state should be thrown away. This was incorrectly handled so far and is fixed in this patch, by overwriting `_stop` in `consume_partition_end()` with whatever the downstream consumer returns. Meaning if they want to skip the partition, then `_stop` is reset to `stop_iteration::no` and `detach_state()` will return a disengaged optional as it should in this case. Fixes: #12629 Closes #13365	2023-03-29 17:48:45 +03:00
Aleksandra Martyniuk	0ceee3e4b3	compaction: use compaction namespace in compaction_manager.cc	2023-03-29 15:28:14 +02:00
Takuya ASADA	497dd7380f	create-relocatable-package.py: stop using filter function on tools We introduced exclude_submodules at `19da4a5b8f` to exclude tools/java and tools/jmx since they have their own relocatable packages, so we don't want to package same files twice. However, most of the files under tools/ are not needed for installation, we just need tools/scyllatop. So what we really need to do is "ar.reloc_add('tools/scyllatop')", not excluding files from tools/. related with #13183 Closes #13215	2023-03-29 16:23:43 +03:00
Aleksandra Martyniuk	d7d570e39d	compaction: rename compaction::task To avoid confusion with task manager tasks compaction::task is renamed to compaction::compaction_task_executor. All inheriting classes are modified similarly.	2023-03-29 15:23:18 +02:00
Aleksandra Martyniuk	f24391fbe4	compaction: move compaction_manager::task out of compaction manager compaction_manager::task needs to be accessed from task manager compaction tasks. Thus, compaction_manager::task and all inheriting classes are moved from compaction manager to compaction namespace.	2023-03-29 15:21:24 +02:00
Wojciech Mitros	cfd2a4588d	wasm: move wasm initialization to query_processor constructor By moving the initialization to the constructor, we can now be certain that all wasm-related objects (wasm instance cache, compilation thread runner, and wasm engine, which was already passed in the constructor) are initialized when we try to use them because we have to use the query processor to access them anyway. The change is also motivated by the fact that we're planning to take Wasm UDFs out of experimental, after which they should stop getting special treatment.	2023-03-29 14:55:36 +02:00
Aleksandra Martyniuk	37cafec9d5	compaction: move sstable_task definition to source file	2023-03-29 14:53:43 +02:00
Botond Dénes	72772d5072	Merge 'auth: replace operator<<(..) with fmt formatter' from Kefu Chai this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `authenticated_user` without using ostream<<. also, this change removes all existing callers of `operator<<(ostream, const authenticated_user&)`. Refs #13245 Closes #13359 * github.com:scylladb/scylladb: auth: drop operator<<(ostream, authenticated_user) cql3: do not use operator<< to print authenticated_user auth: specialize fmt::formatter<authenticated_user>	2023-03-29 15:24:07 +03:00
Kefu Chai	0b7c345bec	timeout_config: remove unused make_timeout_config() it is replaced by the ctor of updateable_timeout_config, so it does not have any callers now. let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:17:45 +08:00
Kefu Chai	98b9cbbc92	client_state: split the param list of ctor into multi lines it is 215-chars long, so let's break it into multiple lines for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:17:45 +08:00
Kefu Chai	ebf5e138e8	redis,thrift,transport: make timeout_config live-updateable * timeout_config - add `updated_timeout_config` which represents always-updated options backed by `utils::updateable_value<>`. this class is used by servers which need to access the latest timeout related options. the existing `timeout_config` is more like a snapshot of the `updated_timeout_config`. it is used in the use case where we don't need the most updated options or we update the options manually on demand. * redis, thrift, transport: s/timeout_config/updated_timeout_config/ when appropriate. use the improved version of timeout_config where we need to have access to the most-updated version of the timeout options. Fixes #10172 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:17:45 +08:00
Kefu Chai	11cea36c12	docs: dev: write mathematical expressions in LaTeX for better readability Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13341	2023-03-29 15:07:14 +03:00
Kefu Chai	f789d8d3cd	config: mark query timeouts live update-able in this change, the following query timeout config options are marked live update-able: - range_request_timeout_in_ms - read_request_timeout_in_ms - counter_write_request_timeout_in_ms - cas_contention_timeout_in_ms - truncate_request_timeout_in_ms - write_request_timeout_in_ms - request_timeout_in_ms as per https://github.com/scylladb/scylladb/issues/10172, > Many users would like to set the driver timers based on server timers. > For example: expire a read timeout before or after the server read time > out. with this change, these options are marked live-updateable, but since they are cached by their consumers locally, we will have another commit to update the local copies when these options get updated. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:02 +08:00
Kefu Chai	1cc28679bc	transport: mark cql_server::timeout_config() const this function returns a const reference to member variable, so we can mark it with the `const` specifier for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:02 +08:00
Kefu Chai	ca83dc0101	auth: remove unused forward declaration `timeout_config` is not used by auth/common.hh. presumably, this class is not a public interface exposed by auth, as it is not inherently related to auth. timeout_config is a shared setting across related services, specifically, redis_server, thrift and cql_server. so, in this change, let's drop this forward declaration. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:02 +08:00
Kefu Chai	9a159445f0	redis: drop unused member function now that `redis_server::connection::timeout_config()` and `redis_server::timeout_config()` are used nowhere, let's drop them. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:02 +08:00
Kefu Chai	d72ab78ffd	transport: drop unused member function since `cql_server::connection::timeout_config()` is used nowhere, let's just drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:02 +08:00
Kefu Chai	fec35b97ad	thrift: keep a reference of timeout_config in handler_factory this change should keep the timeout settings of handler_factory sync'ed with the ones used by `thrift_server`. so far, the `timeout_config` instance in `thrift_server` is not live-updateable, but in a follow-up change, we will make it so. so, this change prepares the handler_factory for a live-updateable timeout_config. instead of keeping a snapshot of the timeout_config, keep a reference to it in handler_factory. the reference points to `thrift_server::_config`. so despite the fact that `thrift_server::_handler_factory` is a shared_ptr, the member variable won't outlive its container, as the only reason to have it as a shared_ptr is to appease the ctor of `CassandraAsyncProcessorFactory`. and the constructed `_processor_factory` is also a member variable of `thrift_server`, so we won't take the risk of a dangling reference held by `handler_factory`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:02 +08:00
Kefu Chai	c642ca9e73	redis,thrift,transport: initialize _config with std::move(config) instead of copying the `config` parameter, move away from it. this change also prepares for a non-copyable config. if the class of `config` is not copyable, we will not be able to initialize the member variable by copying from the given `config` parameter. after the live-updateable config change, the `_config` member variable will contain instances of utils::observer<>, which is not copyable, but is move-constructible, hence in this change, we just move away from the given `config`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:02 +08:00
Kefu Chai	e0ac2eb770	redis,thrift,transport: pass config via sharded_parameter * pass config via sharded_parameter * initialize config using designated initializer this change paves the road to servers with live-updateable timeout options. before this change, the servers initialize a domain specific combo config, like `redis_server_config`, with the same instance of a timeout_config, and pass the combo config as a ctor parameter to construct each sharded service instance. but this design assumes value semantics for the config class, that is, it should be copyable. but if we want to use utils::updateable_value<> to get updated option values, we would have to postpone the instantiation of the config until the sharded service is about to be initialized. so, in this change, instead of taking a domain specific config created beforehand, all services constructed with a `timeout_config` will take a `sharded_parameter()` for creating the config. also, take this opportunity to initialize the config using designated initializer. for two reasons: * less repetition this way. we don't have to repeat the variable name of the config being initialized for each member variable. * prepare for some member variables which do not have a default constructor. this applies to the timeout_config's updater which will not have a default constructor, as it should be initialized by db::config and a reference to the timeout_config to be updated. we will update the `timeout_config` side in a follow-up commit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:00 +08:00
Kefu Chai	99bf8bc0f4	bytes, gms: s/format_to/fmt::format_to/ to disambiguate `fmt::format_to()` from `std::format_to()`. turns out, we have `using namespace std` somewhere in the source tree, and with libstdc++ shipped by GCC-13, we have `std::format_to()`, so without exactly which one to use, compiler complains like ``` /optimized_clang/stage-1-X86/build/bin/clang++ -MD -MT build/dev/mutation/mutation.o -MF build/dev/mutation/mutation.o.d -I/optimized_clang/scylla-X86/seastar/include -I/optimized_clang/scylla-X86/build/dev/seastar/gen/include -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Werror=unused-result -fstack-clash-protection -DSEASTAR_API_LEVEL=6 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_TYPE_ERASE_MORE -DFMT_SHARED -I/usr/include/p11-kit-1 -ffile-prefix-map=/optimized_clang/scylla-X86=. -march=westmere -DDEVEL -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSCYLLA_ENABLE_ERROR_INJECTION -O2 -DSCYLLA_BUILD_MODE=dev -iquote. -iquote build/dev/gen --std=gnu++20 -ffile-prefix-map=/optimized_clang/scylla-X86=. 
-march=westmere -DBOOST_TEST_DYN_LINK -DNOMINMAX -DNOMINMAX -fvisibility=hidden -Wall -Werror -Wno-mismatched-tags -Wno-tautological-compare -Wno-parentheses-equality -Wno-c++11-narrowing -Wno-missing-braces -Wno-ignored-attributes -Wno-overloaded-virtual -Wno-unused-command-line-argument -Wno-unsupported-friend -Wno-delete-non-abstract-non-virtual-dtor -Wno-braced-scalar-init -Wno-implicit-int-float-conversion -Wno-delete-abstract-non-virtual-dtor -Wno-psabi -Wno-narrowing -Wno-nonnull -Wno-uninitialized -Wno-error=deprecated-declarations -DXXH_PRIVATE_API -DSEASTAR_TESTING_MAIN -DFMT_DEPRECATED_OSTREAM -c -o build/dev/mutation/mutation.o mutation/mutation.cc In file included from mutation/mutation.cc:9: In file included from mutation/mutation.hh:13: In file included from mutation/mutation_partition.hh:21: In file included from ./schema/schema_fwd.hh:13: In file included from ./utils/UUID.hh:22: ./bytes.hh:116:21: error: call to 'format_to' is ambiguous format_to(out, "{}{:02x}", _delimiter, std::byte(v[i])); ^~~~~~~~~ ./bytes.hh:134:43: note: in instantiation of function template specialization 'fmt::formatter<fmt_hex>::format<fmt::basic_format_context<fmt::appender, char>>' requested here return fmt::formatter<::fmt_hex>::format(::fmt_hex(bytes_view(s)), ctx); ^ /usr/include/fmt/core.h:813:64: note: in instantiation of function template specialization 'fmt::formatter<seastar::basic_sstring<signed char, unsigned int, 31, false>>::format<fmt::basic_format_context<fmt::appender, char>>' requested here -> decltype(typename Context::template formatter_type<T>().format( ^ /usr/include/fmt/core.h:824:10: note: while substituting deduced template arguments into function template 'has_const_formatter_impl' [with Context = fmt::basic_format_context<fmt::appender, char>, T = seastar::basic_sstring<signed char, unsigned int, 31, false>] return has_const_formatter_impl<Context>(static_cast<T*>(nullptr)); ``` to address this FTBFS, let's be more explicit by adding "fmt::" to 
specify which `format_to()` to use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13361	2023-03-29 14:47:28 +03:00
Kefu Chai	ea2badb25f	utils: config_file: add a space after `=` for better readability Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 19:22:21 +08:00
Pavel Emelyanov	a95d3446fd	table: Carry v.u.generator down to do_push_view_replica_updates() The latter is the place where mutate_MV is called and it needs the view updates generator nearby. The call-stack starts at database::do_apply(). As was described in one of the previous patches, applying mutations that need updating views happen late enough, so if the view updates generator is not plugged to the database yet, it's OK to bail out with exception. If it's plugged, it's carried over thus keeping the generator instance alive and waited for on its stop. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 14:12:01 +03:00
Pavel Emelyanov	ddc8c8b019	view: Keep v.u.generator shared pointer on view_builder::consumer This is another mutations consumer that pushes view updates forward and thus also needs the view updates generator pointer. It gets one from the view builder that already has the dependency on generator. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 14:11:30 +03:00
Pavel Emelyanov	2652dffd89	view: Capture v.u.generator on view_updating_consumer lambda The consumer is in fact pushing the updates and _that_'s the component that would really need the view_update_generator at hand. The consumer is created from the generator itself so no troubles getting the pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 14:10:55 +03:00
Pavel Emelyanov	d5557ef0e2	view: Plug view update generator to database The database is a low-level service and currently the view update generator implicitly depends on it via storage proxy. However, database does need to push view updates with the help of mutate_MV helper, thus adding a dependency loop. This patch exploits the fact that view updates start being pushed late enough, by that time all other services, including proxy and view update generator, seem to be up and running. This allows a "weak dependency" from database to view update generator, like there's one from database to system keyspace already. So in this patch the v.u.g. puts the shared-from-this pointer onto the database at the time it starts. On stop it removes this pointer after database is drained and (hopefully) all view updates are pushed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 14:09:49 +03:00
Pavel Emelyanov	3455b1aed8	view: Add view_builder -> view_update_generator dependency The builder will need generator for view_builder::consumer in one of the next patches. The builder is a standalone service that starts among the last and no other services need builder as their dependency. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 14:08:47 +03:00
Pavel Emelyanov	3fd12d6a0e	view: Add view_update_generator -> sharded<storage_proxy> dependency The generator will be responsible for spreading view updates with the help of mutate_MV helper. The latter needs storage proxy to operate, so the generator gets this dependency in advance. There's no need to change start/stop order at the moment, generator already starts after and stops before proxy. Also, services that have generator as dependency are not required by proxy (even indirectly) so no circular dependency is produced at this point. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-29 14:08:47 +03:00
Kefu Chai	c307c60d04	scripts: correct a typo in comment s/refreh/refresh/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13357	2023-03-29 13:44:47 +03:00
Kefu Chai	55a8b50bbd	release: correct a typo in comment s/to levels of indirection/two levels of indirection/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13358	2023-03-29 13:42:38 +03:00
Kefu Chai	dfb55975fc	Update tools/jmx submodule this helps to use OpenJDK 11 instead of OpenJDK 8 for running scylla-jmx, in hope to alleviate the pain of the crashes found in the JRE shipped along with OpenJDK 8, as it is aged, and only security fixes are included now. * tools/jmx 88d9bdc...48e1699 (3): > Merge 'dist/redhat: support jre 11 instead of jre 8' from Kefu Chai > install.sh: point java to /usr/bin/java > Merge 'use OpenJDK 11 instead of OpenJDK 8' from Kefu Chai Refs https://github.com/scylladb/scylla-jmx/issues/194 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13356	2023-03-29 13:00:40 +03:00
Kefu Chai	57f51603dc	compound_compat: remove operator<<(ostream, composite) since we don't have any callers of this operator, let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 16:13:59 +08:00
Kefu Chai	212641abda	compound_compat: remove operator<<(ostream, composite_view) since we don't have any callers of this operator, let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 16:13:59 +08:00
Kefu Chai	cdb972222e	sstables: do not use operator<< to print composite_view this change removes the last two callers of `operator<<(ostream&, const composite_view&)`, it paves the road to remove this operator. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 16:13:59 +08:00
Kefu Chai	1ef8f63b4e	compound_compat.hh: specialize fmt::formatter<composite> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `composite` with the help of fmt::ostream. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 16:13:59 +08:00
Kefu Chai	28cabd0a1f	compound_compat.hh: specialize fmt::formatter<composite_view> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `composite::composite_view` with the help of fmt::ostream. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 16:13:59 +08:00
Kefu Chai	15eac8c4cd	compound_compat.hh: specialize fmt::formatter<component_view> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `composite::component_view` with the help of fmt::ostream. in this change, '#' is used to add the 0x prefix, as fmtlib allows us to add the '0x' prefix using the '#' format specifier when printing numbers using 'x' as the type specifier. see https://fmt.dev/latest/syntax.html Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 16:13:10 +08:00
Kefu Chai	5a9b4c02e3	auth: drop operator<<(ostream, authenticated_user) since we don't have any callers of this operator, let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 16:02:29 +08:00
Kefu Chai	85c89debe6	cql3: do not use operator<< to print authenticated_user this change removes the last two callers of `operator<<(ostream&, const authenticated_user&)`, it paves the road to remove this operator. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 16:02:29 +08:00
Kefu Chai	a7037ae0f4	auth: specialize fmt::formatter<authenticated_user> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `auth::authenticated_user` with the help of fmt::ostream. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 16:02:29 +08:00
David Garcia	f45c4983db	docs: update theme 1.4 Closes #13346	2023-03-29 06:56:27 +03:00
Avi Kivity	6977df5539	cql3/selection, forward_service: use stateless_aggregate_function directly Now that stateless_aggregate_function is directly exposed by aggregate_function, we can use it directly, avoiding the intermediary aggregate_function::aggregate, which is removed.	2023-03-28 23:49:34 +03:00
Avi Kivity	58eb21aa5d	db: functions: fold stateless_aggregate_function_adapter into aggregate_function Now that all aggregate functions are derived from stateless_aggregate_function_adapter, we can just fold its functionality into the base class. This exposes stateless_aggregate_function to all users of aggregate_function, so they can begin to benefit from the transformation, though this patch doesn't touch those users. The aggregate_function base class is partially devirtualized since there is just a single implementation now.	2023-03-28 23:47:11 +03:00
Avi Kivity	68529896aa	cql3: functions: simplify accumulator_for template The accumulator_for template is used to select the accumulator type for aggregates. After refactoring, all that is needed from it is to select the native type, so remove all the excess code.	2023-03-28 23:47:11 +03:00
Avi Kivity	4ea3136026	cql3: functions: base user-defined aggregates on stateless aggregates Since the model for stateless aggregates was taken from user defined aggregates, the conversion is trivial.	2023-03-28 23:47:11 +03:00
Avi Kivity	f2715b289a	cql3: functions: drop native_aggregate_function Now that all aggregates are implemented statelessly, native_aggregate_function no longer has subclasses, so drop it.	2023-03-28 23:47:11 +03:00
Avi Kivity	6bceb25982	cql3: functions: reimplement count(column) statelessly Note that we don't use the automarshalling helper for the aggregation function, since it doesn't work for compound types.	2023-03-28 23:47:11 +03:00
Avi Kivity	4f2cdace9a	cql3: functions: reimplement avg() statelessly	2023-03-28 23:47:11 +03:00
Avi Kivity	b0a8fd3287	cql3: functions: reimplement sum() statelessly	2023-03-28 23:47:11 +03:00
Avi Kivity	d21d11466a	cql3: functions: change wide accumulator type to varint Currently, we use __int128, but this has no direct counterpart in CQL, so we can't express the accumulator type as part of a CQL scalar function. Switch to varint which is a superset, although slower.	2023-03-28 23:47:11 +03:00
Avi Kivity	3252dc0172	cql3: functions: unreverse types for min/max Currently it works without this, but later unreversing will be removed from another part of the stack, causing min/max on reversed types to return incorrect results. Anticipate that and unreverse the types during construction.	2023-03-28 23:47:09 +03:00
Avi Kivity	ed466b7e68	cql3: functions: rename make_{min,max}_dynamic_function There's no longer a statically-typed variant, so no need to distinguish the dynamically-typed one.	2023-03-28 23:37:49 +03:00
Wojciech Mitros	c9b701b516	wasm: return wasm instance cache as a reference instead of a pointer In an incoming change, the wasm instance cache will be modified to be owned by the query_processor - it will hold an optional instead of a raw pointer to the cache, so we should stop returning the raw pointer from the getter as well. Consequently, the cache is also stored as a reference in wasm::cache, as it gets the reference from the query_processor. For consistency with the wasm engine and the wasm alien thread runner, the name of the getter is also modified to follow the same pattern.	2023-03-28 18:18:48 +02:00
Wojciech Mitros	60c99b4c47	wasm: move wasm engine to query_processor The wasm engine is used for compiling and executing Wasm UDFs, so the query_processor is a more appropriate location for it than replica::database, especially because the wasm instance cache and the wasm alien thread runner are already there. This patch also reduces the number of wasm engines to 1, shared by all shards, as recommended by the wasmtime developers.	2023-03-28 17:41:30 +02:00
Calle Wilund	6525209983	alternator/rest api tests: Remove name assumption and rely on actual scylla info Fixes #13332 The tests use the discriminator "system" as a prefix to assume keyspaces are marked "internal" inside scylla. This is not true in enterprise universe (replicated key provider). It maybe/probably should, but that train is sailing right now. Fix by removing one assert (not correct) and use actual API info in the alternator test. Closes #13333	2023-03-28 15:41:23 +03:00
Raphael S. Carvalho	989afbf83b	compaction: TWCS: wire up compaction_strategy_state TWCS no longer keeps internal state, and will now rely on state managed by each compaction group through compaction::table_state. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-28 08:48:15 -03:00
Raphael S. Carvalho	233fe6d3dc	compaction: LCS: wire up compaction_strategy_state LCS no longer keeps internal state, and will now rely on state managed by each compaction group through compaction::table_state. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-28 08:48:15 -03:00
Raphael S. Carvalho	2186a75e9b	compaction: Expose compaction_strategy_state through table_state That will allow compaction_strategy to access the compaction group state through compaction::table_state, which is the interface at which replica talks to the compaction layer. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-28 08:48:10 -03:00
Botond Dénes	b6c022a142	Merge 'cmake: sync with `configure.py` (15/n)' from Kefu Chai this is the 15th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals: - to enable developer to use CMake for building scylla. so they can use tools (CLion for instance) with CMake integration for better developer experience - to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules. this changeset includes following changes: - build: cmake: add two missing tests - build: cmake: port more cxxflags from configure.py Closes #13262 * github.com:scylladb/scylladb: build: cmake: add missing source files to idl and service build: cmake: port more cxxflags from configure.py build: cmake: add two missing tests	2023-03-28 09:16:38 +03:00
Botond Dénes	88c5b2618c	Merge 'Get rid of global variable "load_prio_keyspaces" (step 1)' from Calle Wilund The concept is needed by enterprise functionality, but in the hunt for globals this sticks out and should be removed. This is also partially prompted by the need to handle the keyspaces in the above set specially on shutdown as well as startup. I.e. we need to ensure all user keyspaces are flushed/closed earlier than these. I.e. treat them as "system" keyspaces for this purpose. These changes add an "extension internal" keyspace set instead, which for now (until enterprise branches are updated) also includes the "load_prio" set. However, it changes distributed loader to use the extension API interface instead, as well as adds shutdown special treatment to replica::database. Closes #13335 * github.com:scylladb/scylladb: datasbase: Flush/close "extension internal" keyspaces after other user ks distributed_loader: Use extensions set of "extension internal" keyspaces db::extentions: Add "extensions internal" keyspace set	2023-03-28 08:35:10 +03:00
Kefu Chai	fcee7f7ac9	reloc: silence warning from readelf we've been seeing errors like ``` 10:39:36 gdb-add-index: [Was there no debuginfo? Was there already an index?] 10:39:36 readelf: /jenkins/workspace/scylla-master/next/scylla/build/dist/debug/redhat/BUILDROOT/scylla-5.3.0~dev-0.20230321.0f97d464d32b.x86_64/usr/lib/debug/opt/scylladb/libreloc/libc.so.6-5.3.0~dev-0.20230321.0f97d464d32b.x86_64.debug: Error: Unable to find program interpreter name ``` when strip.sh is processing *.debug elf images. this is caused by a known issue, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1012107 . and this error is not fatal. but it is very distracting when we are trying to find errors in jenkins logging messages. so, in this change, the stderr output from readelf is muted for higher signal-noise ratio in the build logging message. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13267	2023-03-28 08:29:37 +03:00
Anna Stuchlik	4435b8b6f1	doc: elaborate on Scylla admin REST API - V2 This is V2 of https://github.com/scylladb/scylladb/pull/11849 This commit adds more information about ScyllaDB's REST API, including an example for Docker and a screenshot of the Swagger UI. Co-authored-by: Tzach Livyatan <tzach.livyatan@gmail.com> Closes #13331	2023-03-28 08:27:09 +03:00
Botond Dénes	9a024f72c4	Merge 'thrift: return address in listen_addresses() only after server is ready' from Marcin Maliszkiewicz This is used for readiness API: /storage_service/rpc_server and the fix prevents it from returning 'true' prematurely. Some improvement for readiness was added in `a51529dd15` but thrift implementation wasn't fully done. Fixes https://github.com/scylladb/scylladb/issues/12376 Closes #13319 * github.com:scylladb/scylladb: thrift: return address in listen_addresses() only after server is ready thrift: simplify do_start_server() with seastar:async	2023-03-28 08:26:16 +03:00
Botond Dénes	60240e6d91	Merge 'bytes, gms: replace operator<<(..) with fmt formatter' from Kefu Chai this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `bytes` and `gms::inet_address` without using ostream<<. also, this change removes all existing callers of `operator<<(ostream, const bytes &)` and `operator<<(ostream, const gms::inet_address&)`. `gms::inet_address` related changes are included here in hope to demonstrate the usage of delimiter specifier of `fmt_hex` 's formatter. Refs #13245 Closes #13275 * github.com:scylladb/scylladb: gms/inet_address: implement operator<< using fmt::formatter treewide: use fmtlib to format gms::inet_address gms/inet_address: specialize fmt::formatter<gms::inet_address> bytes: implement formatting helpers using formatter bytes: specialize fmt::formatter<bytes> bytes: specialize fmt::formatter<fmt_hex> bytes: mark fmt_hex::v `const`	2023-03-28 08:25:41 +03:00
Botond Dénes	b22f8c6d13	Merge 'Adjust repair module to other task manager modules' conventions' from Aleksandra Martyniuk Files with task manager repair module and related classes are modified to be consistent with task manager compaction module. Closes #13231 * github.com:scylladb/scylladb: repair: rename repair_module repair: add repair namespace to repair/task_manager_module.hh repair: rename repair_task.hh	2023-03-28 08:24:42 +03:00
Raphael S. Carvalho	ee89ff24f2	replica: Add compaction_strategy_state to compaction group The state is not wired anywhere yet. It will replace the ones stored in compaction strategies themselves, allowing each compaction group to have its own state. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-27 15:46:14 -03:00
Raphael S. Carvalho	25f73a4181	compaction: Introduce compaction_strategy_state Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-27 15:46:11 -03:00
Raphael S. Carvalho	1ffe2f04ef	compaction: add table_state param to compaction_strategy::notify_completion() Once compaction_strategy is made stateless, the state must be retrieved in notify_completion() through table_state. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-27 13:40:02 -03:00
Raphael S. Carvalho	2ffaae97a4	compaction: LCS: extract state into a separate struct Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-27 13:40:02 -03:00
Raphael S. Carvalho	e2f38baa92	compaction: TWCS: prepare for stateless strategy Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-27 13:40:01 -03:00
Raphael S. Carvalho	017f432b8f	compaction: TWCS: extract state into a separate struct This is a step towards decoupling compaction strategy (computation) and its state. Making the former stateless. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-27 13:38:47 -03:00
Calle Wilund	7af7c379a5	datasbase: Flush/close "extension internal" keyspaces after other user ks Refs #13334 Effectively treats keyspaces listed in "extension internal" as system keyspaces w.r.t. shutdown/drain. This ensures all user keyspaces are fully flushed before we disable these "internal" ones.	2023-03-27 15:15:49 +00:00
Calle Wilund	c3ec6a76c0	distributed_loader: Use extensions set of "extension internal" keyspaces Refs #13334 Working towards removing load_prio_keyspaces. Use the extensions interface to determine which keyspaces to initialize early.	2023-03-27 15:14:13 +00:00
Calle Wilund	7c8c020c0e	db::extentions: Add "extensions internal" keyspace set Refs #13334 To be populated early by extensions. Such a keyspace should be 1.) Started before user keyspaces 2.) Flushed/closed after user keyspaces 3.) For all other regards be considered "user".	2023-03-27 15:12:31 +00:00
Aleksandra Martyniuk	f10b862955	repair: rename repair_module	2023-03-27 16:33:39 +02:00
Aleksandra Martyniuk	8f935481cd	repair: add repair namespace to repair/task_manager_module.hh	2023-03-27 16:32:51 +02:00
Aleksandra Martyniuk	17e0e05f42	repair: rename repair_task.hh	2023-03-27 16:31:51 +02:00
Raphael S. Carvalho	232e71f2cf	compaction: add const-qualifier to a few compaction_strategy methods Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-27 11:13:10 -03:00
Botond Dénes	c7131a0574	Update tools/cqlsh/ submodule * tools/cqlsh b9a606f...8769c4c (11): > dist: redhat: provide only a single version > pylib/setup, requirement.txt: remove Six > setup: do not support python2 > install.sh: install files with correct permission in struct umask settings > Remove unneed LC_ALL=en_US.UTF-8 > Support using other driver (datastax or older scylla ones) > Fix RPM based downgrade command on scylla-cqlsh > gitignore: ignore pylib/cqlshlib/__pycache__ > dist/redhat: add a proper changelog entry > github actions: enable starting on tags > Add support for building docker image	2023-03-27 16:23:54 +03:00
Kefu Chai	a3cb5db542	gms/inet_address: implement operator<< using fmt::formatter less repetition this way. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-27 20:06:45 +08:00
Kefu Chai	8dbaef676d	treewide: use fmtlib to format gms::inet_address the goal of this change is to reduce the dependency on `operator<<(ostream&, const gms::inet_address&)`. this is not an exhaustive search-and-replace change: at some call sites we have other dependencies on not-yet-converted ostream printers, so we cannot fix them all. this change only updates some callers of `operator<<(ostream&, const gms::inet_address&)`. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-27 20:06:45 +08:00
Kefu Chai	4ea6e06cac	gms/inet_address: specialize fmt::formatter<gms::inet_address> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `gms::inet_address` with the help of fmt::ostream. please note, the ':' delimiter is specified when printing the IPv6 address. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-27 20:06:45 +08:00
Kefu Chai	a606606ac4	bytes: implement formatting helpers using formatter some of these helpers print a byte array using `to_hex()`, which materializes a string instance and then drops it on the floor after printing it to the given ostream. this hurts the performance, so `fmt::print()` should be more performant in comparison to the implementations based on `to_hex()`. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-27 20:06:45 +08:00
Kefu Chai	36dc2e3f28	bytes: specialize fmt::formatter<bytes> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print bytes with the help of fmt::ostream. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-27 20:06:45 +08:00
Kefu Chai	2f9dfba800	bytes: specialize fmt::formatter<fmt_hex> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print bytes_view with the help of fmt::ostream. because fmtlib has its own specialization for fmt::formatter<std::basic_string_view<T>>, we cannot just create a full specialization for std::basic_string_view<int8_t>, otherwise fmtlib would complain that > Mixing character types is disallowed. so we work around this using a delegate of fmt_hex. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-27 20:06:45 +08:00
Tomasz Grabiec	79ee38181c	Merge 'storage_service: wait for normal state handlers earlier in the boot procedure' from Kamil Braun The `wait_for_normal_state_handled_on_boot` function waits until `handle_state_normal` finishes for the given set of nodes. It was used in `run_bootstrap_ops` and `run_replace_ops` to wait until NORMAL states of existing nodes in the cluster are processed by the joining node before continuing the joining process. One reason to do it is because at the end of `handle_state_normal` the joining node might drop connections to the NORMAL nodes in order to reestablish new connections using correct encryption settings. In tests we observed that the connection drop was happening in the middle of repair/streaming, causing repair/streaming to abort. Unfortunately, calling `wait_for_normal_state_handled_on_boot` in `run_bootstrap_ops`/`run_replace_ops` is too late to fix all problems. Before either of these two functions, we create a new CDC generation and write the data to `system_distributed_everywhere.cdc_generation_descriptions_v2`. In tests, the connections were sometimes dropped while this write was in-flight. This would cause the write to never arrive to other nodes, and the joining node would timeout waiting for confirmations. To fix this, call `wait_for_normal_state_handled_on_boot` earlier in the boot procedure, before `make_new_generation` call which does the write. Fixes: #13302 Closes #13317 * github.com:scylladb/scylladb: storage_service: wait for normal state handlers earlier in the boot procedure storage_service: bootstrap: wait for normal tokens to arrive in all cases storage_service: extract get_nodes_to_sync_with helper storage_service: return unordered_set from get_ignore_dead_nodes_for_replace	2023-03-27 13:56:47 +02:00
Kamil Braun	cd282cf0ab	Merge 'Raft, use schema commit log' from Gusev Petr We need this so that we can have multi-partition mutations which are applied atomically. If they live on different shards, we can't guarantee atomic write to the commitlog. Fixes: #12642 Closes #13134 * github.com:scylladb/scylladb: test_raft_upgrade: add a test for schema commit log feature scylla_cluster.py: add start flag to server_add ServerInfo: drop host_id scylla_cluster.py: add config to server_add scylla_cluster.py: add expected_error to server_start scylla_cluster.py: ScyllaServer.start, refactor error reporting scylla_cluster.py: fix ScyllaServer.start, reset cmd if start failed raft: check if schema commitlog is initialized Refuse to boot if neither the schema commitlog feature nor force_schema_commit_log is set. For the upgrade procedure the user should wait until the schema commitlog feature is enabled before enabling consistent_cluster_management. raft: move raft initialization after init_system_keyspace database: rename before_schema_keyspace_init->maybe_init_schema_commitlog raft: use schema commitlog for raft tables init_system_keyspace: refactoring towards explicit load phases	2023-03-27 13:27:30 +02:00
Marcin Maliszkiewicz	339a8fe64d	thrift: return address in listen_addresses() only after server is ready listen_addresses() checks if _server variable is empty and after this patch we assign (move) the value only after server is ready. This is used for readiness API: /storage_service/rpc_server and the fix prevents from returning 'true' prematurely. Some improvement for readiness was added in `a51529dd15` but thrift implementation wasn't fully done. Fixes #12376	2023-03-27 13:20:53 +02:00
Marcin Maliszkiewicz	a38701b9d4	thrift: simplify do_start_server() with seastar:async Code is executed typically on startup only so overhead is very limited. Notably using async avoids managing tserver variable lifetime.	2023-03-27 13:12:10 +02:00
David Garcia	70ce1b2002	docs: Separate conf.py docs: update github actions docs: fix Makefile tabs Update docs-pr.yaml Update Makefile Closes #13323	2023-03-27 13:42:58 +03:00
Botond Dénes	89e58963ab	Update tools/python3/ submodule * tools/python3 279b6c1...d2f57dd (3): > dist: redhat: provide only a single version > SCYLLA-VERSION-GEN: use -gt when comparing values > SCYLLA-VERSION-GEN: remove unnecessary bashism	2023-03-27 12:00:27 +03:00
Botond Dénes	b5afdf56c3	Merge 'Cleanup keyspace compaction task' from Aleksandra Martyniuk Task manager task implementations of classes that cover cleanup keyspace compaction which can be started through /storage_service/keyspace_compaction/ api. Top level task covers the whole compaction and creates child tasks on each shard. Closes #12712 * github.com:scylladb/scylladb: test: extend test_compaction_task.py to test cleanup compaction compaction: create task manager's task for cleanup keyspace compaction on one shard compaction: create task manager's task for cleanup keyspace compaction api: add get_table_ids to get table ids from table infos compaction: create cleanup_compaction_task_impl	2023-03-27 11:52:51 +03:00
Kefu Chai	ed347c5051	bytes: mark fmt_hex::v `const` as fmt_hex is a helper class for formatting the underlying `bytes_view`, it does not mutate it, so mark the member variable const and mark the parameter in its constructor const. this change also helps us to use fmt_hex in the use case where the const semantics is expected. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-27 16:49:07 +08:00
Botond Dénes	ab61704c54	Merge 'mutation: replace operator<<(.., const range_tombstone&) with fmt formatter' from Kefu Chai this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `range_tombstone` and `range_tombstone_change` without using ostream<<. also, this change removes all existing callers of `operator<<(ostream, const range_tombstone &)` and `operator<<(ostream, const range_tombstone_change &)`, and then removes these two `operator<<`s. Refs #13245 Closes #13260 * github.com:scylladb/scylladb: mutation: drop operator<<(ostream, const range_tombstone{_change,} &) mutation: use fmtlib to print range_stombstone{_change,} mutation: mutation_fragment_v2: specialize fmt::formatter<range_tombstone_change> mutation: range_tombstone: specialize fmt::formatter<range_tombstone>	2023-03-27 11:38:59 +03:00
Botond Dénes	bd42f5ee0b	Merge 'raft: includes used header and use <path/to/header> for include boost headers' from Kefu Chai at least, we need to access the declarations of exceptions, like `not_a_leader` and `dropped_entry`, so, instead of relying on another header to do this job for us, we should include the header which includes the declaration. so, in this change "raft.h" is included explicitly. also, include boost headers using `<path/to/header>` instead of `"path/to/header"` for more consistency. Closes #13326 * github.com:scylladb/scylladb: raft: include boost header using <path/to/header> not "path/to/header" raft: include used header	2023-03-27 10:11:45 +03:00
Kefu Chai	96ba88f621	dist/debian: add libexec/scylla to source/include-binaries * scripts/create-relocatable-package.py: add a command to print out executables under libexec * dist/debian/debian_files_gen.py: call create-relocatable-package.py for a list of files under libexec and create source/include-binaries with the list. we repackage the precompiled binaries in the relocatable package into a debian source package using `./scylla/install.sh`, which edits the executable to use the specified dynamic library loader. but dpkg-source does not like this, as it wants to ensure that the files in original tarball (*.orig.tar.gz) is identical to the files in the source package created by dpkg-source. so we have following failure when running reloc/build_deb.sh ``` dpkg-source: error: cannot represent change to scylla/libexec/scylla: binary file contents changed dpkg-source: error: add scylla/libexec/scylla in debian/source/include-binaries if you want to store the modified binary in the debian tarball dpkg-source: error: unrepresentable changes to source dpkg-buildpackage: error: dpkg-source -b . subprocess returned exit status 1 debuild: fatal error at line 1182: dpkg-buildpackage -rfakeroot -us -uc -ui failed ``` in this change, to address the build failure, as proposed by dpkg, the path to the patched/edited executable is added to `debian/source/include-binaries`. see the "Building" section in https://manpages.debian.org/bullseye/dpkg-dev/dpkg-source.1.en.html for more details. please search `adjust_bin()` in `scylladb/install.sh` for more details. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12722	2023-03-27 10:10:12 +03:00
Botond Dénes	4b5b6a9010	test/lib: rm test_table.hh No users left.	2023-03-27 02:00:44 -04:00
Botond Dénes	3a43574b39	test/boost/multishard_mutation_query_test: migrate other tests to random schema Create a local method called create_test_table that has the same signature as test::create_test_table, but uses random schema behind the scenes to generate the schema and the data, then migrate all the test cases to use it instead. To accommodate the randomness added by the random schema and random data, the unreliable querier cache population checks were replaced with more reliable lookup and miss checks, to prevent test flakiness. Querier cache population checks worked well with a fixed and simple schema and a fixed table population, they don't work that well with random data. With this, there are no more uses of test_table.hh in this test and the include can be removed.	2023-03-27 02:00:44 -04:00
Botond Dénes	56a9968817	test/boost/multishard_mutation_query_test: use ks keyspace This keyspace exists by default and thus we don't have to create a new one for each test. Also use `get_name()` to pass the test case's name as table name, instead of hard-coding it. We already had some copy-pasta creep in: two tests used the same table name. This is an error, as each test runs in its own env, but it is confusing to see another test case's name in the logs.	2023-03-27 02:00:44 -04:00
Botond Dénes	ad313d8eef	test/boost/multishard_mutation_query_test: improve test pager Propagate the page size to the result builder, so it can determine when a page is short and thus it is the last page, instead of asking for more pages until an empty one turns up. This will make tests more reliable when dealing with random datasets. Also change how the page counter is bumped: bump it after the current page is executed, at which point we know whether there will be a next page or not. This fixes an off-by-one seen in some cases.	2023-03-27 02:00:44 -04:00
Botond Dénes	3df70a9f3b	test/boost/multishard_mutation_query_test: refactor fuzzy_test Use the random_schema and its facilities to generate the schema and the dataset. This allows the test to provide much better coverage than the previous, fixed and simplistic schema did. Also reduce the test table population and the number of scans run on it so the test runs in a more reasonable time-frame. We run these tests all the time due to CI, so no need to try to do too much in a single run.	2023-03-27 02:00:43 -04:00
Botond Dénes	2cdda562f7	test/boost: add multishard_mutation_query_test more memory The tests in this file work with random schema and random data. Some seeds can generate large partitions and rows, give the test some more headroom to work with.	2023-03-27 01:44:00 -04:00
Botond Dénes	00f06522c2	types/user: add get_name() accessor For the raw name (bytes).	2023-03-27 01:44:00 -04:00
Botond Dénes	99c9a71d93	test/lib/random_schema: add create_with_cql() Allowing the generated schema to be created as a CQL table, so that queries can be run against it.	2023-03-27 01:44:00 -04:00
Botond Dénes	10a44fee06	test/lib/random_schema: fix udt handling * generate lowercase names (upper-case seems to cause problems); * preserve dependency order between UDTs when dumping them from schema; * use built-in describe() to dump to CQL string; * drop single arg dump_udts() overload, which was not recursive, unlike the vector variant;	2023-03-27 01:44:00 -04:00
Botond Dénes	b2ddc60c10	test/lib/random_schema: type_generator(): also generate frozen types For regular and static columns, to introduce some further randomness. So far frozen types were generated only for primary key members and embedded types.	2023-03-27 01:44:00 -04:00
Botond Dénes	1cb4b1fc83	test/lib/random_schema: type_generator(): make static column generation conditional On the schema having clustering columns. Otherwise static column is illegal.	2023-03-27 01:44:00 -04:00
Botond Dénes	2a7cccd1a8	test/lib/random_schema: type_generator(): don't generate duration_type for keys And for any embedded type (collection, tuple members, etc.). It's not allowed, as I recently learned.	2023-03-27 01:44:00 -04:00
Botond Dénes	c9f54e539d	test/lib/random_schema: generate_random_mutations(): add overload with seed	2023-03-27 01:44:00 -04:00
Botond Dénes	394909869d	test/lib/random_schema: generate_random_mutations(): respect range tombstone count param Even though there is a parameter determining the number of range tombstones to be generated, the method disregards it and generates just 4. Fix that.	2023-03-27 01:43:59 -04:00
Botond Dénes	477b26f7af	test/lib/random_schema: generate_random_mutations(): add yields	2023-03-27 01:43:59 -04:00
Botond Dénes	fd8a50035a	test/lib/random_schema: generate_random_mutations(): fix indentation	2023-03-27 01:43:59 -04:00
Botond Dénes	71fdec7b42	test/lib/random_schema: generate_random_mutations(): coroutinize method	2023-03-27 01:43:59 -04:00
Botond Dénes	393aaddff0	test/lib/random_schema: generate_random_mutations(): expand comment Add note about mutation order and deduplication.	2023-03-27 01:43:59 -04:00
Avi Kivity	cd0b167d6c	Merge 'bloom_filter: cleanups' from Kefu Chai this series applies some random cleanups to bloom_filter. these cleanups were the side products when the author was working on #13314 . Closes #13315 * github.com:scylladb/scylladb: bloom_filter: mark internal help function static bloom_filter: add more constness to false positive rate tables bloom_filter: use vector::back() when appropriate	2023-03-26 19:43:37 +03:00
Kefu Chai	33f4012eeb	test: cql-pytest: test_describe: clamp bloom filter's fp rate before this change, we use `round(random.random(), 5)` for the value of `bloom_filter_fp_chance` config option. there are chances that this expression could return a number lower than or equal to 6.71e-05. but we do have a minimum for this option, which is defined by `utils::bloom_calculations::probs`. and the minimal false positive rate is 6.71e-05. we are observing test failures where we are using 0 for the option, and scylla rightly rejected it with the error message of ``` bloom_filter_fp_chance must be larger than 6.71e-05 and less than or equal to 1.0 (got 0) ```. so, in this change, to address the test failure, we always use a number greater than or equal to a value slightly above the minimum, to ensure that the randomly picked number is in the range of supported false positive rates. Fixes #13313 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13314	2023-03-26 19:41:22 +03:00
Botond Dénes	d5488dba69	reader_permit: set_trace_state(): emit trace message linking to previous page This method is called on the start of each page, updating the trace state stored on the permit to that of the current page. When doing so, emit a trace message, containing the session id of the previous page, so the per-page sessions can be stitched together later. Note that this message is only emitted if the cached read survived between the pages. Example: Tracing session: dcfc1570-ca3c-11ed-88d0-24443f03a8bb activity \| timestamp \| source \| source_elapsed \| client ---------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2023-03-24 08:10:27.271000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2023-03-24 08:10:27.271864 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2023-03-24 08:10:27.271958 \| 127.0.0.1 \| 94 \| 127.0.0.1 Creating read executor for token 3274692326281147944 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] \| 2023-03-24 08:10:27.271995 \| 127.0.0.1 \| 132 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2023-03-24 08:10:27.271998 \| 127.0.0.1 \| 135 \| 127.0.0.1 Start querying singular range {{3274692326281147944, pk{00026b73}}} [shard 0] \| 2023-03-24 08:10:27.272003 \| 127.0.0.1 \| 140 \| 127.0.0.1 [reader concurrency semaphore] admitted immediately [shard 0] \| 2023-03-24 08:10:27.272006 \| 127.0.0.1 \| 143 \| 127.0.0.1 [reader concurrency semaphore] executing read [shard 0] \| 2023-03-24 08:10:27.272014 \| 127.0.0.1 \| 150 \| 127.0.0.1 Querying cache for range {{3274692326281147944, pk{00026b73}}} and slice {(-inf, +inf)} [shard 0] \| 2023-03-24 08:10:27.272022 \| 127.0.0.1 \| 159 \| 127.0.0.1 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 3 clustering row(s) (3 live, 0 dead) and 0 range 
tombstone(s) [shard 0] \| 2023-03-24 08:10:27.272076 \| 127.0.0.1 \| 212 \| 127.0.0.1 Caching querier with key ab928e0d-b815-46b7-9a02-1fa2d9549477 [shard 0] \| 2023-03-24 08:10:27.272084 \| 127.0.0.1 \| 221 \| 127.0.0.1 Querying is done [shard 0] \| 2023-03-24 08:10:27.272087 \| 127.0.0.1 \| 224 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2023-03-24 08:10:27.272106 \| 127.0.0.1 \| 242 \| 127.0.0.1 Request complete \| 2023-03-24 08:10:27.271259 \| 127.0.0.1 \| 259 \| 127.0.0.1 Tracing session: dd3092f0-ca3c-11ed-88d0-24443f03a8bb activity \| timestamp \| source \| source_elapsed \| client ---------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2023-03-24 08:10:27.615000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2023-03-24 08:10:27.615223 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2023-03-24 08:10:27.615310 \| 127.0.0.1 \| 87 \| 127.0.0.1 Creating read executor for token 3274692326281147944 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] \| 2023-03-24 08:10:27.615346 \| 127.0.0.1 \| 124 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2023-03-24 08:10:27.615349 \| 127.0.0.1 \| 126 \| 127.0.0.1 Start querying singular range {{3274692326281147944, pk{00026b73}}} [shard 0] \| 2023-03-24 08:10:27.615352 \| 127.0.0.1 \| 130 \| 127.0.0.1 Found cached querier for key ab928e0d-b815-46b7-9a02-1fa2d9549477 and range(s) {{{3274692326281147944, pk{00026b73}}}} [shard 0] \| 2023-03-24 08:10:27.615358 \| 127.0.0.1 \| 135 \| 127.0.0.1 Reusing querier [shard 0] \| 2023-03-24 08:10:27.615362 \| 127.0.0.1 \| 139 \| 127.0.0.1 Continuing paged query, previous page's trace session is dcfc1570-ca3c-11ed-88d0-24443f03a8bb [shard 0] \| 2023-03-24 08:10:27.615364 \| 127.0.0.1 \| 141 \| 127.0.0.1 [reader concurrency semaphore] 
executing read [shard 0] \| 2023-03-24 08:10:27.615371 \| 127.0.0.1 \| 148 \| 127.0.0.1 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2023-03-24 08:10:27.615385 \| 127.0.0.1 \| 163 \| 127.0.0.1 Querying is done [shard 0] \| 2023-03-24 08:10:27.615583 \| 127.0.0.1 \| 360 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2023-03-24 08:10:27.615730 \| 127.0.0.1 \| 507 \| 127.0.0.1 Request complete \| 2023-03-24 08:10:27.615518 \| 127.0.0.1 \| 518 \| 127.0.0.1 See the message: Continuing paged query, previous page's trace session is dcfc1570-ca3c-11ed-88d0-24443f03a8bb [shard 0] \| 2023-03-24 08:10:27.615364 \| 127.0.0.1 \| 141 \| 127.0.0.1 This is a follow-up to #13255 Refs: #12781 Closes #13318	2023-03-26 18:41:21 +03:00
Avi Kivity	f937fad25a	Merge 'readers/multishard: shard_reader: fast-forward created reader to current range' from Botond Dénes When creating the reader, the lifecycle policy might return one that was saved on the last page and survived in the cache. This reader might have skipped some fast-forwarding ranges while sitting in the cache. To avoid using a reader reading a stale range (from the read's POV), check its read range and fast forward it if necessary. Fixes: https://github.com/scylladb/scylladb/issues/12916 Closes #12932 * github.com:scylladb/scylladb: readers/multishard: shard_reader: fast-forward created reader to current range readers/multishard: reader_lifecycle_policy: add get_read_range() test/boost/multishard_mutation_query_test: paging: handle range becoming wrapping	2023-03-26 18:39:50 +03:00
Wojciech Mitros	f0aa540e00	cql: renice the wasm compilation alien thread The Wasm compilation is a slow, low priority task, so it should not compete with reactor threads or the networking core. To achieve that, we increase the niceness of the thread by 10. An alternative solution would be to set the priority using pthread_setschedparam, but it's not currently feasible, because as long as we're using the SCHED_OTHER policy for our threads, we cannot select any other priority than 0. Closes #13307	2023-03-26 18:38:23 +03:00
Anna Stuchlik	1cfea1f13c	doc: remove incorrect info about BYPASS CACHE Fixes https://github.com/scylladb/scylladb/issues/13106 This commit removes the information that BYPASS CACHE is an Enterprise-only feature and replaces that info with the link to the BYPASS CACHE description. Closes #13316	2023-03-26 18:13:17 +03:00
Kefu Chai	e796525f23	types: remove unused header <iterator> was introduced back in `1cf02cb9d8`, but lexicographical_compare.hh was extracted out in `bdfc0aa748`, since we don't have any users of <iterator> in types.hh anymore, let's remove it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13327	2023-03-26 16:55:16 +03:00
Avi Kivity	eeff8cd075	Merge 'dist/redhat: enforce dependency on %{release} also' from Kefu Chai s/%{version}/%{version}-%{release}/ in `Requires:` sections. this enforces the runtime dependencies of exactly the same releases between scylla packages. Fixes #13222 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13229 * github.com:scylladb/scylladb: dist/redhat: split Requires section into multiple lines dist/redhat: enforce dependency on %{release} also	2023-03-26 16:50:10 +03:00
Avi Kivity	bfd70c192e	cql3: functions: reimplement min/max statelessly min() and max() had two implementations: one static (for each type in a select list) and one dynamic (for compound types). Since the dynamic implementation is sufficient, we only reimplement that. This means we don't use the automarshalling helpers, since we don't do any arithmetic on values apart from comparison, which is conveniently provided by abstract_type.	2023-03-26 15:18:22 +03:00
Avi Kivity	e6342d476b	cql3: functions: reimplement count(*) statelessly Note we have to explicitly decay lambdas to functions using unary operator +.	2023-03-26 15:18:22 +03:00
Avi Kivity	9291ec5ed1	cql3: functions: simplify creating native functions even more Add a helper function to consolidate the internal native function class and the automatic marshalling introduced in previous patches. Since a lambda must be decayed into a function pointer (in order to infer its signature), there are two overloads: one accepts a lambda and decays it into a function pointer, the second accepts a function pointer, infers its argument, and constructs the function object.	2023-03-26 15:15:36 +03:00
Kefu Chai	3425184b2a	raft: include boost header using <path/to/header> not "path/to/header" for more consistency with the rest of the source tree. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-26 14:07:50 +08:00
Kefu Chai	0421d6d12f	raft: include used header at least, we need to access the declarations of exceptions, like `not_a_leader` and `dropped_entry`, so, instead of relying on another header to do this job for us, we should include the header which includes the declaration. so, in this change "raft.h" is included explicitly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-26 14:07:50 +08:00
Kefu Chai	023e985a6c	build: cmake: add missing source files to idl and service they were added recently, but cmake failed to sync with configure.py. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-26 14:01:21 +08:00
Kefu Chai	e0ca80d21f	build: cmake: port more cxxflags from configure.py Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-26 14:01:21 +08:00
Kefu Chai	a5547ea11b	build: cmake: add two missing tests they are leftovers in `f113dac5bf` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-26 14:01:21 +08:00
Tzach Livyatan	46e6c639d9	docs: minor improvements to the Raft Handling Failures and recovery procedure sections Closes #13292	2023-03-24 18:17:36 +01:00
Botond Dénes	b6682ad607	docs/operating-scylla/admin-tools: scylla-sstable.rst: update schema section With the recent changes to the ways schema can be provided to the tool.	2023-03-24 11:41:40 -04:00
Botond Dénes	bc9341b84a	test/cql-pytest: test_tools.py: add test for schema loading A comprehensive test covering all the supported ways of providing the schema to scylla-sstable, either explicitly or implicitly (auto-detect).	2023-03-24 11:41:40 -04:00
Botond Dénes	afdfe34ca7	test/cql-pytest: nodetool.py: add flush_keyspace() It would have been better if `flush()` could have been called with a keyspace and optional table param, but changing it now is too much churn, so we add a dedicated method to flush a keyspace instead.	2023-03-24 11:41:40 -04:00
Botond Dénes	1f0ab699c3	tools/scylla-sstable: reform schema loading mechanism So far, schema had to be provided via a schema.cql file, a file which contains the CQL definition of the table. This is flexible but annoying at the same time. Many times the sstables the tool operates on are located in their table directory in a scylla data directory, where the schema tables are also available. To mitigate this, an alternative method to load the schema from memory was added which works for system tables. In this commit we extend this to work for all kinds of tables: by auto-detecting where the scylla data directory is, and loading the schema tables from disk.	2023-03-24 11:41:40 -04:00
Botond Dénes	c5b2fc2502	tools/schema_loader: add load_schema_from_schema_tables() Allows loading the schema for the designated keyspace and table, from the system table sstables located on disk. The sstable files are opened for reading only.	2023-03-24 11:41:40 -04:00
Botond Dénes	19560419d2	Merge 'treewide: improve compatibility with gcc 13' from Avi Kivity An assortment of patches that reduce our incompatibilities with the upcoming gcc 13. Closes #13243 * github.com:scylladb/scylladb: transport: correctly format unknown opcode treewide: catch by reference test: raft: avoid confusing string compare utils, types, test: extract lexicographical compare utilities test: raft: fsm_test: disambiguate raft::configuration construction test: reader_concurrency_semaphore_test: handle all enum values repair: fix signed/unsigned compare repair: fix incorrect signed/unsigned compare treewide: avoid unused variables in if statements keys: disambiguate construction from initializer_list<bytes> cql3: expr: fix serialize_listlike() reference-to-temporary with gcc compaction: error on invalid scrub type treewide: prevent redefining names api: task_manager: fix signed/unsigned compare alternator: streams: fix signed/unsigned comparison test: fix some mismatched signed/unsigned comparisons	2023-03-24 15:16:05 +02:00
Botond Dénes	132d101dc7	db/schema_tables: expose types schema	2023-03-24 08:50:39 -04:00
Botond Dénes	14bff955e2	readers/multishard: shard_reader: fast-forward created reader to current range When creating the reader, the lifecycle policy might return one that was saved on the last page and survived in the cache. This reader might have skipped some fast-forwarding ranges while sitting in the cache. To avoid using a reader reading a stale range (from the read's POV), check its read range and fast forward it if necessary.	2023-03-24 08:43:03 -04:00
Botond Dénes	0aa03f85a3	readers/multishard: reader_lifecycle_policy: add get_read_range() Allows retrieving the current read-range for the reader on the given shard (where the method is called).	2023-03-24 08:40:11 -04:00
Botond Dénes	1c7a66cd2a	test/boost/multishard_mutation_query_test: paging: handle range becoming wrapping After each page, the read range is adjusted so it continues from/after the last read partition. Sometimes this can result in the range becoming wrapped like this: (pk, pk]. In this case, we can just drop this range and continue with the rest of the ranges (if there are multiple ones).	2023-03-24 08:40:11 -04:00
Tomasz Grabiec	c54a3d9c10	Merge 'Clean enabled features manipulations in system keyspace' from Pavel Emelyanov There was an attempt to cut feature-service -> system-keyspace dependency (#13172) which turned out to require more changes. Here's a preparation squeezing from this future work. This set - leaves only batch-enabling API in feature service - keeps the need for async context in feature service - narrows down system keyspace features API to only load and store records - relaxes features updating logic in sys.ks. - cosmetic Closes #13264 * github.com:scylladb/scylladb: feature_service: Indentation fix after previous patch feature_service: Move async context into enable() system_keyspace: Refactor local features load/save helpers feature_service: Mark supported_feature_set() const feature_service: Remove single feature enabling method boot: Enable features in batch gossiper: Enable features in batch	2023-03-24 13:12:49 +01:00
Petr Gusev	c1634ea5fa	test_raft_upgrade: add a test for schema commit log feature The test tries to start a node with consistent_cluster_management but without force_schema_commit_log. This is expected to fail, since the schema commitlog feature should be enabled by all the cluster nodes.	2023-03-24 16:08:17 +04:00
Petr Gusev	e407956e9f	scylla_cluster.py: add start flag to server_add Sometimes when creating a node it's useful to just install it and not start. For example, we may want to try to start it later with expected error. The ScyllaServer.install method has been made exception safe, if an exception occurs, it reverts to the original state. This allows to not duplicate the try/except logic in two of its call sites.	2023-03-24 16:08:17 +04:00
Petr Gusev	794d0e4000	ServerInfo: drop host_id We are going to allow the ScyllaCluster.add_server function not to start the server if the caller has requested that with a special parameter. The host_id can only be obtained from a running node, so add_server won't be able to return it in this case. I've grepped the tests for host_id and there doesn't seem to be any reference to it in the code.	2023-03-24 16:08:17 +04:00
Petr Gusev	8e3392c64f	scylla_cluster.py: add config to server_add Sometimes when creating a node it's useful to pass a custom node config.	2023-03-24 16:08:17 +04:00
Petr Gusev	c1d0ee2bce	scylla_cluster.py: add expected_error to server_start Sometimes it's useful to check that the node has failed to start for a particular reason. If server_start can't find expected_error in the node's log or if the node has started without errors, it throws an exception.	2023-03-24 16:08:11 +04:00
Petr Gusev	a4411e9ec4	scylla_cluster.py: ScyllaServer.start, refactor error reporting Extract the function that encapsulates all the error reporting logic. We are going to use it in several other places to implement expected_error feature.	2023-03-24 15:54:52 +04:00
Petr Gusev	21b505e67c	scylla_cluster.py: fix ScyllaServer.start, reset cmd if start failed The ScyllaServer expects cmd to be None if the Scylla process is not running. Otherwise, if start failed and the test called update_config, the latter will try to send a signal to a non-existent process via cmd.	2023-03-24 15:54:52 +04:00
Petr Gusev	75a4ff2da9	raft: check if schema commitlog is initialized Refuse to boot if neither the schema commitlog feature nor force_schema_commit_log is set. For the upgrade procedure the user should wait until the schema commitlog feature is enabled before enabling consistent_cluster_management.	2023-03-24 15:54:52 +04:00
Petr Gusev	d8997a4993	raft: move raft initialization after init_system_keyspace Raft tables are loaded on the second call to init_system_keyspace, so it seems more logical to move initialization after it. This is not necessary right now since raft tables are not used in this initialization logic, but it may change in the future and cause troubles.	2023-03-24 15:54:52 +04:00
Petr Gusev	769732d095	database: rename before_schema_keyspace_init->maybe_init_schema_commitlog We are going to move the raft tables from the first load phase to the second. This means the second init_system_keyspace call will load raft tables along with the schema, making the name of this function imprecise.	2023-03-24 15:54:52 +04:00
Petr Gusev	273e70e1f9	raft: use schema commitlog for raft tables Fixes: #12642	2023-03-24 15:54:52 +04:00
Petr Gusev	5a5d664a5a	init_system_keyspace: refactoring towards explicit load phases We aim (#12642) to use the schema commit log for raft tables. Now they are loaded at the first call to init_system_keyspace in main.cc, but the schema commitlog is only initialized shortly before the second call. This is important, since the schema commitlog initialization (database::before_schema_keyspace_init) needs to access schema commitlog feature, which is loaded from system.scylla_local and therefore is only available after the first init_system_keyspace call. So the idea is to defer the loading of the raft tables until the second call to init_system_keyspace, just as it works for schema tables. For this we need a tool to mark which tables should be loaded in the first or second phase. To do this, in this patch we introduce system_table_load_phase enum. It's set in the schema_static_props for schema tables. It replaces the system_keyspace::table_selector in the signature of init_system_keyspace. The call site for populate_keyspace in init_system_keyspace was changed, table_selector.contains_keyspace was replaced with db.local().has_keyspace. This check prevents calling populate_keyspace(system_schema) on phase1, but allows for populate_keyspace(system) on phase2 (to init raft tables). On this second call some tables from system keyspace (e.g. system.local) may have already been populated on phase1. This check protects from double-populating them, since every populated cf is marked as ready_for_writes.	2023-03-24 15:54:46 +04:00
Anna Stuchlik	9e27f6b4b7	doc: update the Ubuntu version used in the image Starting from 5.2 and 2023.1 our images are based on Ubuntu:22.04. See https://github.com/scylladb/scylladb/issues/13138#issuecomment-1467737084 This commit adds that information to the docs. It should be merged and backported to branch-5.2. Closes #13301	2023-03-24 13:50:51 +02:00
Kamil Braun	0b19a614fa	storage_service: wait for normal state handlers earlier in the boot procedure The `wait_for_normal_state_handled_on_boot` function waits until `handle_state_normal` finishes for the given set of nodes. It was used in `run_bootstrap_ops` and `run_replace_ops` to wait until NORMAL states of existing nodes in the cluster are processed by the joining node before continuing the joining process. One reason to do it is because at the end of `handle_state_normal` the joining node might drop connections to the NORMAL nodes in order to reestablish new connections using correct encryption settings. In tests we observed that the connection drop was happening in the middle of repair/streaming, causing repair/streaming to abort. Unfortunately, calling `wait_for_normal_state_handled_on_boot` in `run_bootstrap_ops`/`run_replace_ops` is too late to fix all problems. Before either of these two functions, we create a new CDC generation and write the data to `system_distributed_everywhere.cdc_generation_descriptions_v2`. In tests, the connections were sometimes dropped while this write was in-flight. This would cause the write to never arrive to other nodes, and the joining node would timeout waiting for confirmations. To fix this, call `wait_for_normal_state_handled_on_boot` earlier in the boot procedure, before `make_new_generation` call which does the write. Fixes: #13302	2023-03-24 12:45:07 +01:00
Kamil Braun	451389970b	storage_service: bootstrap: wait for normal tokens to arrive in all cases `storage_service::bootstrap` waits until it receives normal tokens of other nodes before proceeding or it times out with an error. But it only did that for bootstrap operation, not for replace operation. Do it for replace as well.	2023-03-24 12:44:37 +01:00
Kamil Braun	c003b7017d	storage_service: extract get_nodes_to_sync_with helper	2023-03-24 12:44:37 +01:00
Kamil Braun	599393dcba	storage_service: return unordered_set from get_ignore_dead_nodes_for_replace	2023-03-24 12:44:37 +01:00
Anna Stuchlik	73b74e8cac	doc: remove Enterprise upgrade guides from OSS doc This commit removes the Enterprise upgrade guides from the Open Source documentation. The Enterprise upgrade guides should only be available in the Enterprise documentation, with the source files stored in scylla-enterprise.git. In addition, this commit: - adds the links to the Enterprise user guides in the Enterprise documentation at https://enterprise.docs.scylladb.com/ - adds the redirections for the removed pages to avoid breaking any links. This commit must be reverted in scylla-enterprise.git. Closes #13298	2023-03-24 10:57:03 +02:00
Kefu Chai	a7b4f84b6a	bloom_filter: mark internal helper function static as `initialize_opt_k()` is not used outside of the translation unit, let's mark it `static`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-24 15:41:45 +08:00
Kefu Chai	1a82a7ac72	bloom_filter: add more constness to false positive rate tables we never mutate them, so mark them const for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-24 15:41:45 +08:00
Kefu Chai	7f4a3fdac8	bloom_filter: use vector::back() when appropriate no need to use `size - 1` for accessing the last element in a vector, let's just use `vector::back()` for more compact code. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-24 15:41:45 +08:00
Jan Ciolek	a1c86786ca	db/view/view.cc: rate limit view update error messages When propagating a view update to a paired view replica fails, there is an error message. This message is printed for every mutation, which causes log spam when some node goes down. This isn't a fatal error - it's normal that a remote view replica goes down, it'll hopefully receive the updates later through hints. I'm unsure if the error message should be printed at all, but for now we can just rate limit it and that will improve the situation with log spamming. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #13175	2023-03-24 08:59:39 +02:00
Pavel Emelyanov	b0a5769d92	validation: Avoid throwing schema lookup The validate_column_family() tries to find a schema and throws if it doesn't exist. The latter is determined by the exception thrown by the database::find_schema(), but there's a throw-less way of doing it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13295	2023-03-24 08:43:48 +02:00
Kamil Braun	e8fb718e4a	Merge 'topology changes over raft' from Gleb Natapov The patch series introduces linearisable topology changes using raft protocol. The state machine driven by raft is described in "service: Introduce topology state machine". Some explanations about the implementation can be found in "storage_service: raft topology: implement topology management through raft". The code is not ready for production. There is not much in terms of error handling and integration with the rest of the system is not even started. For full integration request fencing will need to be implemented and token_metadata has to be extended to support not just "pending" nodes but concepts of "read replica set" and "write replica set". The code may be far from being usable, but it is hidden behind the "experimental raft" flag and having it in tree will relieve me from constant rebase burden. * 'raft-topology-v6' of github.com:scylladb/scylla-dev: storage_service: fix indentation from previous patch storage_service: raft topology: implement topology management through raft service: raft: make group0_guard move assignable service: raft: wire up apply() and snapshot transfer for topology in group0 state machine storage_service: raft topology: introduce a function that applies topology cmd to local state machine storage_service: raft topology: introduce a raft monitor and topology coordinator fibers storage_service: raft topology: introduce snapshot transfer code for the topology table raft topology: add RAFT_TOPOLOGY_CMD verb that will be used by topology coordinator to communicate with nodes bootstrapper: Add get_random_bootstrap_tokens function service: raft: add support for topology_change command into raft_group0_client service: raft: introduce topology_change group0 command system_keyspace: add a table to persist topology change state machine's state service: Introduce topology state machine data structures storage_proxy: not consult topology on local table write	2023-03-23 15:59:45 +01:00
Gleb Natapov	5a908c3f46	storage_service: fix indentation from previous patch	2023-03-23 16:29:56 +02:00
Gleb Natapov	f3bd7e9b8c	storage_service: raft topology: implement topology management through raft The code here implements the state machine described in "service: Introduce topology state machine". A topology operation is requested by writing into topology_request field through raft. After that topology_change_transition() function running on a leader is responsible to drive the operation to completion. There is not much in terms of error handling here yet. If something fails the code will just continue trying. topology_change_state_load() which is (eventually) called on all nodes each time state machine's state changes is a glue between the raft view of the topology and the rest of the "legacy" system. The code there creates token_metadata object from the raft view and fills in peers table which is needed for drivers. The gossiper is almost completely cut off from the topology management, but the code still updates node's state there to 'normal' and 'left' for some legacy functionality to continue working. Note that handlers for those states are disabled in raft mode. raft_topology_cmd_handler() is called by topology coordinator and this is where the streaming happens. The kind of streaming depends on the state the node is in. The function is "re-entrable". It can be called more than once and will either start a new operation if it is the first invocation or the previous one failed, or it will wait for the previous operation to complete. The new code is hidden behind "experimental raft" and should not change how the system works if disabled. Some indentation here is intentionally left wrong and will be fixed by the next patch.	2023-03-23 16:29:56 +02:00
Gleb Natapov	8865d5cf13	service: raft: make group0_guard move assignable	2023-03-23 16:29:56 +02:00
Gleb Natapov	344b483425	service: raft: wire up apply() and snapshot transfer for topology in group0 state machine	2023-03-23 16:29:56 +02:00
Gleb Natapov	aca21d3318	storage_service: raft topology: introduce a function that applies topology cmd to local state machine The function applies the command to persistent storage and calls the stub function topology_change_state_load() that will load the new state into memory in later patches.	2023-03-23 16:29:56 +02:00
Gleb Natapov	284afd9255	storage_service: raft topology: introduce a raft monitor and topology coordinator fibers Raft monitor fiber monitors the local server's raft state and starts the topology coordinator fiber when it becomes a leader. Stops it when it is no longer a leader. The coordinator fiber waits for topology state changes, but there will be none yet.	2023-03-23 16:29:56 +02:00
Gleb Natapov	d69a887366	storage_service: raft topology: introduce snapshot transfer code for the topology table	2023-03-23 16:29:56 +02:00
Gleb Natapov	6a4d773b7e	raft topology: add RAFT_TOPOLOGY_CMD verb that will be used by topology coordinator to communicate with nodes Empty for now. Will be used later by the topology coordinator to communicate with other nodes to instruct them to start streaming, or start to fence read/writes.	2023-03-23 16:29:56 +02:00
Nadav Har'El	4fdcee8415	test/alternator: increase CQL connection timeout This patch increases the connection timeout in the get_cql_cluster() function in test/cql-pytest/run.py. This function is used to test that Scylla came up, and also test/alternator/run uses it to set up the authentication - which can only be done through CQL. The Python driver has 2-second and 5-second default timeouts that should have been more than enough for everybody (TM), but in #13239 we saw that in one case it apparently wasn't enough. So to be extra safe, let's increase the default connection-related timeouts to 60 seconds. Note this change only affects the Scylla boot in the test/*/run scripts, and it does not affect the actual tests - those have different code to connect to Scylla (see cql_session() in test/cql-pytest/util.py), and we already increased the timeouts there in #11289. Fixes #13239 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13291	2023-03-23 16:03:20 +02:00
Avi Kivity	afe6b0d8c9	Merge 'reader_concurrency_semaphore: add trace points for important events' from Botond Dénes Currently we have no visibility into what happens to a read in the reader concurrency semaphore as far as tracing is concerned. This series fixes that, storing a trace state pointer on the reader permit and using it to add trace messages to important semaphore related events: * admission decision * execution (execution stage functionality) * eviction This allows for seeing if the read suffered any delay in the semaphore. Example tracing (2 pages): ``` Tracing session: 8cc80d50-c72d-11ed-8427-14e21cc3ed56 activity \| timestamp \| source \| source_elapsed \| client -------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2023-03-20 10:43:16.773000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2023-03-20 10:43:16.773754 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2023-03-20 10:43:16.773837 \| 127.0.0.1 \| 83 \| 127.0.0.1 Creating read executor for token -4911109968640856406 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] \| 2023-03-20 10:43:16.773874 \| 127.0.0.1 \| 121 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2023-03-20 10:43:16.773877 \| 127.0.0.1 \| 123 \| 127.0.0.1 Start querying singular range {{-4911109968640856406, pk{000d73797374656d5f736368656d61}}} [shard 0] \| 2023-03-20 10:43:16.773881 \| 127.0.0.1 \| 128 \| 127.0.0.1 [reader concurrency semaphore] admitted immediately [shard 0] \| 2023-03-20 10:43:16.773884 \| 127.0.0.1 \| 130 \| 127.0.0.1 [reader concurrency semaphore] executing read [shard 0] \| 2023-03-20 10:43:16.773890 \| 127.0.0.1 \| 137 \| 127.0.0.1 Querying cache for range {{-4911109968640856406, pk{000d73797374656d5f736368656d61}}} and slice {(-inf, +inf)} [shard 0] \| 2023-03-20 10:43:16.773903 \| 127.0.0.1 \| 149 \| 127.0.0.1 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 100 clustering row(s) (100 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2023-03-20 10:43:16.774674 \| 127.0.0.1 \| 920 \| 127.0.0.1 Caching querier with key 5eff94d2-e47a-43b2-8e3a-2d80a9cc3b3e [shard 0] \| 2023-03-20 10:43:16.774685 \| 127.0.0.1 \| 931 \| 127.0.0.1 Querying is done [shard 0] \| 2023-03-20 10:43:16.774688 \| 127.0.0.1 \| 934 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2023-03-20 10:43:16.774706 \| 127.0.0.1 \| 953 \| 127.0.0.1 Request complete \| 2023-03-20 10:43:16.774225 \| 127.0.0.1 \| 1225 \| 127.0.0.1 Tracing session: 8d26f630-c72d-11ed-8427-14e21cc3ed56 activity \| timestamp \| source \| source_elapsed \| client ---------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2023-03-20 10:43:17.395000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2023-03-20 10:43:17.395498 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2023-03-20 10:43:17.395558 \| 127.0.0.1 \| 60 \| 127.0.0.1 Creating read executor for token -4911109968640856406 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] \| 2023-03-20 10:43:17.395597 \| 127.0.0.1 \| 99 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2023-03-20 10:43:17.395600 \| 127.0.0.1 \| 102 \| 127.0.0.1 Start querying singular range {{-4911109968640856406, pk{000d73797374656d5f736368656d61}}} [shard 0] \| 2023-03-20 10:43:17.395604 \| 127.0.0.1 \| 106 \| 127.0.0.1 Found cached querier for key 5eff94d2-e47a-43b2-8e3a-2d80a9cc3b3e and range(s) {{{-4911109968640856406, pk{000d73797374656d5f736368656d61}}}} [shard 0] \| 2023-03-20 10:43:17.395610 \| 127.0.0.1 \| 112 \| 127.0.0.1 Reusing querier [shard 0] \| 2023-03-20 10:43:17.395614 \| 127.0.0.1 \| 116 \| 127.0.0.1 [reader concurrency semaphore] executing read [shard 0] \| 2023-03-20 10:43:17.395622 \| 127.0.0.1 \| 125 \| 127.0.0.1 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 11 clustering row(s) (11 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2023-03-20 10:43:17.395711 \| 127.0.0.1 \| 213 \| 127.0.0.1 Querying is done [shard 0] \| 2023-03-20 10:43:17.395718 \| 127.0.0.1 \| 221 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2023-03-20 10:43:17.395734 \| 127.0.0.1 \| 236 \| 127.0.0.1 Request complete \| 2023-03-20 10:43:17.395276 \| 127.0.0.1 \| 276 \| 127.0.0.1 ``` Fixes: https://github.com/scylladb/scylladb/issues/12781 Closes #13255 * github.com:scylladb/scylladb: reader_concurrency_semaphore: add trace points for important events reader_permit: refresh trace_state on new pages reader_permit: keep trace_state pointer on permit test/perf/perf_collection: give more unique names to key comparators	2023-03-23 15:37:33 +02:00
Botond Dénes	7699904c54	Revert "repair: Reduce repair reader eviction with diff shard count" This reverts commit `c6087cf3a0`. Said commit can cause a deadlock when 2 or more repairs compete for locks on 2 or more nodes. Consider the following scenario: Node n1 and n2 in the cluster, 1 shard per node, rf = 2, each shard has 1 available unit for the reader lock n1 starts repair r1 r1-n1 (instance of r1 on node1) takes the reader lock on node1 n2 starts repair r2 r2-n2 (instance of r2 on node2) takes the reader lock on node2 r1-n2 will fail to take the reader lock on node2 r2-n1 will fail to take the reader lock on node1 As a result, r1 and r2 could not make progress and deadlock happens. The complexity comes from the fact that a repair job needs lock on more than one node. It is not guaranteed that all the participant nodes could take the lock in one shot. There is no simple solution to this so we have to revert this locking mechanism and look for another way to prevent reader thrashing when repairing nodes with mismatching shard count. Fixes: #12693 Closes #13266	2023-03-23 15:35:32 +02:00
Nadav Har'El	b5e61e1b83	test/cql-pytest, lwt: test for detection of contradicting batches Cassandra detects when a batch has both an IF EXISTS and IF NOT EXISTS on the same row, and complains this is not a useful request (after all, it can never succeed, because the batch can only succeed if both conditions are true, and that can't be if one checks IF EXISTS and the other IF NOT EXISTS). This patch adds a test, test_lwt_with_batch_conflict_1, which checks that this case results in an error. It passes on Cassandra, but xfails on Scylla which doesn't report an error in this case. A second test, test_lwt_with_batch_conflict_2, shows that the detection of the EXISTS / NOT EXISTS conflict is special, and other conflicts such as having both "r=1" and "r=2" for the same row, are NOT detected by Cassandra. Refs #13011. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13270	2023-03-23 13:35:21 +02:00
Pavel Emelyanov	b13ff5248c	sstables: Mark continuous_data_consumer::reader_position() const Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13285	2023-03-23 13:27:33 +02:00
Pavel Emelyanov	bee5593ba1	storage_service: Move node_ops_meta_data to .cc file It's declared in header, but is not used outside of .cc. Forward declaration in header would be enough. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13289	2023-03-23 13:22:39 +02:00
Tzach Livyatan	ea66c16818	Fix Enable Authorization doc page references a wrong CL used by a 'cassandra' user Fix https://github.com/scylladb/scylladb/issues/11633 Closes #11637	2023-03-23 13:20:36 +02:00
Kefu Chai	0421a82821	sstables: add type constraints right in parameter list for better readability. also, add `#include <concepts>`, as we should include what we use instead of relying on other headers to do this on behalf of us. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13277	2023-03-23 13:57:22 +03:00
Anna Stuchlik	b54868c639	doc: disable the outdated banner This commit disables the banner that advertises ScyllaDB University Live event, which already took place. Closes #13284	2023-03-23 08:57:45 +02:00
Kefu Chai	1197664f09	test: network_topology_strategy_test: silence warning clang warns when the implicit conversion changes the precision of the converted number. in this case, before being multiplied, `std::numeric_limits<unsigned long>::max() >> 1` is implicitly promoted to double so it can obtain the common type of double and unsigned long. and the compiler warns: ``` /home/kefu/dev/scylladb/test/boost/network_topology_strategy_test.cc:129:84: error: implicit conversion from 'unsigned long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Werror,-Wimplicit-const-int-float-conversion] return static_cast<unsigned long>(d*(std::numeric_limits<unsigned long>::max() >> 1)) << 1; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~ ``` but 1. we don't really care about the precision here, we just want to map a double to a token represented by an int64_t 2. the maximum possible number being converted is less than 9223372036854775807, which is the maximum number of int64_t, which is in general an alias of `long long`, not to mention that LONG_MAX is always 2147483647, after shifting right, the result would be 1073741823 so this is a false alarm. in order to silence it, we explicitly cast the RHS of `*` operator to double. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13221	2023-03-23 08:55:29 +02:00
Botond Dénes	aee5dfaa84	Merge 'docs: Add card logos' from David Garcia Related issue https://github.com/scylladb/scylladb/issues/13119 Adds product logos to cards Preview: ![Welcome-to-ScyllaDB-Documentation-ScyllaDB-Docs (1)](https://user-images.githubusercontent.com/9107969/224996621-6c93676d-1427-4a28-a529-fd3cd2bc2d61.png) Closes #13167 * github.com:scylladb/scylladb: docs: Update custom styles docs: Update styles docs: Add card logos	2023-03-23 08:53:58 +02:00
Botond Dénes	0f5e845399	Merge 'docs: scylladb better php driver' from Daniel Reis Hey y'all! @malusev998 and I are maintaining an updated version of the [PHP Driver](https://github.com/he4rt/scylladb-php-driver) together with the @he4rt community, and it has had a bunch of improvements over the last months. Before, it was working only on PHP 7.1 (DataStax branch), and on our branch we have it working on PHP 8.1 and 8.2. We are also using the ScyllaDB C++ Driver in this project and I think it is a good idea to point new users to this project since it's the most up-to-date PHP Driver maintained now. What do y'all think about that? Closes #13218 * github.com:scylladb/scylladb: fix: links to php driver fix: adding php versions into driver's description docs: scylladb better php driver	2023-03-23 08:53:30 +02:00
Tzach Livyatan	2d40952737	DOCS: remove invalid example from DML reference, WHERE clause section Closes #12596	2023-03-22 18:37:20 +02:00
Nadav Har'El	d1e6d9103a	Merge 'api: reference httpd::* symbols like 'httpd::'' from Kefu Chai this change is a leftover of `063b3be8a7`, which failed to include the changes in the header files. it turns out we have `using namespace httpd;` in seastar's `request_parser.rl`, and we should not rely on this statement to expose the symbols in `seastar::httpd` to `seastar` namespace. in this change, also, since `get_name()` previously a non-static member function of `seastar_test` is now a static member function, so we need to update the tests which capture `this` for calling this function, so they don't capture `this` anymore. Closes #13202 * github.com:scylladb/scylladb: test: drop unused captured variables Update seastar submodule	2023-03-22 18:16:15 +02:00
Kefu Chai	596ea6d439	test: drop unused captured variables this should silence the warning like: ``` test/boost/multishard_mutation_query_test.cc:493:29: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] do_with_cql_env_thread([this] (cql_test_env& env) -> future<> { ^~~~ test/boost/multishard_mutation_query_test.cc:577:29: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] do_with_cql_env_thread([this] (cql_test_env& env) -> future<> { ^~~~ 2 errors generated. ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-22 21:21:04 +08:00
Avi Kivity	4a18ee87eb	Update seastar submodule * seastar 9cbc1fe889...1204efbc5e (14): > http: Add lost pragma once into client.hh > prometheus, http: do not expose httpd::* in seastar > build: add haswell support > ci: fix configuration to build checkheaders target. > core: map_reduce: Fix use-after-free in variant with futurized reducer > Merge 'tests: support boost::test decorators and tolerate failures in test_spawn_input' from Kefu Chai > memory: support reallocing foreign (non-Seastar) memory on a reactor thread > test: futures: disable -Wself-move for GCC>=13 > map_reduce: do not move a temporary object > doc/building-dpdk.md: drop extraneous '$' > http: url_decode: translate plus back into char > Merge 'seastar-json2code: cleanups' from Kefu Chai > Fix markdown formatting > Merge 'Minor abort on OOM changes' from Travis Downs	2023-03-22 21:21:04 +08:00
Benny Halevy	c09d0f6694	everywhere: use sstables::generation_type Use generation_type rather than generation_type::int_t where possible and removed the deprecated functions accepting the int_t. Ref #10459 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-22 13:59:47 +02:00
Benny Halevy	b597f41b8c	test: sstable_test_env: use make_new_generation Also, add a bunch of make_sstable variants that get a generation_type param for this. With that, the entry points for generation_type::int_t are deprecated and their users will be converted in following patches. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-22 13:58:59 +02:00
Benny Halevy	a0e43af576	sstable_directory::components_lister::process: fixup indentation	2023-03-22 13:58:43 +02:00
Benny Halevy	a8dc2fda29	sstables: make highest_generation_seen return optional generation It is possible to find no generation in an empty table directory, and in the future, with uuid generations it'd be possible to find no numeric generations in the directory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-22 13:55:23 +02:00
Benny Halevy	ba680a7b96	replica: table: add make_new_generation function make_new_generation generates a new generation from an optional one. If disengaged, it just generates a new generation based on the shard_id. Otherwise, it generates the next generation in sequence by adding smp::count to the previous value, like we do today. In the future, with uuid-based generations, the function could be used to generate a new random uuid based on the optional parameter. It will be up to the caller, e.g. replica::table or sstables manager to decide which kind of generation to create. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-22 13:52:22 +02:00
Benny Halevy	b28eacce6f	replica: table: move sstable generation related functions out of line updating the highest generation happens only during startup and creating sstables is done rarely enough that there is no reason to inline either function. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-22 13:49:18 +02:00
Benny Halevy	d4d480a374	test: sstables: use generation_type::int_t Convert all users to use sstables::generation_type::int_t. Further patches will continue to convert most to using sstables::generation_type instead so we can abstract the value type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-22 13:48:50 +02:00
Benny Halevy	30cc0beb47	sstables: generation_type: define int_t So it can be used everywhere to prepare for uuid sstable generation support. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-22 13:36:52 +02:00
Vlad Zolotarov	f94bbc5b34	transport: add per-scheduling-group CQL opcode-specific metrics This patch extends a previous patch that added these metrics globally: - cql_requests_count - cql_request_bytes - cql_response_bytes This patch adds a "scheduling_group_name" label to these metrics and changes corresponding counters to be accounted on a per-scheduling-group level. As a bonus this patch also marks all 3 metrics as 'skip_when_empty'. Ref #13061 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20230321201412.3004845-1-vladz@scylladb.com>	2023-03-22 13:27:48 +02:00
Botond Dénes	ff87f95a26	reader_concurrency_semaphore: add trace points for important events Notably, to admission, execution and eviction. Registering/unregistering the permit as inactive is not traced, as this happens on every buffer-fill for range scans. Semaphore trace messages have a "[reader_concurrency_semaphore]" prefix to allow them to be clearly associated with the semaphore.	2023-03-22 04:58:18 -04:00
Botond Dénes	1f51f752cc	reader_permit: refresh trace_state on new pages To make sure all tracing done on a certain page will make its way into the appropriate trace session. This is a continuation of the previous patch (which added trace pointer to the permit).	2023-03-22 04:58:10 -04:00
Botond Dénes	156e5d346d	reader_permit: keep trace_state pointer on permit And propagate it down to where it is created. This will be used to add trace points for semaphore related events, but this will come in the next patches.	2023-03-22 04:58:01 -04:00
Botond Dénes	27a4c24522	test/perf/perf_collection: give more unique names to key comparators perf.cc has two key comparators: key_compare and key_tri_compare. These are very generic names, in fact key_compare directly clashes with a comparator with the same name in types.hh. Avoid the clash by renaming both of these to a more unique name.	2023-03-22 04:58:01 -04:00
Nadav Har'El	2038388268	cql-pytest: translate Cassandra's tests for multi-column relations This is a translation of Cassandra's CQL unit test source file validation/operations/SelectMultiColumnRelationTest.java into our cql-pytest framework. The tests reproduce four already-known Scylla bugs and three new bugs. All tests pass on Cassandra. Because of these bugs 9 of the 22 tests are marked xfail, and one is marked skip (it crashes Scylla). Already known issues: Refs #64: CQL Multi column restrictions are allowed only on a clustering key prefix Refs #4178: Not covered corner case for key prefix optimization in filtering Refs #4244: Add support for mixing token, multi- and single-column restrictions Refs #8627: Cleanly reject updates with indexed values where value > 64k New issues discovered by these tests: Refs #13217: Internal server error when null is used in multi-column relation Refs #13241: Multi-column IN restriction with tuples of different lengths crashes Scylla Refs #13250: One-element multi-column restriction should be handled like a single-column restriction Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13265	2023-03-22 09:54:32 +02:00
Tzach Livyatan	083408723f	doc: Add Murmur term to the glossary Point to the difference between the official MurmurHash3 and Scylla / Cassandra implementation Update docs/glossary.rst Co-authored-by: Anna Stuchlik <37244380+annastuchlik@users.noreply.github.com> Closes #11369	2023-03-21 22:45:47 +02:00
Alejo Sanchez	da00052ad8	gms, service: replicate live endpoints on shard 0 Call replicate_live_endpoints on shard 0 to copy from 0 to the rest of the shards. And get the list of live members from shard 0. Move lock to the callers. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13240	2023-03-21 15:46:12 +01:00
Gleb Natapov	fd6d45e178	bootstrapper: Add get_random_bootstrap_tokens function Does the same as get_bootstrap_tokens() but does not consult initial token config option. Will be used later.	2023-03-21 16:06:43 +02:00
Gleb Natapov	fc84c69b7e	service: raft: add support for topology_change command into raft_group0_client Extend raft_group0_client::prepare_command with support of topology_change type of command.	2023-03-21 16:06:43 +02:00
Gleb Natapov	16d61e791f	service: raft: introduce topology_change group0 command Also extend group0_command to be able to send new command type. The command consists of a mutation array.	2023-03-21 16:06:43 +02:00
Gleb Natapov	5e232ebee5	system_keyspace: add a table to persist topology change state machine's state Add local table to store topology change state machine's state there. Also add a function that loads the state to memory.	2023-03-21 16:06:43 +02:00
Gleb Natapov	a2b7d2c1a1	service: Introduce topology state machine data structures The topology state machine will track all the nodes in a cluster, their state, properties (topology, tokens, etc) and requested actions. Node state can be one of those: none - the node is not yet in the cluster bootstrapping - the node is currently bootstrapping decommissioning - the node is being decommissioned removing - the node is being removed replacing - the node is replacing another node normal - the node is working normally rebuild - the node is being rebuilt left - the node has left the cluster Nodes in state left are never removed from the state. Tokens also can be in one of the states: write_both_read_old - writes are going to new and old replicas, but reads are from old replicas still write_both_read_new - writes are still going to old and new replicas but reads are from new replicas owner - tokens are owned by the node and reads and writes go to the new replica set only Tokens that need to be moved start in 'write_both_read_old' state. After the entire cluster learns about it streaming starts. After the streaming tokens move to 'write_both_read_new' state and again the whole cluster needs to learn about it and make sure no reads started before that point exist in the system. After that tokens may move to 'owner' state. topology_request is the field through which a topology operation request can be issued to a node. A request is one of the topology operations currently supported: join, leave, replace or remove.	2023-03-21 16:06:43 +02:00
Gleb Natapov	dd1e27736e	storage_proxy: not consult topology on local table write Writes to tables with local replication strategies do not need to consult the topology. This is not only an optimization but it allows writing into the local tables before topology is known.	2023-03-21 16:06:43 +02:00
Anna Stuchlik	922f6ba3dd	doc: fix the service name in upgrade guides Fixes https://github.com/scylladb/scylladb/issues/13207 This commit fixes the service and package names in the upgrade guides 5.0-to-2022.1 and 5.1-to-2022.2. Service name: scylla-server Package name: scylla-enterprise Previous PRs to fix the same issue in other upgrade guides: https://github.com/scylladb/scylladb/pull/12679 https://github.com/scylladb/scylladb/pull/12698 This commit must be backported to branch-5.1 and branch 5.2. Closes #13225	2023-03-21 15:56:28 +02:00
Kefu Chai	124410c059	api: reference httpd::* symbols like 'httpd::' this change is a leftover of `063b3be`, which failed to include the changes in the header files. it turns out we have `using namespace httpd;` in seastar's `request_parser.rl`, and we should not rely on this statement to expose the symbols in `seastar::httpd` to `seastar` namespace. in this change, api/.hh: all httpd symbols are referenced by `httpd::` instead of being referenced as if they are in `seastar`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-21 15:49:10 +02:00
Avi Kivity	19810cfc5e	transport: correctly format unknown opcode gcc allows an enum to contain values outside its members. For extra safety, as this can be user visible, format the unknown opcode and return it.	2023-03-21 15:43:00 +02:00
Avi Kivity	e75009cd49	treewide: catch by reference gcc rightly warns about capturing by value, so capture by reference.	2023-03-21 15:43:00 +02:00
Avi Kivity	eaad38c682	test: raft: avoid confusing string compare gcc doesn't like comparing a C string to an sstring -- apparently it has different promotion rules than clang. Fix by doing an explicit conversion.	2023-03-21 15:43:00 +02:00
Avi Kivity	bdfc0aa748	utils, types, test: extract lexicographical compare utilities UUID_test uses lexicographical_compare from the types module. This is a layering violation, since UUIDs are at a much lower level than the database type system. In practical terms, this causes link failures with gcc due to some thread-local-storage variables defined in types.hh but not provided by any object, since we don't link with types.o in this test. Fix by extracting the relevant functions into a new header.	2023-03-21 15:42:53 +02:00
Avi Kivity	32a724fada	test: raft: fsm_test: disambiguate raft::configuration construction gcc thinks the constructor call is ambiguous since "{}" can match the default constructor. Fix by making the parameter type explicit. Use "{}" for the constructor call to avoid the most-vexing-parse problem.	2023-03-21 13:45:57 +02:00
Avi Kivity	83e149c341	test: reader_concurrency_semaphore_test: handle all enum values gcc considers values outside the enum class enumeration lists to be valid, so handle them. In this case, we don't think they can happen, so abort.	2023-03-21 13:45:57 +02:00
Avi Kivity	bc0bba10b4	repair: fix signed/unsigned compare Fix the loop induction variable to have the same type as the termination value.	2023-03-21 13:45:49 +02:00
Avi Kivity	94a10ed6ab	repair: fix incorrect signed/unsigned compare A signed/unsigned compare can overflow. Fix by using the safer std::cmp_greater(). The problem is minor as the user is unlikely to send a negative id.	2023-03-21 13:45:34 +02:00
Avi Kivity	a806024e1d	treewide: avoid unused variables in if statements gcc warns about unused variables declared in if statements. Just drop them.	2023-03-21 13:42:49 +02:00
Avi Kivity	9ced89a41c	keys: disambiguate construction from initializer_list<bytes> Some tests initialize via an initializer_list, but gcc finds other valid constructors via vector<managed_bytes>. Disambiguate by adding a constructor that accepts the initializer_list, and forward to the wanted constructor.	2023-03-21 13:42:49 +02:00
Avi Kivity	41a2856f78	cql3: expr: fix serialize_listlike() reference-to-temporary with gcc serialize_listlike() is called with a range of either managed_bytes or managed_bytes_opt. If the former, then iterating and assigning to a loop induction variable of type managed_bytes_opt& will bind the reference to a temporary managed_bytes_opt, which gcc dislikes. Fix by performing the binding in a separate statement, which allows for lifetime extension.	2023-03-21 13:42:49 +02:00
Avi Kivity	32cc975b2f	compaction: error on invalid scrub type gcc allows an enum to contain a value outside its enum set, so we need to handle it. Since it shouldn't happen, signal an internal error.	2023-03-21 13:42:49 +02:00
Avi Kivity	7bb717d2f9	treewide: prevent redefining names gcc dislikes a member name that matches a type name, as it changes the type name retroactively. Fix by fully-qualifying the type name, so it is not changed by the newly-introduced member.	2023-03-21 13:42:49 +02:00
Avi Kivity	7ab65379b9	api: task_manager: fix signed/unsigned compare Trivial fix by changing the type of the induction variable.	2023-03-21 13:42:42 +02:00
Avi Kivity	429650e508	alternator: streams: fix signed/unsigned comparison We compare a signed variable to an unsigned one, which can yield surprising results. In this case, it is harmless since we already validated the signed input is positive, but use std::cmp_less() to quench any doubts (and warnings).	2023-03-21 13:41:53 +02:00
Nadav Har'El	77bf90bf7d	Merge 'Sanitize {format_types\|version_types} to/from string converters' from Pavel Emelyanov There's a need to convert both -- version and format -- to string and back. Currently, there's a disperse set of helpers in sstables/ code doing that and this PR brings some other to it - adds fmt::formatter<> specialization for both types - leaves one set of {format\|version}_from_string() helpers converting any string-ish object into value refs: #12523 Closes #13214 * github.com:scylladb/scylladb: sstables: Expell sstable_version_types from_string() helper sstables: Generalize ..._from_string helpers sstables: Implement fmt::formatter<sstable_format_types> sstables: Implement fmt::formatter<sstable_version_types> sstables: Move format maps to namespace scope	2023-03-21 13:39:24 +02:00
Avi Kivity	0770b328c7	test: fix some mismatched signed/unsigned comparisons gcc likes to complain about signed/unsigned compares as they can yield surprising results. The fixes are trivial, so apply them.	2023-03-21 13:15:12 +02:00
Pavel Emelyanov	970fc80ea6	feature_service: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 11:59:37 +03:00
Pavel Emelyanov	8600cb2db0	feature_service: Move async context into enable() Callers don't need to know that enabling features has this requirement Indentation is deliberately left broken (until next patch) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 11:59:34 +03:00
Pavel Emelyanov	ae6e29a919	system_keyspace: Refactor local features load/save helpers Introduce load_local_enabled_features() and save_local_enabled_features() that get and put std::set<sstring> with feature names (and perform set to string and back conversions on their own). They look natural next to existing sys.ks. methods to get/set local-supported features and peer features. Using the new API, the more generic functions to preserve individual features and load them on startup can become much shorter and cleaner. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 11:54:02 +03:00
Wojciech Mitros	406ea34aba	build: add wasm compilation target for rust In the future, when testing WASM UDFs, we will only store the Rust source codes of them, and compile them to WASM. To be able to do that, we need rust standard library for the wasm32-wasi target, which is available as an RPM called rust-std-static-wasm32-wasi. Closes #12896 [avi: regenerate toolchain] Closes #13258	2023-03-21 10:30:08 +02:00
Pavel Emelyanov	6a5ab87441	feature_service: Mark supported_feature_set() const It's indeed such Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 11:12:29 +03:00
Pavel Emelyanov	985fbf703a	feature_service: Remove single feature enabling method No longer used Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 11:12:28 +03:00
Pavel Emelyanov	b27d2c9399	boot: Enable features in batch On boot main calls enable_features_on_startup() which at the end scans through the list of features and enables them. Same as in previous patch -- it makes sense to use batch enabling here. Note, that despite the loop that collects features is not as trivial as in previous patch (gossiper case), it still operates with local copies of feature sets so delaying the feature's enabling doesn't affect other features' need to be enabled too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 11:12:25 +03:00
Pavel Emelyanov	256dd9d7e3	gossiper: Enable features in batch Gossiper code walks the list of feature names and enables them one-by-one. However, in the feature_service code there's a method that enables features in batch. Using it now doesn't make any difference, but next patches will make some use of it. Also, this will let shortening feature_service's API and will make it simpler to remove qctx thing from there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 11:12:16 +03:00
Pavel Emelyanov	fe7609865d	Merge 'reader_concurrency_semaphore: improve diagnostics printout' from Botond Dénes Remove redundant "Total: ..." line. Include the entire `reader_concurrency_semaphore::stats` in the printout. This includes a lot of metrics not exported to monitoring. These metrics are very valuable when debugging timeouts but are otherwise uninteresting. To avoid bloating our monitoring with such niche metrics, we dump them when they are interesting: when timeouts happen. To be really helpful, we do need historic values too, but this shouldn't be a problem: timeouts come in bursts, we usually get at least a handful of diagnostics dumps at a time. New stats are also added to record the reason why reads are queued on the semaphore. Printout before: ``` INFO 2023-03-14 12:43:54,496 [shard 0] reader_concurrency_semaphore - Semaphore test_reader_concurrency_semaphore_memory_limit_no_leaks with 4/4 count and 7168/4096 memory resources: kill limit triggered, dumping permit diagnostics: permits count memory table/description/state 4 4 7K ./reader/active/unused 2 0 0B ./reader/waiting_for_admission 6 4 7K total Total: 6 permits with 4 count and 7K memory resources ``` Printout after: ``` INFO 2023-03-16 04:23:41,791 [shard 0] reader_concurrency_semaphore - Semaphore test_reader_concurrency_semaphore_memory_limit_no_leaks with 3/4 count and 7168/4096 memory resources: kill limit triggered, dumping permit diagnostics: permits count memory table/description/state 2 2 6K ./reader/active/unused 1 1 1K ./reader/waiting_for_memory 2 0 0B ./reader/waiting_for_admission 5 3 7K total Stats: permit_based_evictions: 0 time_based_evictions: 0 inactive_reads: 0 total_successful_reads: 0 total_failed_reads: 0 total_reads_shed_due_to_overload: 0 total_reads_killed_due_to_kill_limit: 1 reads_admitted: 4 reads_enqueued_for_admission: 4 reads_enqueued_for_memory: 5 reads_admitted_immediately: 2 reads_queued_because_ready_list: 0 reads_queued_because_used_permits: 0 
reads_queued_because_memory_resources: 0 reads_queued_because_count_resources: 4 reads_queued_with_eviction: 0 total_permits: 6 current_permits: 5 used_permits: 0 blocked_permits: 0 disk_reads: 0 sstables_read: 0 ``` Closes #13173 * github.com:scylladb/scylladb: test/boost/reader_concurrency_semaphore_test: remove redundant stats printouts reader_concurrency_semaphore: do_dump_reader_permit_diagnostics(): print the stats reader_concurrency_semaphore: add stats to record reason for queueing permits reader_concurrency_semaphore: can_admit_read(): also return reason for rejection	2023-03-21 10:41:11 +03:00
Pavel Emelyanov	eecb9244dd	sstables: Expell sstable_version_types from_string() helper Its name is too generic despite its narrow specialization. Also, there's a version_from_string() method that does the same in a more convenient way. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 09:56:18 +03:00
Pavel Emelyanov	4e99637777	sstables: Generalize ..._from_string helpers There are two string->{version\|format} converters living on class sstable. It's better to have both in namespace scope. Surprisingly, there's only one caller of it. Also this patch makes both accept std::string_view not to limit the helpers in converting only sstring&-s. This change calls for a reverse_map template update with "heterogeneous lookup". Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 09:56:18 +03:00
Pavel Emelyanov	bb59dc2ec1	sstables: Implement fmt::formatter<sstable_format_types> Same as in previous patch for another enum-class type. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 09:56:18 +03:00
Pavel Emelyanov	6b04eb74d6	sstables: Implement fmt::formatter<sstable_version_types> This way the version type can be fed as-is into fmt:: code, respectively the conversion to string is as simple as fmt::to_string(v). So also drop the explicit existing to_string() helper updating all callers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 09:56:18 +03:00
Pavel Emelyanov	ea1c6fbf98	sstables: Move format maps to namespace scope They will be used by fmt::formatter specification for version and format types in next patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 09:56:18 +03:00
Nadav Har'El	511308bccf	test/cql-pytest: tests for single-element multi-column restrictions It turns out that Cassandra handles a restriction like `(c2) = (1)` just like `c2 = 1`, and is not limited like multi-column restrictions. In particular, this query works despite missing "c1", and may also use an index if c2 is indexed. But currently in Scylla, `(c2) = (1)` is handled like a multi-column restriction, so complains if c2 is not the first clustering key column, and cannot use an index. This patch adds several tests demonstrating this difference between Scylla and Cassandra (#13250). The xfailing tests pass on Cassandra but fail on Scylla. Refs #13250 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13252	2023-03-21 07:56:24 +02:00
Anna Stuchlik	26bb36cdf5	doc: related https://github.com/scylladb/scylladb/issues/12754 ; add the missing information about reporting latencies to the upgrade guide 5.1 to 5.2 Closes #12935	2023-03-21 07:17:07 +02:00
Kefu Chai	faa47e9624	mutation: drop operator<<(ostream, const range_tombstone{_change,} &) as all of its callers have been removed, let's drop these two operators. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-21 11:37:07 +08:00
Kefu Chai	d146535ec6	mutation: use fmtlib to print range_stombstone{_change,} prepare for removing `operator<<(std::ostream&, const range_tombstone&)` and `operator<<(std::ostream& out, const range_tombstone_change&)`. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-21 11:37:07 +08:00
Kefu Chai	755aea8e7f	mutation: mutation_fragment_v2: specialize fmt::formatter<range_tombstone_change> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print range_tombstone_change without using ostream<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-21 11:37:07 +08:00
Kefu Chai	4af0a0ed19	mutation: range_tombstone: specialize fmt::formatter<range_tombstone> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print range_tombstone. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-21 11:37:07 +08:00
Daniel Reis	3d1c78bdcc	fix: links to php driver	2023-03-20 15:28:00 -03:00
Daniel Reis	f83f844319	fix: adding php versions into driver's description	2023-03-20 15:25:52 -03:00
Kefu Chai	b11fd28a46	dist/redhat: split Requires section into multiple lines for better readability Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-20 22:25:24 +08:00
Kefu Chai	7165551fd7	dist/redhat: enforce dependency on %{release} also s/%{version}/%{version}-%{release}/ in `Requires:` sections. this enforces the runtime dependencies of exactly the same releases between scylla packages. Fixes #13222 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-20 22:25:24 +08:00
Avi Kivity	0f97d464d3	Merge 'cql: check if the function is builtin when granting permissisons' from Wojciech Mitros Currently, when granting a permission on a function resource, we only check if the function exists, regardless of whether it's a user or a builtin function. We should not support altering permissions on builtin functions, so this patch adds a check for confirming that the found function is not builtin. Additionally, adjust an error exception thrown when trying to alter a permission that does not apply on a given resource Closes #13184 * github.com:scylladb/scylladb: cql: change exception type when granting incorrect permissions cql: check if the function is builtin when granting permissisons	2023-03-20 16:17:02 +02:00
Kefu Chai	476bd84dd0	config: add a space before parameter for better consistency in the code formatting. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13248	2023-03-20 16:03:00 +02:00
Botond Dénes	bf8b746bca	Merge 'utils: UUID: specialize fmt::formatter for UUID and tagged_uuid<>' from Kefu Chai this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print UUID without using ostream<<. also, this change re-implements some formatting helpers using fmtlib for better performance and less dependencies on operator<<(), but we cannot drop it at this moment, as quite a few caller sites are still using operator<<(ostream&, const UUID&) and operator<<(ostream&, tagged_uuid<T>&). we will address them separately. * add `fmt::formatter<UUID>` * add `fmt::formatter<tagged_uuid<T>>` * implement `UUID::to_string()` using `fmt::to_string()` * implement `operator<<(std::ostream&, const UUID&)` with `fmt::print()`, this should help to improve the performance when printing uuid, as `fmt::print()` does not materialize a string when printing the uuid. * treewide: use fmtlib when printing UUID Refs #13245 Closes #13246 * github.com:scylladb/scylladb: treewide: use fmtlib when printing UUID utils: UUID: specialize fmt::formatter for UUID and tagged_uuid<>	2023-03-20 14:26:11 +02:00
Gleb Natapov	34d41177fe	storage_service: pass storage_proxy and system_distributed_keyspace objects to messaging initialization Will be needed there later. Message-Id: <20230316112801.1004602-14-gleb@scylladb.com>	2023-03-20 11:58:50 +01:00
Gleb Natapov	d8edd2055f	service: raft: add several accessors to group0 class They will be used by later patches. Message-Id: <20230316112801.1004602-13-gleb@scylladb.com>	2023-03-20 11:57:18 +01:00
Gleb Natapov	7d535a84bb	servers: raft: make remove_from_raft_config public Will be used by later patches. Message-Id: <20230316112801.1004602-11-gleb@scylladb.com>	2023-03-20 11:47:55 +01:00
Gleb Natapov	f017aa1ad3	service: raft: pass storage service to group0_state_machine To apply topology_change commands group0_state_machine needs to have an access to the storage service to support topology changes over raft. Message-Id: <20230316112801.1004602-10-gleb@scylladb.com>	2023-03-20 11:45:57 +01:00
Gleb Natapov	a690070722	raft_sys_table_storage: give initial snapshot a non zero value We create a snapshot (config only, but still), but do not assign it any id. Because of that it is not loaded on start. We do want it to be loaded though since the state of group0 will not be re-created from the log on restart because the entries will have outdated id and will be skipped. As a result in memory state machine state will not be restored. This is not a problem now since schema state is restored outside of raft code. Message-Id: <20230316112801.1004602-5-gleb@scylladb.com>	2023-03-20 11:45:38 +01:00
Gleb Natapov	2fc8e13dd8	raft: add server::wait_for_state_change() function Add a function that allows waiting for a state change of a raft server. It is useful for a user that wants to know when a node becomes/stops being a leader. Message-Id: <20230316112801.1004602-4-gleb@scylladb.com>	2023-03-20 11:31:55 +01:00
Gleb Natapov	59f7aeb79b	raft: move some functions out of ad-hoc section Make tick() and is_leader() part of the API. First is used externally already and another will be used in following patches. Message-Id: <20230316112801.1004602-3-gleb@scylladb.com>	2023-03-20 11:25:19 +01:00
Nadav Har'El	c550e681d7	test/rest_api: fix flaky test for toppartitions The REST test test_storage_service.py::test_toppartitions_pk_needs_escaping was flaky. It tests the toppartition request, which unfortunately needs to choose a sampling duration in advance, and we chose 1 second which we considered more than enough - and indeed typically even 1ms is enough! but very rarely (we know of only one occurrence, in issue #13223) one second is not enough. Instead of increasing this 1 second and making this test even slower, this patch takes a retry approach: The test starts with a 0.01 second duration, and is then retried with increasing durations until it succeeds or a 5-seconds duration is reached. This retry approach has two benefits: 1. It de-flakes the test (allowing a very slow test to take 5 seconds instead of 1 second which wasn't enough), and 2. At the same time it makes a successful test much faster (it used to always take a full second, now it takes 0.07 seconds on a dev build on my laptop). A failed test may, in some cases, take 10 seconds after this patch (although in some other cases, an error will be caught immediately), but I consider this acceptable - this test should pass, after all, and a failure indicates a regression and taking 10 seconds will be the least of our worries in that case. Fixes #13223. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13238	2023-03-20 11:32:53 +02:00
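The retry approach from the commit above might look something like the following minimal Python sketch. It is not the code from test_storage_service.py: the helper name `retry_with_increasing_duration`, the `attempt` callback convention (return `None` to signal "not enough samples yet"), and the doubling factor are all assumptions made for illustration; only the 0.01 second start and the 5 second cap come from the commit message.

```python
def retry_with_increasing_duration(attempt, start=0.01, limit=5.0, factor=2.0):
    """Call attempt(duration) with growing durations until it returns a result.

    attempt is a hypothetical callback: it returns None when the sampling
    window was too short, and any other value on success.
    The growth factor of 2 is an assumption; the commit only says "increasing".
    """
    duration = start
    while True:
        result = attempt(duration)
        if result is not None:
            return result, duration
        if duration >= limit:
            # The longest allowed window still failed: report a real failure.
            raise AssertionError(f"no result even with a {limit} second duration")
        duration = min(duration * factor, limit)


# Usage example with a fake request that only succeeds for windows >= 0.05 s.
calls = []


def fake_request(duration):
    calls.append(duration)
    return "ok" if duration >= 0.05 else None


result, used = retry_with_increasing_duration(fake_request)
```

The fast path stays cheap because most runs succeed on the first short window, while a slow machine gets progressively longer windows before the test is declared failed.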
Kefu Chai	0ba6627d5c	wasm: block all signals in alien thread as in main(), we use `stop_signal` to handle SIGINT and SIGTERM, so when scylla receives a SIGTERM, the corresponding signal handler could get called on any threads created by this program. so there is a chance that the alien_runner thread could be chosen to run the signal handler set up by `main()`, but that signal handler assumes the availability of Seastar reactor. unfortunately, we don't have a Seastar reactor in alien thread. the same applies to Seastar's `thread_pool` which handles the slow and blocking POSIX calls typically used for interacting with files. so, in this change, we use the same approach as Seastar's `thread_pool::work()` -- just block all signals, so the alien threads used by wasm for compiling UDF won't handle the signals using the handlers planted by `main()`. Fixes #13228 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13233	2023-03-20 11:20:19 +02:00
Avi Kivity	bab29a2f27	Merge 'Unit tests cleanup for sstable generation changes' from Benny Halevy This series cleans up unit test in preparation for PR #12994. Helpers are added (or reused) to not rely on specific sstable generation numbers where possible (other than loading reference sstables that are committed to the repo with given generation numbers), and to generate the sstables for tests easily, taking advantage of generation management in `sstable_test_env`, `table_for_tests`, or `replica::table` itself. Closes #13242 * github.com:scylladb/scylladb: test: add verify_mutation helpers. test: add make_sstable_containing memtable test: table_for_tests: add make_sstable function test: sstable_test_env: add make_sst_factory methods test: sstable_compaction_test: do not rely on specific generations tests: use make_sstable defaults as much as possible test: sstable_test_env: add make_table_for_tests test: sstable_datafile_test: do not rely on sepecific sstable generations test: sstable_test_env: add reusable_sst(shared_sstable) sstable: expose get_storage function test: mutation_reader_test: create_sstable: do not rely on specific generations test: mutation_reader_test: do_test_clustering_order_merger_sstable_set: rely on test_envsstable generation test: mutation_reader_test: combined_mutation_reader_test: define a local sst_factory function test: mutation_reader_test: do not use tmpdir test: use big format by default test: sstable_compaction_test: use highest sstable version by default test: test_env: make_db_config: set cfg host_id test: sstable_datafile_test: fixup indentation test: sstable_datafile_test: various tests: do_with_async test: sstable_3_x_test: validate_read, sstable_assertions: get shared_sstable test: sstable_3_x_test: compare_sstables: get shared_sstable test: sstable_3_x_test: write_sstables: return shared_sstable test: sstable_3_x_test: write, compare, validate_sstables: use env.tempdir test: sstable_3_x_test: compacted_sstable_reader: do not 
reopen compacted_sst test: lib: test_services: delete now unused stop_and_keep_alive test: sstable_compaction_test: use deferred_stop to stop table_for_tests test: sstable_compaction_test: compound_sstable_set_incremental_selector_test: do_with_async test: sstable_compaction_test: sstable_needs_cleanup_test: do_with_async test: sstable_compaction_test: leveled_05: fixup indentation test: sstable_compaction_test: leveled_05: do_with_async test: sstable_compaction_test: compact_02: do_with_async test: sstable_compaction_test: compact_sstables: simplify variable allocation test: sstable_compaction_test: compact_sstables: reindent test: sstable_compaction_test: compact_sstables: use thread test: sstable_compaction_test: sstable_rewrite: simplify variable allocation test: sstable_compaction_test: sstable_rewrite: fixup indentation test: sstable_compaction_test: sstable_rewrite: do_with_async test: sstable_compaction_test: compact: fixup indentation test: sstable_compaction_test: compact: complete conversion to async thread test: sstable_compaction_test: compaction_manager_basic_test: rename generations to idx	2023-03-20 11:16:46 +02:00
Nadav Har'El	8b0822be77	test/cql-pytest: reproducer for bug crashing Scylla on mismatched tuple This patch adds a reproducing test for issue #13241, where attempting a SELECT restriction (b,c,d) IN ((1,2)) - where the tuple is shorter than needed - crashes Scylla (on segmentation fault) instead of generating a clean error as it should (and as done on Cassandra). The test also demonstrates that if the tuple is longer than needed (instead of shorter), the behavior is correct, and it is also correct if "=" is used instead of IN. Only the combination of IN and too-short tuple seems to be broken - but broken in a bad way (can be used to crash Scylla). Because the test crashes Scylla when it fails, it is marked "skip". Refs #13241 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13244	2023-03-20 11:13:02 +02:00
Anna Stuchlik	fc927b1774	doc: add the Enterprise vs. OSS Matrix Fixes https://github.com/scylladb/scylladb/issues/12758 This commit adds a new page with a matrix that shows on which ScyllaDB Open Source versions we based given ScyllaDB Enterprise versions. The new file is added to the newly created Reference section. Closes #13230	2023-03-20 10:18:10 +02:00
Kefu Chai	94c6df0a08	treewide: use fmtlib when printing UUID this change tries to reduce the number of callers using operator<<() for printing UUID. they are found by compiling the tree after commenting out `operator<<(std::ostream& out, const UUID& uuid)`. but this change alone is not enough to drop all callers, as some callers are using `operator<<(ostream&, const unordered_map&)` and other overloads to print ranges whose elements contain UUID. so in order to limit the scope of the change, we are not changing them here. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-20 15:38:45 +08:00
Kefu Chai	c14c70b89d	utils: UUID: specialize fmt::formatter for UUID and tagged_uuid<> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print UUID without using ostream<<. also, this change reimplements some formatting helpers using fmtlib for better performance and less dependencies on operator<<(), but we cannot drop it at this moment, as quite a few caller sites are still using operator<<(ostream&, const UUID&) and operator<<(ostream&, tagged_uuid<T>&). we will address them separately. * add fmt::formatter<UUID> * add fmt::formatter<tagged_uuid<T>> * implement UUID::to_string() using fmt::to_string() * implement operator<<(std::ostream&, const UUID&) with fmt::print(), this should help to improve the performance when printing uuid, as fmt::print() does not materialize a string when printing the uuid. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-20 14:25:45 +08:00
Botond Dénes	583e49dd09	Merge 'cmake: sync with `configure.py` (14/n)' from Kefu Chai this is the 14th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals: - to enable developers to use CMake for building scylla. so they can use tools (CLion for instance) with CMake integration for better developer experience - to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules. this changeset includes following changes: - build: cmake: promote add_scylla_test() to test/ - build: cmake: add all tests Closes #13220 * github.com:scylladb/scylladb: build: cmake: add all tests build: cmake: promote add_scylla_test() to test/	2023-03-20 08:13:07 +02:00
Pavel Emelyanov	c88e47a624	memory_data_sink: Add move ctor To make it possible to move the class member away resetting to be empty at the same time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13208	2023-03-20 07:55:20 +02:00
Pavel Emelyanov	b631081df8	test: Fixie for test sstable chdir Some unit tests want to change the sstable::_dir on the fly. However, the sstable::_dir is going away, so it needs yet another virtual call on storage driver. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13213	2023-03-20 07:28:22 +02:00
Benny Halevy	d62df5cac6	test: add verify_mutation helpers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 17:48:22 +02:00
Benny Halevy	cf4eaa1fbc	test: add make_sstable_containing memtable Helper for make_sstable + write_memtable_to_sstable_for_test + reusable_sst / load. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 17:48:22 +02:00
Benny Halevy	0ce6afb5f9	test: table_for_tests: add make_sstable function table_for_tests uses a sstables manager to generate sstables and gets the new generation from table.calculate_generation_for_new_table(). The version to use is either the highest supported or an ad-hoc version passed to make_sstable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 17:48:22 +02:00
Benny Halevy	88d085ea66	test: sstable_test_env: add make_sst_factory methods The tests extensively use a `std::function<shared_sstable()>` to generate new tables. Rather than handcrafting them all over the place, let sstable_test_env return such factory given a schema (and another entry point that also gets a version) and that uses the embedded generation_factory in the test_env to generate new sstables with unique generations. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 17:48:22 +02:00
Benny Halevy	c308ba635b	test: sstable_compaction_test: do not rely on specific generations No need to maintain static generation numbers in the test. Let the sstable_test_env dispatch sstable generations automatically and use the generated sstables themselves for reference rather than their generation numbers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 17:47:46 +02:00
Benny Halevy	51b2c38472	tests: use make_sstable defaults as much as possible Add a few goodies to sstable_test_env to extend entry points with default params for make_sstable and reusable_sst. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 17:47:14 +02:00
Benny Halevy	084f4e4fde	test: sstable_test_env: add make_table_for_tests Wrap table_for_tests ctor to pass the env sstables_manager as well as the temporary directory path, as this is the most common use case, and in preparation for adding a make_sstable method in table_for_tests. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 17:47:14 +02:00
Benny Halevy	e9af4e4cd8	test: sstable_datafile_test: do not rely on specific sstable generations There is no need to use specific generations in the test, just rely on the ones sstable_test_env generates. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 17:46:47 +02:00
Benny Halevy	94192f0ded	test: sstable_test_env: add reusable_sst(shared_sstable) Allow generating a sstable object from an existing sstable to get the directory, generation, and version from it, rather than passing them to reusable_sst from other sources - since the intention is to get a new sstable object based on an existing sstable that was generated by the test. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 17:20:07 +02:00
Benny Halevy	b11e2c81ae	sstable: expose get_storage function To be used by sstable_test_env to reopen existing sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 17:19:12 +02:00
Benny Halevy	e9c3f0e478	test: mutation_reader_test: create_sstable: do not rely on specific generations No need to maintain static generation numbers in the test. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	648ab706df	test: mutation_reader_test: do_test_clustering_order_merger_sstable_set: rely on test_env sstable generation Rather than maintaining a running generation number, use the default env.make_sstable(s) in sst_factory and collect the expected generations from the resulting shared sstable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	11595b3024	test: mutation_reader_test: combined_mutation_reader_test: define a local sst_factory function For generating shared_sstables with increasing generations (using the test_env make_sstable generations) and a given level. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	506dc1260f	test: mutation_reader_test: do not use tmpdir Rely on the test_env temporary directory instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	ceb5d4fb47	test: use big format by default No need to pass the big format explicitly as it's set by default by make_sstable and it is never overridden. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	f24b69a6ae	test: sstable_compaction_test: use highest sstable version by default Tests should just generate the highest sstable version available. There is no need to continue testing old versions, in particular partially supported ones like "la". Use also the default values for sstable::format_types, buffer_size, etc. if there's no particular need to override them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	df5347fca8	test: test_env: make_db_config: set cfg host_id So we can safely use `me` sstables in sstable_directory_test that validates the sstable host owner. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	8b168869be	test: sstable_datafile_test: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	1fce7c76a5	test: sstable_datafile_test: various tests: do_with_async To simplify further cleanups. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	2954feb734	test: sstable_3_x_test: validate_read, sstable_assertions: get shared_sstable Pass the test-generated shared_sstable to validate_read and then to sstable_assertions so it can be used for make_sstable version and generation params. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	969ec8611e	test: sstable_3_x_test: compare_sstables: get shared_sstable Use the sstable generated by the test to generate the result_filename we want for compare. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	3ba0d1659c	test: sstable_3_x_test: write_sstables: return shared_sstable To be passed to compare_sstable in the next patch, so it can generate the result filename out of it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	4c842fb0e8	test: sstable_3_x_test: write, compare, validate_sstables: use env.tempdir Do not create a tmpdir every time, just use the one that the sstable test env provides. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	71c0c713ee	test: sstable_3_x_test: compacted_sstable_reader: do not reopen compacted_sst Just use the one we created during compaction for verification so we won't have to rely on a particular generation/version. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	e385575407	test: lib: test_services: delete now unused stop_and_keep_alive Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	0bf60d42aa	test: sstable_compaction_test: use deferred_stop to stop table_for_tests Rather than calling cf.stop_and_keep_alive() before the test exits, since it must also be stopped on failure. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	208726d987	test: sstable_compaction_test: compound_sstable_set_incremental_selector_test: do_with_async Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	9d83a94c28	test: sstable_compaction_test: sstable_needs_cleanup_test: do_with_async Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	d8a354a35e	test: sstable_compaction_test: leveled_05: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	8b8c1c5813	test: sstable_compaction_test: leveled_05: do_with_async Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	d1879a5932	test: sstable_compaction_test: compact_02: do_with_async Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	76799d08d6	test: sstable_compaction_test: compact_sstables: simplify variable allocation No need to use lw_shared all over the place now that the function uses a seastar thread. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	af106684ae	test: sstable_compaction_test: compact_sstables: reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	8de808ff15	test: sstable_compaction_test: compact_sstables: use thread Prepare for using make_sstable_containing in a follow up patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	f4989f2ba5	test: sstable_compaction_test: sstable_rewrite: simplify variable allocation No need to use lw_shared all over the place now that the function uses a seastar thread. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	fb379709cf	test: sstable_compaction_test: sstable_rewrite: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	b27910cff2	test: sstable_compaction_test: sstable_rewrite: do_with_async simplify flow using seastar thread. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	d1a112a156	test: sstable_compaction_test: compact: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	d503eb75f1	test: sstable_compaction_test: compact: complete conversion to async thread We already use test_env::do_with_async in this function but we didn't take full advantage of it to simplify the implementation. Do that before further changes are made. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:53:56 +02:00
Benny Halevy	237c844901	test: sstable_compaction_test: compaction_manager_basic_test: rename generations to idx The function used `calculate_generation_for_new_table` for the sstables generation. The so-called `generations` are just used to generate key indices. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-19 16:52:21 +02:00
Botond Dénes	9859bae54f	Merge 'Ignore no such column family in repair' from Aleksandra Martyniuk While repair requested by user is performed, some tables may be dropped. When the repair proceeds to these tables, it should skip them and continue with others. When no_such_column_family is thrown during user requested repair, it is logged and swallowed. Then the repair continues with the remaining tables. Fixes: #13045 Closes #13068 * github.com:scylladb/scylladb: repair: fix indentation repair: continue user requested repair if no_such_column_family is thrown repair: add find_column_family_if_exists function	2023-03-19 15:16:02 +02:00
Botond Dénes	b1c7538e92	Merge 'Give table a reference to storage_options' from Pavel Emelyanov The `storage_options` describes where sstables should be located. Currently the object reside on keyspace_metadata, but is thus not available at the place it's needed the most -- the `table::make_sstable()` call. This set converts keyspace_metadata::storage_opts to be lw-shared-ptr and shares the ptr with class table. refs: #12523 (detached small change from large PR) Closes #13212 * github.com:scylladb/scylladb: table: Keep storage options lw-shared-ptr keyspace_metadata: Make storage options lw-shared-ptr	2023-03-19 15:16:02 +02:00
Avi Kivity	a7099132cc	scripts/pull_github_pr.sh: optionally authenticate This helps overcome rate limits for unauthenticated requests, preventing maintainers from getting much-needed rest. Closes #13210	2023-03-19 15:16:02 +02:00
Kefu Chai	c5b6c91412	db: data_listener: mark data_listener's dtor virtual Clang-17 warns when we try to delete a pointer to a class with virtual function(s) but without marking its dtor virtual. in this change, we mark the dtor of the base class of `table_listener` virtual to address the warning. we have another solution though -- to mark `table_listener` `final`. as we don't destruct `table_listener` with a pointer to its base classes. but it'd be much simpler to just mark the dtor virtual of its base class with virtual method(s). it's more idiomatic this way, and less error-prone. this change should silence the warning like: ``` In file included from /home/kefu/dev/scylladb/test/boost/data_listeners_test.cc:9: In file included from /usr/include/boost/test/unit_test.hpp:18: In file included from /usr/include/boost/test/test_tools.hpp:46: In file included from /usr/include/boost/test/tools/old/impl.hpp:20: In file included from /usr/include/boost/test/tools/assertion_result.hpp:21: In file included from /usr/include/boost/shared_ptr.hpp:17: In file included from /usr/include/boost/smart_ptr/shared_ptr.hpp:17: In file included from /usr/include/boost/smart_ptr/detail/shared_count.hpp:27: In file included from /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:35: In file included from /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/memory:78: /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on non-final 'table_listener' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] delete __ptr; ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<table_listener>::operator()' requested here get_deleter()(std::move(__ptr)); ^ 
/home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/stl_construct.h:88:15: note: in instantiation of member function 'std::unique_ptr<table_listener>::~unique_ptr' requested here __location->~_Tp(); ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13198	2023-03-19 15:16:02 +02:00
Kefu Chai	a01eb593ec	test: sstables: do not compare a mutation with an optional<mutation> this change should address the FTBFS with Clang-17. turns out we are comparing a mutation with an optimized_optional<mutation>. and Clang-17 does not want to convert the LHS, which is a mutation to optimized_optional<mutation> for performing the comparison using operator==(const optimized_optional<mutation>&), despite that optimized_optional(const T& obj) is not marked explicit. this is understandable. so, in this change, instead of relying on the implicit conversion, we just * check if the optional actually holds a value * and compare the value by dereferencing the optional. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13196	2023-03-19 15:16:02 +02:00
Pavel Emelyanov	be548a4da3	install-dependencies: Add rapid XML dev package It will be needed by S3 driver to parse multipart-upload messages from server refs: #12523 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13158 [avi: regenerate toolchain] Closes #13192	2023-03-19 15:16:02 +02:00
Avi Kivity	c3a2ec9d3c	Merge 'use fmt::join() for printing ranges' from Kefu Chai this series intends to deprecate `::join()`, as it always materializes a range into a concrete string. but what we always want is to print the elements in the given range to stream, or to a seastar logger, which is backed by fmtlib. also, because fmtlib offers exactly the same set of features implemented by to_string.hh, this change would allow us to use fmtlib to replace to_string.hh for better maintainability, and potentially better performance. as fmtlib is lazily evaluated, and claims to be performant under most circumstances. Closes #13163 * github.com:scylladb/scylladb: utils: to_string: move join to namespace utils treewide: use fmt::join() when appropriate row_cache: pass "const cache_entry" to operator<<	2023-03-19 15:16:02 +02:00
Wojciech Mitros	3cdaf72065	docs: fix minor issues found in the wasm documentation Even after last fixups, the documentation still had some issues with compilation instructions in particular. I also ran a spelling and grammar check on the text, and fixed issues found by it. Closes #13206	2023-03-19 15:16:02 +02:00
Botond Dénes	6a8fbbebf2	test/boost/reader_concurrency_semaphore_test: remove redundant stats printouts The semaphore stats are now included in the standard semaphore diagnostics printout, no need to dump separately.	2023-03-17 03:15:41 -04:00
Botond Dénes	d6583cad0a	reader_concurrency_semaphore: do_dump_reader_permit_diagnostics(): print the stats Print the semaphore stats below the permit listing and remove the currently redundant "Total: " line. Some of the stats printed here are already exported as metrics, but instead of trying to cherry-pick and risk some metrics falling through the cracks, just print everything, there aren't that many anyway.	2023-03-17 03:15:41 -04:00
Botond Dénes	7b701ac52e	reader_concurrency_semaphore: add stats to record reason for queueing permits When diagnosing problems, knowing why permits were queued is very valuable. Record the reason in new stats, one for each reason a permit can be queued.	2023-03-17 03:15:41 -04:00
Botond Dénes	bb00405818	reader_concurrency_semaphore: can_admit_read(): also return reason for rejection So caller can bump the appropriate counters or log the reason why the request cannot be admitted.	2023-03-17 03:15:40 -04:00
Kefu Chai	f113dac5bf	build: cmake: add all tests * add a new test KIND "UNIT", which provides its own main() * add all tests which were not included yet Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-17 12:56:09 +08:00
Kefu Chai	b440417527	build: cmake: promote add_scylla_test() to test/ as it will be used by test/manual/CMakeLists.txt also. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-17 12:56:09 +08:00
Daniel Reis	86a4b8a57d	docs: scylladb better php driver	2023-03-16 17:00:22 -03:00
Wojciech Mitros	53af79442d	cql: change exception type when granting incorrect permissions For compatibility with Cassandra, this patch changes the exception type thrown when trying to alter a permission that is not applicable on the given resource from an Invalid query to a Syntax exception.	2023-03-16 16:43:37 +01:00
Wojciech Mitros	9c36c0313a	cql: check if the function is builtin when granting permissions Currently, when granting a permission on a function resource, we only check if the function exists, regardless of whether it's a user or a builtin function. We should not support altering permissions on builtin functions, so this patch adds a check for confirming that the found function is not builtin.	2023-03-16 16:43:32 +01:00
Pavel Emelyanov	e882269d93	table: Keep storage options lw-shared-ptr Tables need to know which storage their sstables need to be located at, so class table needs to have its own reference to the storage options. The thing can be inherited from the keyspace metadata. Tests sometimes create a table without a keyspace at hand. For those use default-initialized storage options (which is local storage). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-16 17:30:45 +03:00
Pavel Emelyanov	c619a53c61	keyspace_metadata: Make storage options lw-shared-ptr Today the storage options are embedded into metadata object. In the future the storage options will need to be somehow referenced by the class table too. Using a plain reference doesn't look safe, so turn the storage options into lw-shared-ptr instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-16 17:30:45 +03:00
Kefu Chai	93fa70069c	utils: to_string: move join to namespace utils `join` can easily be confused with boost::algorithm::join so make it more visible that we're using scylla's utils implementation. Also, move `struct print_with_comma` to utils::internal. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-16 20:34:18 +08:00
Kefu Chai	c37f4e5252	treewide: use fmt::join() when appropriate now that fmtlib provides fmt::join(). see https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view there is no need to reinvent the wheel. so in this change, the homebrew join() is replaced with fmt::join(). as fmt::join() returns a join_view(), this could improve the performance under certain circumstances where the fully materialized string is not needed. please note, the goal of this change is to use fmt::join(), and this change does not intend to improve the performance of existing implementation based on "operator<<" unless the new implementation is much more complicated. we will address the unnecessarily materialized strings in a follow-up commit. some noteworthy things related to this change: * unlike the existing `join()`, `fmt::join()` returns a view. so we have to materialize the view if what we expect is a `sstring` * `fmt::format()` does not accept a view, so we cannot pass the return value of `fmt::join()` to `fmt::format()` * fmtlib does not format a typed pointer, i.e., it does not format, for instance, a `const std::string*`. but operator<<() always prints a typed pointer. so if we want to format a typed pointer, we either need to cast the pointer to `void*` or use `fmt::ptr()`. * fmtlib is not able to pick up the overload of `operator<<(std::ostream& os, const column_definition* cd)`, so we have to use a wrapper class of `maybe_column_definition` for printing a pointer to `column_definition`. since the overload is only used by the two overloads of `statement_restrictions::add_single_column_parition_key_restriction()`, the operator<< for `const column_definition*` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 20:34:18 +08:00
Wojciech Mitros	aad2afd417	rust: update dependencies Cranelift-codegen 0.92.0 and wasmtime 5.0.0 have security issues potentially allowing malicious UDFs to read some memory outside the wasm sandbox. This patch updates them to versions 0.92.1 and 5.0.1 respectively, where the issues are fixed. Fixes #13157 Closes #13171	2023-03-16 13:45:53 +02:00
Takuya ASADA	a79604b0d6	create-relocatable-package.py: exclude tools/cqlsh We should exclude tools/cqlsh for relocatable package. fixes #13181 Closes #13183	2023-03-16 13:37:16 +02:00
Anna Stuchlik	d00926a517	doc: Add version 5.2 to the version selector This commit adds branch-5.2 to the list of branches for which we want to build the docs. As a result, version 5.2 will be added to the version selector. NOTE: Version 5.2 will be marked as unstable and an appropriate message will be shown to the user. After 5.2 is released, branch-5.2 needs to be moved from UNSTABLE_VERSIONS to LATEST_VERSION (where it should replace branch-5.1) Closes #13200	2023-03-16 10:46:30 +02:00
Kamil Braun	b919373cce	Merge 'api: gossiper: get alive nodes after reaching current shard 0 version' from Alecco Add an API call to wait for all shards to reach the current shard 0 gossiper version. Throws when timeout is reached. Closes #12540 * github.com:scylladb/scylladb: api: gossiper: fix alive nodes gms, service: lock live endpoint copy gms, service: live endpoint copy method	2023-03-16 09:46:02 +01:00
Botond Dénes	b31a55af7e	Merge 'cmake: sync with `configure.py` (13/n)' from Kefu Chai this is the 13th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals: - to enable developers to use CMake for building scylla. so they can use tools (CLion for instance) with CMake integration for better developer experience - to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules. this changeset includes following changes: - build: cmake: increase per link job mem to 4GiB - build: cmake: add missing sources to test-lib - build: cmake: add more tests - build: cmake: remove quotes in "include()" commands - build: cmake: drop unnecessary linkages Closes #13199 * github.com:scylladb/scylladb: build: cmake: drop unnecessary linkages build: cmake: remove quotes in "include()" commands build: cmake: add more tests build: cmake: add missing sources to test-lib build: cmake: increase per link job mem to 4GiB	2023-03-16 10:40:18 +02:00
Nadav Har'El	c5195e0acd	cql-pytest: add reproducers for GROUP BY bugs The translated Cassandra unit tests in cassandra_tests/validation/operations/ reproduced three bugs in GROUP BY's interaction with LIMIT and PER PARTITION LIMIT - issue #5361, #5362 and #5363. Unfortunately, those test functions are very long, and each test fails on all of these issues and a few more, making it difficult to use these tests to verify when those tests have been fixed. In other words, ideally a patch for issue 5361 should un-xfail some reproducing test for this issue - but all the existing tests will continue to fail after fixing 5361, because of other remaining bugs. So in this patch, I created a new test file test_group_by.py with my own tests for the GROUP BY feature. I tried to explore the different capabilities of the GROUP BY feature, its different success and error paths, and how GROUP BY interacts with LIMIT and PER PARTITION LIMIT. As usual, I created many small test functions and not one huge test function, and as a result we now have 5 xfailing tests which each reproduces one bug and when the bug is fixed, it will start to pass. All tests added here pass on Cassandra. Refs #5361 Refs #5362 Refs #5363 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13136	2023-03-16 10:39:05 +02:00
Botond Dénes	f4b5679804	Merge 'doc: Updates the recommended OS to be Ubuntu 22.04' from Anna Stuchlik Fixes https://github.com/scylladb/scylladb/issues/13138 Fixes https://github.com/scylladb/scylladb/issues/13153 This PR: - Fixes outdated information about the recommended OS. Since version 5.2, the recommended OS should be Ubuntu 22.04 because that OS is used for building the ScyllaDB image. - Adds the OS support information for version 5.2. This PR (both commits) needs to be backported to branch-5.2. Closes #13188 * github.com:scylladb/scylladb: doc: Add OS support for version 5.2 doc: Updates the recommended OS to be Ubuntu 22.04	2023-03-16 08:05:19 +02:00
Kefu Chai	0069b43fd4	build: cmake: drop unnecessary linkages most of the linked libraries should be pulled in by the targets defined by subsystems. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 12:14:21 +08:00
Kefu Chai	681dfac496	build: cmake: remove quotes in "include()" commands more consistent this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 12:14:21 +08:00
Kefu Chai	03f5f788a3	build: cmake: add more tests all tests under test/boost are now buildable. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 12:14:21 +08:00
Kefu Chai	649a31a722	build: cmake: add missing sources to test-lib Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 12:14:21 +08:00
Kefu Chai	8963fe4e41	build: cmake: increase per link job mem to 4GiB lld is multi-threaded in some phases, based on observation, it could spawn up to 16 threads for each link job. and each job could take up to more than 3 GiB memory in total. without the change, we can run into OOM with a machine without abundant memory, so increase the per-link-job mem accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 12:14:21 +08:00
Kefu Chai	9eb2626fec	row_cache: pass "const cache_entry" to operator<< operator<<(..) does not mutate the cache_entry parameter passed to it. also, without this change fmtlib is not able to format given cache_entry parameter, as the caller formatter has "const" specifier. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 07:46:11 +08:00
Avi Kivity	7a5e609d8d	cql3: functions: add helpers for automating marshalling for scalar functions Add a helper that, given a C++ function, deduces its argument types and wraps the function in marshalling/unmarshalling code. The native function expects non-null inputs, so an additional helper is called to decide what to do if nulls are encountered. One such helper is return_accumulator_on_null (since that's the default behavior of aggregates), and the other is return_any_nonnull(), useful for reductions.	2023-03-15 22:28:41 +02:00
Avi Kivity	35dd3edb9e	types: fix big_decimal constructor from literal 0 Currently, big_decimal(0) will select the big_decimal(string_view) constructor (via 0 -> const char* -> string_view conversions). 0 is important for initializing aggregates, so fix it ahead of using it.	2023-03-15 22:24:12 +02:00
Avi Kivity	6c8d942fa1	cql3: functions: add helper class for internal scalar functions We'll need many scalar functions to implement aggregates in terms of scalars, so we add an internal_scalar_function class to reduce boilerplate. The new class proxies the scalar function into a native noncopyable_function provided by the constructor.	2023-03-15 22:22:02 +02:00
Avi Kivity	26e8ec663b	db: functions: add stateless aggregate functions Currently, aggregate functions are implemented in a stateful manner. The accumulator is stored internally in an aggregate_function::aggregate, requiring each query to instantiate new instances (see aggregate_function_selector's constructor, and note how it's called from selector::new_instance()). This makes aggregates hard to use in expressions, since expressions are stateless (with state only provided to evaluate()). To facilitate migration towards stateless expressions, we define a stateless_aggregate_function (modelled after user-defined aggregates, which are already stateless). This new struct defines the aggregate in terms of three scalar functions: one to aggregate a new input into an accumulator (provided in the first parameter), one to finalize an accumulator into a result, and one to reduce two accumulators for parallelized aggregation. An adapter of the new struct to the aggregate_function interface is also provided, to allow for incremental migration in the following patches.	2023-03-15 22:10:23 +02:00
Avi Kivity	82c4341e0e	db, cql3: move scalar_function from cql3/functions to db/functions Previously, we moved cql3::functions::function to the db::functions namespace, since functions are a part of the data dictionary, which is independent of cql3. We do the same now for scalar_function, since we wish to make use of it in a new db::functions::stateless_aggregate_function. A stub remains in cql3/functions to avoid churn.	2023-03-15 20:37:25 +02:00
Avi Kivity	29a2788b2e	Merge 'reader_concurrency_semaphore: handle read blocked on memory being registered as inactive' from Botond Dénes A read that requested memory and has to wait for it can be registered as inactive. This can happen for example if the memory request originated from a background I/O operation (a read-ahead maybe). Handling this case is currently very difficult. What we want to do is evict such a read on-the-spot: the fact that there is a read waiting on memory means memory is in demand and so inactive reads should be evicted. To evict this reader, we'd first have to remove it from the memory wait list, which is almost impossible currently, because `expiring_fifo<>`, the type used for the wait list, doesn't allow for that. So in this PR we set out to make this possible first, by transforming all current queues to be intrusive lists of permits. Permits are already linked into an intrusive list, to allow for enumerating all existing permits. We use these existing hooks to link the permits into the appropriate queue, and back to `_permit_list` when they are not in any special queue. To make this possible we first have to make all lists store naked permits, moving all auxiliary data fields currently stored in wrappers like `entry` into the permit itself. With this, all queues and lists in the semaphore are intrusive lists, storing permits directly, which has the following implications: * queues no longer take extra memory, as all of them are intrusive * permits are completely self-sufficient w.r.t. queuing: code can queue or dequeue permits just with a reference to a permit at hand, no other wrapper, iterator, pointer, etc. is necessary. * queues don't keep permits alive anymore; destroying a permit will automatically unlink it from the respective queue, although this might lead to use-after-free. Not a problem in practice, only one code-path (`reader_concurrency_semaphore::with_permit()`) had to be adjusted. 
After all these extensive preparations, we can now handle the case of evicting a reader which is queued on memory. Fixes: #12700 Closes #12777 * github.com:scylladb/scylladb: reader_concurrency_semaphore: handle reader blocked on memory becoming inactive reader_concurrency_semaphore: move _permit_list next to the other lists reader_permit: evict inactive read on timeout reader_concurrency_semaphore: move inactive_read to .cc reader_concurrency_semaphore: store permits in _inactive_reads reader_concurrency_semaphore: inactive_read: de-inline more methods reader_concurrency_semaphore: make _ready_list intrusive reader_permit: add wait_for_execution state reader_concurrency_semaphore: make wait lists intrusive reader_concurrency_semaphore: move most wait_queue methods out-of-line reader_concurrency_semaphore: store permits directly in queues reader_permit: introduce (private) operator * and -> reader_concurrency_semaphore: remove redundant waiters() member reader_concurrency_semaphore: add waiters counter reader_permit: use check_abort() for timeout reader_concurrency_semaphore: maybe_dump_permit_diagnostics(): remove permit list param reader_concurrency_semaphore: make foreach_permit() const reader_permit: add get_schema() and get_op_name() accessors reader_concurrency_semaphore: mark maybe_dump_permit_diagnostics as noexcept	2023-03-15 20:10:19 +02:00
Wojciech Mitros	b776cb4b41	docs: fix typos in wasm documentation This patch fixes 2 small issues with the Wasm UDF documentation that recently got uploaded: 1. a link was unnecessarily wrapped in angle brackets 2. a link did not redirect to the correct page due to a missing ":doc:" tag Closes #13193	2023-03-15 18:48:48 +02:00
Anna Stuchlik	3ad3259396	doc: Add OS support for version 5.2 Fixes https://github.com/scylladb/scylladb/issues/13153 This commit adds a row for version 5.2 to the table of supported platforms.	2023-03-15 16:12:41 +01:00
Kamil Braun	5705df77a1	Merge 'Refactor schema, introduce schema_static_props and move several properties into it' from Gusev Petr Our end goal (#12642) is to mark raft tables to use schema commitlog. There are two similar cases in code right now - `with_null_sharder` and `set_wait_for_sync_to_commitlog` `schema_builder` methods. The problem is that if we need to mark some new schema with one of these methods we need to do this twice - first in a method describing the schema (e.g. `system_keyspace::raft()`) and second in the function `create_table_from_mutations`, which is not obvious and easy to forget. `create_table_from_mutations` is called when schema object is reconstructed from mutations, `with_null_sharder` and `set_wait_for_sync_to_commitlog` must be called from it since the schema properties they describe are not included in the mutation representation of the schema. This series proposes to distinguish between the schema properties that get into mutations and those that do not. The former are described with `schema_builder`, while for the latter we introduce `schema_static_props` struct and the `schema_builder::register_static_configurator` method. This way we can formulate a rule once in the code about which schemas should have a null sharder/be synced, and it will be enforced in all cases. Closes #13170 * github.com:scylladb/scylladb: schema.hh: choose schema_commitlog based on schema_static_props flag schema.hh: use schema_static_props for wait_for_sync_to_commitlog schema.hh: introduce schema_static_props, use it for null_sharder database.cc: drop ensure_populated and mark_as_populated	2023-03-15 15:43:49 +01:00
Kefu Chai	e21926f602	flat_mutation_reader_v2: use maybe_yield() when appropriate just came across this part of code, as `maybe_yield()` is a wrapper around "if should_yield(): yield()", so better off using it for more concise code. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13107	2023-03-15 15:58:55 +02:00
Anna Stuchlik	1bb11126d7	doc: Updates the recommended OS to be Ubuntu 22.04 Fixes https://github.com/scylladb/scylladb/issues/13138 This PR fixes the outdated information about the recommended OS. Since version 5.2, the recommended OS should be Ubuntu 22.04 because that OS is used for building the ScyllaDB image. This commit needs to be backported to branch-5.2.	2023-03-15 13:42:37 +01:00
Pavel Emelyanov	47cdd31f27	main: Forget the --max-io-requests option On start scylla checks if the option is set. It's nowadays useless, as it had been removed from seastar (see `9e34779c` update) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13148	2023-03-15 12:42:06 +02:00
Botond Dénes	e5f3f4b0d1	Merge 'cmake: sync with `configure.py` (12/n)' from Kefu Chai this is the 12th changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals: - to enable developers to use CMake for building scylla. so they can use tools (CLion for instance) with CMake integration for better developer experience - to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules. this changeset includes following changes: - build: cmake: remove Seastar from the option name - build: cmake: add missing sources in test-lib and utils - build: cmake: do not include main.cc in scylla-main - build: cmake: define SEASTAR_TESTING_MAIN for SEASTAR tests - build: cmake: add more tests Closes #13180 * github.com:scylladb/scylladb: build: cmake: add more tests build: cmake: define SEASTAR_TESTING_MAIN for SEASTAR tests build: cmake: do not include main.cc in scylla-main build: cmake: add missing sources in test-lib and utils build: cmake: remove Seastar from the option name	2023-03-15 12:40:51 +02:00
Nadav Har'El	543d4ed726	cql-pytest: translate Cassandra's tests for GROUP BY This is a translation of Cassandra's CQL unit test source file validation/operations/SelectGroupByTest.java into our cql-pytest framework. This test file contains only 8 separate test functions, but each of them is very long checking hundreds of different combinations of GROUP BY with other things like LIMIT, ORDER BY, etc., so 6 out of the 7 tests fail on Scylla on one of the bugs listed below - most of the tests actually fail in multiple places due to multiple bugs. All tests pass on Cassandra. The tests reproduce six already-known Scylla issues and one new issue: Already known issues: Refs #2060: Allow mixing token and partition key restrictions Refs #5361: LIMIT doesn't work when using GROUP BY Refs #5362: LIMIT is not doing it right when using GROUP BY Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY Refs #12477: Combination of COUNT with GROUP BY is different from Cassandra in case of no matches Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column A new issue discovered by these tests: Refs #13109: Incorrect sort order when combining IN, GROUP BY and ORDER BY Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13126	2023-03-15 12:40:24 +02:00
Pavel Emelyanov	bfc0533a8d	test: Update boost.suite.run_first list In debug mode the timings are: view_schema_test: 90 sec cql_query_test: 170 sec memtable_test: 2090 sec cql_functions_test: 2591 sec other tests that are in/out of this list are not that obvious, but the former two apparently deserve being replaced with the latter two. Timings for dev/release modes are not that horrible, but the "first pair is notably smaller than the latter" relation also exists. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13142	2023-03-15 12:10:50 +02:00
Botond Dénes	878ee27d74	Merge 'Load SSTable at the shard that actually own it' from Raphael "Raph" Carvalho Today, the SSTable generation provides a hint on which shard owns a particular SSTable. That hint determines which shard will load the SSTable into memory. With upcoming UUID generation, we will no longer have this hint embedded into the SSTable generation, meaning that SSTables will be loaded at random shards. This is not good because shards will have to reference memory from other shards to access the SSTable metadata that was allocated elsewhere. This patch changes sstable_directory to: 1) Use generation value to only determine which shard will calculate the owner shards for SSTables. Essentially works like a round-robin distribution. 2) The shard assigned to compute the owners for a SSTable will do so reading the minimum from disk, usually only Scylla file is needed. 3) Once that shard finishes computing the owners, it will forward the SSTable to the shard that owns it. 4) Shards will later load SSTables locally that were forwarded to them. Closes #13114 * github.com:scylladb/scylladb: sstables: sstable_directory: Load SSTable at the shard that actually own it sstables: sstable_directory: Give sstable_info_vector a more descriptive name sstables: Allow owner shards to be computed for a partially loaded SSTable sstables: Move SSTable loading to sstable_directory::sort_sstable() sstables: Move sstable_directory::sort_sstable() to private interface sstables: Restore indentation in sstable_directory::sort_sstable() sstables: Coroutinize sstable_directory::sort_sstable() sstables: sstable_directory: Extract sstable loading from process_descriptor() sstables: sstable_directory: Separate private fields from methods sstables: Coroutinize sstable_directory::process_descriptor	2023-03-15 10:43:22 +02:00
Kefu Chai	4505b0a9ca	build: cmake: add more tests * test/boost: add more tests: all tests listed in test/boost/CMakeLists.txt should build now. * rust: add inc library, which is used for testing. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-15 15:38:47 +08:00
Kefu Chai	cac6ba529d	build: cmake: define SEASTAR_TESTING_MAIN for SEASTAR tests we need the `main()` defined by seastar/testing/seastar_test.hh for driving the tests. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-15 15:38:47 +08:00
Kefu Chai	d9e3ffebf2	build: cmake: do not include main.cc in scylla-main main.cc should only be included by scylla. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-15 15:38:46 +08:00
Kefu Chai	1cd3764b08	build: cmake: add missing sources in test-lib and utils Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-15 15:38:46 +08:00
Kefu Chai	269cce4c2c	build: cmake: remove Seastar from the option name change the option name to "LINK_MEM_PER_JOB" as this is not a Seastar option, but a top-level project option. Signed-off-by: Kefu Chai <tchaikov@gmail.com>	2023-03-15 15:38:46 +08:00
Michał Chojnowski	866672a9fa	storage_proxy: rename metrics after service level rename Under some circumstances, service_level_controller renames service levels for internal purposes. However, the per-service-level metrics registered by storage_proxy keep the name seen at first registration time. This sometimes leads to mislabeled metrics. Fix that by re-registering the metrics after scheduling groups are renamed. Fixes scylladb/scylla-enterprise#2755 Closes #13174	2023-03-15 09:15:54 +02:00
Botond Dénes	6373452b31	Merge 'Do not mask node operation errors' from Benny Halevy This series handles errors when aborting node operations and prints them rather than letting them leak and be exposed to the user. Also, clean up the node_ops logging formats when aborting different node ops and add more error logging around errors in the "worker" nodes. Closes #12799 * github.com:scylladb/scylladb: storage_service: node_ops_signal_abort: print a warning when signaling abort storage_service: s/node_ops_singal_abort/node_ops_signal_abort/ storage_service: node_ops_abort: add log messages storage_service: wire node_ops_ctl for node operations storage_service: add node_ops_ctl class to formalize all node_ops flow repair: node_ops_cmd_request: add print function repair: do_decommission_removenode_with_repair: log ignore_nodes repair: replace_with_repair: get ignore_nodes as unordered_set gossiper: get_generation_for_nodes: get nodes as unordered_set storage_service: don't let node_ops abort failures mask the real error	2023-03-15 09:11:31 +02:00
Petr Gusev	afe1d39bdb	schema.hh: choose schema_commitlog based on schema_static_props flag This patch finishes the refactoring. We introduce the use_schema_commitlog flag in schema_static_props and use it to choose the commitlog in database::add_column_family. The only configurator added declares what was originally in database::add_column_family - all tables from schema_tables keyspace should use schema_commitlog.	2023-03-14 19:43:51 +04:00
Petr Gusev	3ef201d67a	schema.hh: use schema_static_props for wait_for_sync_to_commitlog This patch continues the refactoring, now we move wait_for_sync_to_commitlog property from schema_builder to schema_static_props. The patch replaces schema_builder::set_wait_for_sync_to_commitlog and is_extra_durable with two register_static_configurator, one in system_keyspace and another in system_distributed_keyspace. They correspond to the two parts of the original disjunction in schema_tables::is_extra_durable.	2023-03-14 19:26:05 +04:00
Calle Wilund	4681c4b572	configurables: Add optional service lookup to init callback Simplified, more direct version of "dependency injection". I.e. caller/initiator (main/cql_test_env) provides a set of services it will eventually start. Configurable can remember these. And use, at least after "start" notification. Closes #13037	2023-03-14 17:13:52 +02:00
Petr Gusev	349bc1a9b6	schema.hh: introduce schema_static_props, use it for null_sharder Our goal (#12642) is to mark raft tables to use schema commitlog. There are two similar cases in code right now - with_null_sharder and set_wait_for_sync_to_commitlog schema_builder methods. The problem is that if we need to mark some new schema with one of these methods we need to do this twice - first in a method describing the schema (e.g. system_keyspace::raft()) and second in the function create_table_from_mutations, which is not obvious and easy to forget. create_table_from_mutations is called when schema object is reconstructed from mutations, with_null_sharder and set_wait_for_sync_to_commitlog must be called from it since the schema properties they describe are not included in the mutation representation of the schema. This patch proposes to distinguish between the schema properties that get into mutations and those that do not. The former are described with schema_builder, while for the latter we introduce schema_static_props struct and the schema_builder::register_static_configurator method. This way we can formulate a rule once in the code about which schemas should have a null sharder, and it will be enforced in all cases.	2023-03-14 18:29:34 +04:00
Wojciech Mitros	52eb70aef0	docs: make wasm documentation visible for users Until now, the instructions on generating wasm files and using them for Scylla UDFs were stored in docs/dev, so they were not visible on the docs website. Now that the Rust helper library for UDFs is ready, and we're inviting users to try it out, we should also make the rest of the Wasm UDF documentation readily available for the users. Closes #13139	2023-03-14 16:21:23 +02:00
David Garcia	63ad5607ee	docs: Update custom styles	2023-03-14 12:06:20 +00:00
David Garcia	bad914a34d	docs: Update styles	2023-03-14 12:01:33 +00:00
David Garcia	8c4659a379	docs: Add card logos	2023-03-14 10:37:23 +00:00
Botond Dénes	1d9b7f3a92	Merge 'cmake: sync with `configure.py` (11/n)' from Kefu Chai - build: cmake: remove test which does not exist yet - build: cmake: document add_scylla_test() - build: cmake: extract index, repair and data_dictionary out - build: cmake: extract scylla-main out - build: cmake: find Snappy before using it - build: cmake: add missing linkages - build: cmake: add missing sources to test-lib - build: cmake: link sstables against libdeflate - build: cmake: link Boost::regex against ICU::uc Closes #13110 * github.com:scylladb/scylladb: build: cmake: link Boost::regex against ICU::uc build: cmake: link sstables against libdeflate build: cmake: add missing sources to test-lib build: cmake: add missing linkages build: cmake: find Snappy before using it build: cmake: extract scylla-main out build: cmake: extract index, repair and data_dictionary out build: cmake: document add_scylla_test() build: cmake: remove test which does not exist yet	2023-03-14 11:45:48 +02:00
Petr Gusev	00fc73d966	database.cc: drop ensure_populated and mark_as_populated There was some logic to call mark_as_populated at the appropriate places, but the _populated field and the ensure_populated function were not used by anyone.	2023-03-14 13:32:25 +04:00
Botond Dénes	e22b27a107	Merge 'Improve database shutdown verbosity' from Pavel Emelyanov The `database::stop` method is sometimes hanging and it's always hard to spot where exactly it sleeps. Few more logging messages would make this much simpler. refs: #13100 refs: #10941 Closes #13141 * github.com:scylladb/scylladb: database: Increase verbosity of database::stop() method large_data_handler: Increase verbosity on shutdown large_data_handler: Coroutinize .stop() method	2023-03-14 10:55:31 +02:00
Kefu Chai	5842804591	install-dependencies: extract go_arch() out for defining the mapping from the output of `arch` to the corresponding GO_ARCH. see `b94dc384ca/src/go/build/syslist.go (L55)` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13151	2023-03-14 10:05:09 +03:00
Raphael S. Carvalho	0c77f77659	sstables: sstable_directory: Load SSTable at the shard that actually own it Today, the SSTable generation provides a hint on which shard owns a particular SSTable. That hint determines which shard will load the SSTable into memory. With upcoming UUID generation, we will no longer have this hint embedded into the SSTable generation, meaning that SSTables will be loaded at random shards. This is not good because shards will have to reference memory from other shards to access the SSTable metadata that was allocated elsewhere. This patch changes sstable_directory to: 1) Use generation value to only determine which shard will calculate the owner shards for SSTables. Essentially works like a round-robin distribution. 2) The shard assigned to compute the owners for a SSTable will do so reading the minimum from disk, usually only Scylla file is needed. 3) Once that shard finishes computing the owners, it will forward the SSTable to the shard that owns it. 4) Shards will later load SSTables locally that were forwarded to them. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-13 15:40:43 -03:00
Raphael S. Carvalho	2c4e141314	sstables: sstable_directory: Give sstable_info_vector a more descriptive name Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-13 15:40:43 -03:00
Raphael S. Carvalho	a83328c358	sstables: Allow owner shards to be computed for a partially loaded SSTable Today, owner shards can only be computed for a fully loaded SSTable. For upcoming changes in the SSTable loader, we want to load the minimum from disk to be able to compute the set of shards owning the SSTable. If sharding metadata is available, it means we only need to read TOC and Scylla components. Otherwise, Summary must be read to provide first and last keys for compute_shards_for_this_sstable() to operate on them instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-13 15:40:43 -03:00
Raphael S. Carvalho	b49ae56e70	sstables: Move SSTable loading to sstable_directory::sort_sstable() The reason for this change is that we'll want to fully load the SSTable only at the destination shard. Later, sort_sstable() will calculate set of owner shards for a SSTable by only loading scylla metadata file. If it turns out that the SSTable belongs to current shard, then we'll fully load the SSTable using the new and fresh sstable_directory::load_sstable(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-13 15:40:43 -03:00
Raphael S. Carvalho	229d89dbde	sstables: Move sstable_directory::sort_sstable() to private interface Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-13 15:40:43 -03:00
Raphael S. Carvalho	36602d1025	sstables: Restore indentation in sstable_directory::sort_sstable() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-13 15:40:43 -03:00
Raphael S. Carvalho	825f23b7f9	sstables: Coroutinize sstable_directory::sort_sstable() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-13 15:40:43 -03:00
Raphael S. Carvalho	a19a9f5d99	sstables: sstable_directory: Extract sstable loading from process_descriptor() Will make it easier for process_descriptor to process the SSTable without having to fully load the SSTable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-13 15:40:43 -03:00
Raphael S. Carvalho	08e6df256e	sstables: sstable_directory: Separate private fields from methods Following the expected coding convention. It's also somewhat disturbing to see them mixed up. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-13 15:40:43 -03:00
Raphael S. Carvalho	7d751991c1	sstables: Coroutinize sstable_directory::process_descriptor Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-03-13 15:40:43 -03:00
Anna Stuchlik	8ceb8b0240	doc: add a Knowledge Base article about consistency, v2 of https://github.com/scylladb/scylladb/pull/12929 Closes #12957	2023-03-13 17:48:25 +02:00
Aleksandra Martyniuk	cb0e6d617a	test: extend test_compaction_task.py to test cleanup compaction	2023-03-13 16:36:20 +01:00
Aleksandra Martyniuk	27b999808f	compaction: create task manager's task for cleanup keyspace compaction on one shard Implementation of task_manager's task that covers cleanup keyspace compaction on one shard.	2023-03-13 16:35:39 +01:00
Aleksandra Martyniuk	7dd27205f6	compaction: create task manager's task for cleanup keyspace compaction Implementation of task_manager's task covering cleanup keyspace compaction that can be started through storage_service api.	2023-03-13 16:35:39 +01:00
Aleksandra Martyniuk	4a5752d0d0	api: add get_table_ids to get table ids from table infos	2023-03-13 16:35:39 +01:00
Aleksandra Martyniuk	8801f326c6	compaction: create cleanup_compaction_task_impl	2023-03-13 16:35:39 +01:00
Aleksandra Martyniuk	a976e2e05b	repair: fix indentation	2023-03-13 15:25:53 +01:00
Aleksandra Martyniuk	41abc87d28	repair: continue user requested repair if no_such_column_family is thrown When one of column families requested for repair does not exist, we should repair all other requested column families. no_such_column_family exception is caught and logged, and repair continues.	2023-03-13 15:25:52 +01:00
Aleksandra Martyniuk	2376a434b6	repair: add find_column_family_if_exists function	2023-03-13 15:25:15 +01:00
Botond Dénes	3f0b3489a2	reader_concurrency_semaphore: handle reader blocked on memory becoming inactive Kill said read's memory requests with std::bad_alloc and dequeue it from the memory wait list, then evict it on the spot. Now that `_inactive_reads` just stores permits, we can do this easily.	2023-03-13 08:07:53 -04:00
Botond Dénes	4f5657422d	reader_concurrency_semaphore: move _permit_list next to the other lists A mostly cosmetic change. Also add a comment mentioning that this is the catch-all list.	2023-03-13 08:07:53 -04:00
Botond Dénes	d1bc5f9293	reader_permit: evict inactive read on timeout If the read is inactive when the timeout clock fires, evict it. Now that `_inactive_reads` just stores permits, we can do this easily.	2023-03-13 08:07:53 -04:00
Botond Dénes	6181c08191	reader_concurrency_semaphore: move inactive_read to .cc It is not used in the header anymore and moving it to the .cc allows us to remove the dependency on flat_mutation_reader_v2.hh.	2023-03-13 08:07:53 -04:00
Botond Dénes	e56ec9373d	reader_concurrency_semaphore: store permits in _inactive_reads Add a member of type `inactive_read` to reader permit, and store permit instances in `_inactive_reads`. This list is now just another intrusive list the permit can be linked into, depending on its state. Inactive read handles now just store a reader permit pointer.	2023-03-13 08:07:53 -04:00
Botond Dénes	d11f9efbfe	reader_concurrency_semaphore: inactive_read: de-inline more methods They will soon need to access reader_permit::impl internals, only available in the .cc file.	2023-03-13 08:07:53 -04:00
Botond Dénes	8e296e8e05	reader_concurrency_semaphore: make _ready_list intrusive Following the same scheme we used to make the wait lists intrusive. Permits are added to the intrusive ready list while waiting to be executed and moved back to the _permit_list when de-queued from this list. We now use a condition variable for signaling when there are permits ready to be executed.	2023-03-13 08:07:53 -04:00
Nadav Har'El	c41b2d35ed	test/alternator: test concurrent TagResource / UntagResource This patch adds an Alternator test reproducing issue #6389 - that concurrent TagResource and/or UntagResource operations were broken and some of the concurrent modifications were lost. The test has two threads, one loops adding and removing a tag A, the other adds and removes a tag B. After we add tag A, we expect tag A to be there - but due to issue #6389 this modification was sometimes lost when it raced with an operation on B. This test consistently failed before issue #6389 was fixed, and passes now after the issue was fixed by the previous patches. The bug reproduces by chance, so it requires a fairly long loop (a few seconds) to be sure it reproduces - so is marked a "veryslow" test and will not run in CI, but can be used to manually reproduce this issue with: test/alternator/run --runveryslow test_tag.py::test_concurrent_tag Refs #6389. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-03-13 13:38:15 +02:00
Nadav Har'El	87f29d8fd2	db/tags: drop unsafe update_tags() utility function The previous patches introduced the function modify_tags() as a safe version of update_tags(), and switched all uses of update_tags() to use modify_tags(). So now that the unsafe update_tags() is no longer used, we can drop it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-03-13 13:35:17 +02:00
Kamil Braun	228856f577	Merge 'Test changing IP address of 2 nodes in a cluster out of 3 & misc cleanups' from Konstantin Osipov Closes #13135 * github.com:scylladb/scylladb: test: improve logging in ScyllaCluster raft: (test) test ip address change	2023-03-13 11:47:00 +01:00
Calle Wilund	dba45f3dc8	init: Add life cycle notifications to configurables Allows a configurable to subscribe to life cycle notifications for scylla app. I.e. do stuff on start/stop. Also allow configurables in cql_test_env v2: * Fix camel casing * Make callbacks future<> (should have been. mismerge?) Closes #13035	2023-03-13 12:45:20 +02:00
Nadav Har'El	c196bd78de	alternator: isolate concurrent modification to tags Alternator modifies tags in three operations - TagResource, UntagResource and UpdateTimeToLive (the latter uses a tag to store the TTL configuration). All three operations were implemented by three separate steps: 1. Read the current tags. 2. Modify the tags according to the desired operation. 3. Write the modified tags back with update_tags(). This implementation was not safe for concurrent operations - some modifications may be lost. We fix this in this patch by using the new modify_tags() function introduced in the previous patch, which performs all three steps under one lock so the tag operations are serialized and correctly isolated. Fixes #6389 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-03-13 12:25:03 +02:00
Nadav Har'El	fbdf52acf6	db/tags: add safe modify_tags() utility functions The existing utility function update_tags() for modifying tags in a schema (used mainly by Alternator) is not safe for concurrent operations: The function first reads the old tags, then modifies them and writes them back. If two such calls happen concurrently, both calls may read the same old tags, make different modifications, and then both write the new tags, with one's write overwriting the other's. So in this patch, we introduce a new utility function, modify_tags(), to provide a concurrency-safe read-modify-write operation on tags. The new function takes a modification function and calls the read, modify and write steps together under a single lock. The new function also takes a table name instead of a schema object - because we need to read the schema under the lock, because it might have already been changed by some other concurrent operation. This patch only introduces the new function, it doesn't change any code to use it yet, and doesn't remove the unsafe update_tags() function. We'll do those things in the next patches. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-03-13 11:51:01 +02:00
Nadav Har'El	e5e9b59518	migration_manager: expose access to storage_proxy A migration_manager holds a reference to a storage_proxy, and uses it internally a lot - e.g., to gain access to the data_dictionary. Users of migration_manager might also benefit from this storage_proxy - we will see such a case in the next patches. So let's provide a getter for the storage_proxy. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-03-13 11:43:53 +02:00
Israel Fruchter	ef229a5d23	Repackaging cqlsh cqlsh is moving into its own repository: https://github.com/scylladb/scylla-cqlsh * add cqlsh as submodule * update scylla-java-tools to have cqlsh removed * introduced new cqlsh artifact (rpm/deb/tar) Depends: https://github.com/scylladb/scylla-tools-java/pull/316 Ref: scylladb/scylladb#11569 Closes #11937 [avi: restore tools/java submodule location, adjust commit]	2023-03-12 20:22:33 +02:00
Pavel Emelyanov	0cd3a6993b	sstables: Don't rely on lexicographical prefix comparison When creating a deletion log for a bunch of sstables the code checks that all sstables share the same "storage" by lexicographically comparing their prefixes. That's not correct, as filesystem paths may refer to the same directory even if not being equal. So far that's been mostly OK, because path manipulations were done in simple forms without producing unequal paths. Patch `8a061bd8` (sstables, code: Introduce and use change_state() call) triggered a corner case. fs::path foo("/foo"); sstring sub(""); foo = foo / sub; produces a correct path of "/foo/", but the trailing slash breaks the aforementioned assumption about prefix comparison. As a result, when an sstable moves between, say, staging and normal locations it may gain a trailing slash breaking the deletion log creation code. The fix is to restrict the deletion log creation not to rely on path strings comparison completely and trim the trailing slash if it happens. A test is included. fixes: #13085 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13090	2023-03-12 20:06:47 +02:00
Avi Kivity	beaa5a9117	Merge 'wasm: move compilation to an alien thread' from Wojciech Mitros The compilation of wasm UDFs is performed by a call to a foreign function, which cannot be divided with yielding points and, as a result, causes long reactor stalls for big UDFs. We avoid them by submitting the compilation task to a non-seastar std::thread, and retrieving the result using seastar::alien. The thread is created at the start of the program. It executes tasks from a queue in an infinite loop. All seastar shards reference the thread through a std::shared_ptr to an `alien_thread_runner`. Considering that the compilation takes a long time anyway, the alien_thread_runner is implemented with focus on simplicity more than on performance. The tasks are stored in an std::queue, reading and writing to it is synchronized using an std::mutex for reading/writing to the queue, and an std::condition_variable waiting until the queue has elements. When the destructor of the alien runner is called, an std::nullopt sentinel is pushed to the queue, and after all remaining tasks are finished and the sentinel is read, the thread finishes. Fixes #12904 Closes #13051 * github.com:scylladb/scylladb: wasm: move compilation to an alien thread wasm: convert compilation to a future	2023-03-12 19:29:11 +02:00
Avi Kivity	24719ea639	Merge 'sstables: sstable_directory: avoid unnecessarily constructing tuple<> from pair<>' from Kefu Chai - sstables: sstable_directory: avoid unnecessarily constructing tuple<> from pair<> - sstables: sstable_directory: add type constraints Closes #13144 * github.com:scylladb/scylladb: sstables: sstable_directory: add type constraints sstables: sstable_directory: avoid unnecessarily constructing tuple<> from pair<>	2023-03-12 19:10:02 +02:00
Pavel Emelyanov	24e943f79b	install-dependencies: Add minio server and client These two are static binaries, so there is no need to yum/apt-install them with dependencies. Just download with curl and put them into /usr/local/bin with X-bit set. This is needed for future object-storage work in order to run unit tests against minio. refs: #12523 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> [avi: regenerate frozen toolchain] Closes #13064 Closes #13099	2023-03-12 19:07:10 +02:00
Marcin Maliszkiewicz	74cc90a583	main: remove unused bpo::store	2023-03-12 16:59:27 +02:00
Nadav Har'El	e72b85e82c	Merge 'cql-pytest/lwt_test: test LWT UPDATE when partition/clustering ranges are empty' from Jan Ciołek Adds two test cases which test what happens when we perform an LWT UPDATE, but the partition/clustering key has 0 possible values. This can happen e.g when a column is supposed to be equal to two different values (`c = 0 AND c = 1`). Empty partition ranges work properly, empty clustering range currently causes a crash (#13129). I added tests for both of these cases. Closes #13130 * github.com:scylladb/scylladb: cql-pytest/test_lwt: test LWT update with empty clustering range cql-pytest/test_lwt: test LWT update with empty partition range	2023-03-12 15:11:33 +02:00
Nadav Har'El	53c8c43d8a	Merge 'cql3: improve support for C-style parenthesis casts' from Jan Ciołek CQL supports type casting using C-style casts. For example it's possible to do: `blob_column = (blob)funcReturningInt()` This functionality is pretty limited, we only allow such casts between types that have a compatible binary representation. Compatible means that the bytes will stay unchanged after the conversion. This means that it's legal to cast an int to blob (int is just a 4 byte blob), but it's illegal to cast a bigint to int (change 8 bytes -> 4 bytes). This simplifies things, to cast we can just reinterpret the value as the other type. Another use of C-style casts is type hints. Sometimes it's impossible to infer the exact type of an expression from the context. In such cases the type can be specified by casting the expression to this type. For example: `overloadedFunction((int)?)` Without the cast it would be impossible to guess what should be the bind marker's type. The function is overloaded, so there are many possible argument types. The type hint specifies that the bind marker has type int. An interesting thing is that such casts don't have to be explicit. CQL allows to put an int value in a place where a blob value is expected and it will be automatically converted without any explicit casting. --- I started looking at our implementation of casts because of #12900. In there the author expressed the need to specify a type hint for bind marker used to pass the WASM code. It could be either `(text)?` for text WASM, or `(blob)?` for binary WASM. This specific use of type hints wasn't supported because there was no `receiver` and the implementation of `prepare_expression` didn't handle that. Preparing casts without a receiver should be easy to implement - we can infer the type of the expression by looking at the type to which the expression is cast. But while reading `prepare_expression` for `expr::cast` I noticed that the code there is a bit strange. 
The implementation prepared the expression to cast using the original `receiver` instead of a receiver with the cast type. This caused some issues because of which casting didn't work as expected. For example it was possible to do: ```cql blob_column = (blob)funcReturningInt() ``` But this didn't work at all: ```cql blob_column = (blob)(int)12323 ``` It tried to prepare `untyped_constant(12323)` with a `blob` receiver, which fails. This makes `expr::cast` useless for casting. Casting when the representation is compatible is already implicit. I couldn't find a single case where adding a cast would change the behavior in any way. There was some use for it as a type hint to choose a specific overload of a function, but it was worthless for casting. Cassandra has the same issue, I created a `cql-pytest` test and it showed that we behave in the same way as Cassandra does. I decided to improve this. By preparing the expression using a receiver with the cast type, `expr::cast` becomes actually useful for casting values. Things like `(blob)(int)12323` now work without any issues. This diverges from the behavior in Cassandra, but it's an extension, not a breaking incompatibility. --- This PR improves `prepare_expression` for `expr::cast` in the following ways: 1) Support for more complex casts by preparing the expression using a different receiver. This makes casts like `(blob)(int)123` possible 2) Support preparing `expr::cast` without a receiver. Type inference chooses the cast type as the type of the expression. 3) Add pytest tests for C-style casts `2)` Is needed for #12900, the other changes are just something I decided to do since I was already working on this piece of code. 
Closes #13053 * github.com:scylladb/scylladb: expr_test: more tests for preparing bind variables with type hints prepare_expr: implement preparing expr::cast with no receiver prepare_expr: use :user formatting in cast_prepare_expression prepare_expr: remove std::get<> in cast_prepare_expression prepare_expr: improve cast_prepare_expression prepare_expr: improve readability in cast_prepare_expression cql-pytest: test expr::cast in test_cast.py	2023-03-12 15:07:54 +02:00
Nadav Har'El	843a5dfc15	Merge 'Allow setting permissions for user-defined functions' from Wojciech Mitros This series aims to allow users to set permissions on user-defined functions. The implementation is based on Cassandra's documentation and should be fully compatible: https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissions Fixes: #5572 Fixes: #10633 Closes #12869 * github.com:scylladb/scylladb: cql3: allow UDTs in permissions on UDFs cql3: add type_parser::parse() method taking user_types_metadata schema_change_test: stop using non-existent keyspace cql3: fix parameter names in function resource constructors cql3: handle complex types as when decoding function permissions cql3: enforce permissions for ALTER FUNCTION cql-pytest: add a (failing) test case for UDT in UDF cql-pytest: add a test case for user-defined aggregate permissions cql-pytest: add tests for function permissions cql3: enforce permissions on function calls selection: add a getter for used functions abstract_function_selector: expose underlying function cql3: enforce permissions on DROP FUNCTION cql3: enforce permissions for CREATE FUNCTION client_state: add functions for checking function permissions cql-pytest: add a case for serializing function permissions cql3: allow specifying function permissions in CQL auth: add functions_resource to resources	2023-03-12 14:04:34 +02:00
Avi Kivity	7f9c822346	Merge 'Coroutinize distributed_loader's reshape() function' from Pavel Emelyanov It was suggested as candidate from one of previous reviews, so here it is. Closes #13140 * github.com:scylladb/scylladb: distributed_loader: Indentation fix after previous patch distributed_loader: Coroutinize reshape() helper	2023-03-12 12:21:33 +02:00
Nadav Har'El	1379d8330f	Merge 'Teach sstables tests not to use tempdir explicitly' from Pavel Emelyanov Many sstable test cases create tempdir on their own to create sstables with. Sometimes it's justified when the test needs to check files on disk by hand for some validation, but often all checks are fs-agnostic. The latter case(s) can be patched to work on top of any storage, in particular -- on top of object storage. To make it work tests should stop creating sstables explicitly in tempdir and this PR does exactly that. All relevant occurrences of tempdir are removed from test cases, instead the sstable::test_env's tempdir is used. Next, the test_env::{create_sstable\|reusable_sst} are patched not to accept the `fs::path dir` argument and pick the env's tempdir. Finally, the `make_sstable_easy` helper is patched to use path-less env methods too. refs: #13015 Closes #13116 * github.com:scylladb/scylladb: test,sstables: Remove path from make_sstable_easy() test,lib: Remove wrapper over reusable_sst and move the comment test: Make "compact" test case use env dir test,compaction: Use env tempdir in some more cases test,compaction: Make check_compacted_sstables() use env's dir test: Relax making sstable with sequential generation test/sstable::test_env: Keep track of auto-incrementing generation test/lib: Add sstable maker helper without factory test: Remove last occurrence of test_env::do_with(rval, ...) test,sstables: Dont mess with tempdir where possible test/sstable::test_env: Add dir-less sstables making helpers test,sstables: Use sstables::test_env's tempdir with sweeper test,sstables: Use sstables::test_env's tempdir test/lib: Add tempdir sweeper test/lib: Open-code make_sstabl_easy into make_sstable test: Remove vector of mutation interposer from test_key_count_estimation	2023-03-12 10:14:26 +02:00
Kefu Chai	97e411bc96	sstables: sstable_directory: add type constraints add type constraints for `sstable_directory::parallel_for_each_restricted()`, to enforce that the function is invocable with an argument of the specified type. this helps to prevent the problem of passing a function which accepts `pair<key, value>` or `tuple<key, value>`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-11 17:47:19 +08:00
Kefu Chai	0a29d62f4f	sstables: sstable_directory: avoid unnecessarily constructing tuple<> from pair<> `parallel_for_each_restricted()` maps the elements in the given container with the specified function. in this case, the elements are of type `unordered_map::value_type`, which is a `pair<const Key, Value>`. to convert it to a `tuple<Key, Value>`, the constructor of the tuple is called. but all we intend to do here is access the second element in the `pair<>`. in this change, the function's signature is changed to match `scan_descriptors_map::value_type` to avoid the unnecessary overhead of constructing a `tuple<>`. also, the underlying `max_concurrent_for_each()` does not pass an xvalue to the given func; instead, it just passes `*s.begin` to the function, where `s.begin` is an `Iterator` returned by `std::begin(container)`. so let's just use a plain reference as the parameter type for the function. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-11 17:47:19 +08:00
Konstantin Osipov	7309a1bd6b	test: improve logging in ScyllaCluster Print IP addresses and cluster identifiers in more log messages, it helps debugging.	2023-03-10 19:53:19 +03:00
Konstantin Osipov	4ace19928d	raft: (test) test ip address change	2023-03-10 19:52:40 +03:00
Pavel Emelyanov	f84f0a9414	database: Increase verbosity of database::stop() method Add logging messages when stopping (this way or another) various sub-services and helper objects Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-10 19:45:23 +03:00
Pavel Emelyanov	2f316880ae	large_data_handler: Increase verbosity on shutdown It may hang waiting for background handlers, so it's good to know if they exist at all Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-10 19:45:18 +03:00
Alejo Sanchez	e35762241a	api: gossiper: fix alive nodes Fix API call to wait for all shards to reach the current shard 0 gossiper version. Throws when timeout is reached. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-03-10 17:29:11 +01:00
Alejo Sanchez	6c04476561	gms, service: lock live endpoint copy To allow concurrent execution, protect copy of live endpoints with a semaphore.	2023-03-10 17:16:21 +01:00
Pavel Emelyanov	2000494881	large_data_handler: Coroutinize .stop() method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-10 19:06:14 +03:00
Pavel Emelyanov	e7250e5a3f	Merge 'sstables: add more constness' from Kefu Chai - sstables: mark param of sstable::*_from_sstring() const - sstables: mark param of reverse_map() const - sstables: mark static lookup table const Closes #13115 * github.com:scylladb/scylladb: sstables: mark static lookup table const sstables: mark param of reverse_map() const sstables: mark param of sstable::*_from_sstring() const	2023-03-10 17:14:56 +03:00
Kamil Braun	51a76e6359	Revert "Merge 'sstables: remove unused function add more constness' from Kefu Chai" This reverts commit `49e0d0402d`, reversing changes made to `25cf325674`. An old version of PR #13115 was accidentally merged into `master` (it was dequeued concurrently while a running next promotion job included it). Revert the merge. We'll merge the new version as a follow-up.	2023-03-10 15:02:28 +01:00
Aleksandra Martyniuk	4808220729	test: extend test_compaction_task.py test/rest_api/test_compaction_task.py is extended so that it checks validity of major compaction run from column family api.	2023-03-10 15:01:22 +01:00
Aleksandra Martyniuk	0918529fdf	api: unify major compaction Major compaction can be started from both storage_service and column_family api. The first allows to compact a subset of tables in given keyspace, while the latter - given table in given keyspace. As major compaction started from storage_service has a wider scope, we use its mechanisms for column_family's one. That makes it more consistent and reduces number of classes that would be needed to cover the major compaction with task manager's tasks.	2023-03-10 15:01:22 +01:00
Pavel Emelyanov	537510f7d2	scylla-gdb: Parse and eval _all_threads without quotes I've no idea why the quotes are there at all, it works even without them. However, with quotes gdb-13 fails to find the _all_threads static thread-local variable _unless_ it's printed with gdb "p" command beforehand. fixes: #13125 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13132	2023-03-10 15:01:22 +01:00
Pavel Emelyanov	b07570406e	distributed_loader: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-10 16:01:09 +03:00
Pavel Emelyanov	f90ea6efc2	distributed_loader: Coroutinize reshape() helper Drop do_with(), keep the needed variable on stack. Replace repeat() with plain loop + yield. Keep track of run_custom_job()'s exception. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-10 15:37:57 +03:00
Wojciech Mitros	6b8c1823a3	cql3: allow UDTs in permissions on UDFs Currently, when preparing an authorization statement on a specific function, we're trying to "prepare" all cql types that appear in the function signature while parsing the statement. We cannot do that for UDTs, because we don't know the UDTs that are present in the database at parsing time. As a result, such authorization statements fail. To work around this problem, we postpone the "preparation" of cql types until the actual statement validation and execution time. Until then, we store all type strings in the resource object. The "preparation" happens in the `maybe_correct_resource` method, which is called before every `execute` during a `check_access` call. At that point, we have access to the `query_processor`, and as a result, to `user_types_metadata` which allows us to prepare the argument types even for UDTs.	2023-03-10 11:02:33 +01:00
Wojciech Mitros	4f0b3539c5	cql3: add type_parser::parse() method taking user_types_metadata In a future patch, we don't have access to a `user_types_storage` while we want to parse a type, but we do have access to a `user_types_metadata`, which is enough to parse the type. We add a variant of the `type_parser::parse()` that takes a `user_types_metadata` instead of a `user_types_storage` to be able to parse a type also in the described context.	2023-03-10 11:02:33 +01:00
Wojciech Mitros	4182a221d6	schema_change_test: stop using non-existent keyspace The current implementation of CQL type parsing worked even when given a string representing a non-existent keyspace, as long as the parsed type was one of the "native" types. This implementation is going to change, so that we won't parse types given an incorrect keyspace name. When using `do_with_cql_env`, a "ks" keyspace is created by default, and "tests" keyspace is not. The tests for reverse schemas in `schema_change_test` were using the "tests" keyspace, so in order to make the tests work after the future changes, they now use the existing "ks" keyspace.	2023-03-10 11:02:32 +01:00
Wojciech Mitros	b93c7b94eb	cql3: fix parameter names in function resource constructors In some places, the parameter name used when constructing a resource object was 'function_name', while the actual argument was the signature of a function, which is particularly confusing, because function names also appear frequently in these contexts. This patch changes the identifiers to more accurately reflect what they represent.	2023-03-10 11:02:32 +01:00
Wojciech Mitros	9a303fd99c	cql3: handle complex types as when decoding function permissions Currently, we're parsing types that appear in a function resource using abstract_type::parse_type, which only works with simple types. This patch changes it to db::marshal::type_parser::parse, which can also handle collections. We also adjust the test_grant_revoke_udf_permissions test so that it uses both simple and complex types as parameters of the function that we're granting/revoking permissions on.	2023-03-10 11:02:32 +01:00
Wojciech Mitros	438c7fdfa7	cql3: enforce permissions for ALTER FUNCTION Currently, the ALTER permission is only enforced on ALL FUNCTIONS or on ALL FUNCTIONS IN KEYSPACE. This patch enforces the permission also on a specific function.	2023-03-10 11:02:32 +01:00
Piotr Sarna	c4e6925bb6	cql-pytest: add a (failing) test case for UDT in UDF Our permissions system is currently incapable of figuring out user-defined type definitions when preparing functions permissions. This test case creates such a function, and it passes on Cassandra.	2023-03-10 11:02:32 +01:00
Piotr Sarna	63e67c9749	cql-pytest: add a test case for user-defined aggregate permissions This test case is similar to the one for user-defined functions, but checks if aggregate permissions are enforced.	2023-03-10 11:02:32 +01:00
Piotr Sarna	6deebab786	cql-pytest: add tests for function permissions The test case checks that function permissions are enforced for non-superuser users.	2023-03-10 11:01:48 +01:00
Kefu Chai	77643717db	sstables: mark static lookup table const these tables are mappings from symbolic names to their string representation. we don't mutate them. so mark them const. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-10 16:18:29 +08:00
Kefu Chai	0889643243	sstables: mark param of reverse_map() const it does not mutate the map in which the value is looked up, so let's mark map const. also, take this opportunity to use structured binding for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-10 16:18:29 +08:00
Kefu Chai	9eae97c525	sstables: mark param of sstable::*_from_sstring() const neither of the changed functions mutates the parameter. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-10 16:18:28 +08:00
Pavel Emelyanov	e3dc60286c	sstable: Remove unused friendship The components_writer class from this list doesn't even exist Also drop the forward declaration of mx::partition_reversing_data_source_impl Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13097	2023-03-10 07:13:18 +02:00
Jan Ciolek	c11f7a9e35	expr_test: more tests for preparing bind variables with type hints Add tests for preparing expr::cast which contains a bind variable, with a known receiver. expr::cast serves as a type hint for the bind variable. It specifies what should be the type of the bind variable, we must check that this type is compatible with the receiver and fail in case it isn't The following cases are tested: Valid: `text_col = (text)?` `int_col = (int)?` Invalid: `text_col = (int)?` `int_col = (text)?` Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-03-09 18:31:45 +01:00
Jan Ciolek	a08eb5cb76	prepare_expr: implement preparing expr::cast with no receiver Type inference in cast_prepare_expression was very limited. Without a receiver it just gave up and said that it can't infer the type. It's possible to infer the type - an expression that casts something to type bigint also has type bigint. This can be implemented by creating a fake receiver when the caller didn't specify one. Type of this fake receiver will be c.type and c.arg will be prepared using this receiver. Note that the previous change (changing receiver to cast_type_receiver in prepare_expression) is required to keep the behaviour consistent. Without it we would sometimes prepare c.arg using the original receiver, and sometimes using a receiver with type c.type. Currently it's impossible to test this change on live code. Every place that uses expr::cast specifies a receiver. A unit test is all that can be done at the moment to ensure correctness. In the future this functionality will be used in UDFs. In https://github.com/scylladb/scylladb/pull/12900 it was requested to be able to use a type hint to specify whether WASM code of the function will be sent in binary or text form. The user can convey this by typing either `(blob)?` or `(text)?`. In this case there will be no receiver and type inference would fail. After this change it will work - it's now possible to prepare either of those and get an expression with a known type. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-03-09 18:31:45 +01:00
Jan Ciolek	9f8340d211	prepare_expr: use :user formatting in cast_prepare_expression By default expressions are printed using the {:debug} formatting, which is intended for internal use. Error messages should use the {:user} formatting instead. cast_prepare_expression uses the default formatting in a few places that are user facing, so let's change it to use {:user} formatting. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-03-09 18:31:45 +01:00
Jan Ciolek	12560b5745	prepare_expr: remove std::get<> in cast_prepare_expression A few times throughout cast_prepare_expression there's a line which uses std::get<> to get the raw type of the cast. `std::get<shared_ptr<cql3_type::raw>>(c.type)` This is a dangerous thing to do. It might turn out that the variant holds a different alternative and then it'll start throwing bad_variant_access. In this case this would happen if someone called cast_prepare_expression on an expression that is already prepared. It's possible to modify the code in a way that avoids doing the std::get altogether. It makes the code more resilient and gives me peace of mind. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-03-09 18:31:45 +01:00
Jan Ciolek	7c384de476	prepare_expr: improve cast_prepare_expression Preparing expr::cast had some artificial limitations. Things like this worked: `blob_col = (blob)funcReturnsInt()` But this didn't: `blob_col = (blob)(int)1234` This is caused by the line: `prepare_expression(c.arg, db, keyspace, schema_opt, receiver)` Here the code prepares the expression to be cast using the original receiver which was passed to cast_prepare_expression. In the example above this meant that it tried to prepare untyped_constant(1234) using a receiver with type blob. This failed because an integer literal is invalid for a blob column. To me it looks like a mistake. What it should do instead is prepare the int literal using the type (int) and then see if int can be cast to blob, by checking if these types have compatible binary representation. This can be achieved by using `cast_type_receiver` instead of `receiver`. Making this small change makes it possible to use the cast in many situations where it was previously impossible. The tests have to be updated to reflect the change, some of them now deviate from Cassandra, so they have to be marked scylla_only. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-03-09 18:31:41 +01:00
Piotr Sarna	62458b8e4f	cql3: enforce permissions on function calls Only users with EXECUTE permission are able to use the function in SELECT statements.	2023-03-09 17:51:17 +01:00
Piotr Sarna	4624934032	selection: add a getter for used functions The function allows extracting used function definitions from given selection. Thanks to that, it will be possible to verify if the callee has proper permissions to execute given functions.	2023-03-09 17:51:17 +01:00
Piotr Sarna	d95912c369	abstract_function_selector: expose underlying function It will be needed later in order to check this function's permissions.	2023-03-09 17:51:17 +01:00
Piotr Sarna	488934e528	cql3: enforce permissions on DROP FUNCTION Only users with DROP permission are allowed to drop user-defined functions.	2023-03-09 17:51:15 +01:00
Piotr Sarna	e8afcf7796	cql3: enforce permissions for CREATE FUNCTION Only users with CREATE permissions are allowed to create user-defined functions.	2023-03-09 17:50:56 +01:00
Piotr Sarna	d10799a834	client_state: add functions for checking function permissions The helper functions will be later used to enforce permissions for user-defined functions.	2023-03-09 17:50:56 +01:00
Piotr Sarna	8de1017691	cql-pytest: add a case for serializing function permissions This test case checks that granting function permissions results in correct serialization of the permissions - so that reading system_auth.role_permissions and listing the permissions via CQL with `LIST permission OF role` works in a compatible way with both Scylla and Cassandra.	2023-03-09 17:50:56 +01:00
Piotr Sarna	aa4c15a44a	cql3: allow specifying function permissions in CQL This commit allows users to specify the following resources: - ALL FUNCTIONS - ALL FUNCTIONS IN KEYSPACE ks - FUNCTION f(int, double) The permissions set for these resources are not enforced yet.	2023-03-09 17:50:56 +01:00
Piotr Sarna	5b662dd447	auth: add functions_resource to resources This commit adds "functions" resource to our authorization resources. The implementation strives to be compatible with Cassandra both from CQL level and serialization, i.e. so that entries in system_auth.role_permissions table will be identical if CassandraAuthorizer is used. This commit adds a way of representing these resources in-memory, but they are not enforced as permissions yet. The following permissions are supported: ``` CREATE ALL FUNCTIONS CREATE ALL FUNCTIONS IN KEYSPACE <ks> ALTER ALL FUNCTIONS ALTER ALL FUNCTIONS IN KEYSPACE <ks> ALTER FUNCTION <f> DROP ALL FUNCTIONS DROP ALL FUNCTIONS IN KEYSPACE <ks> DROP FUNCTION <f> AUTHORIZE ALL FUNCTIONS AUTHORIZE ALL FUNCTIONS IN KEYSPACE <ks> AUTHORIZE FUNCTION <f> EXECUTE ALL FUNCTIONS EXECUTE ALL FUNCTIONS IN KEYSPACE <ks> EXECUTE FUNCTION <f> ``` as per https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissions	2023-03-09 17:50:19 +01:00
Jan Ciolek	e4a3e2ac14	cql-pytest/test_lwt: test LWT update with empty clustering range Add a test case which performs an LWT UPDATE, but the clustering key has 0 possible values, because it's supposed to be equal to two different values. This currently causes a crash, see https://github.com/scylladb/scylladb/issues/13129 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-03-09 15:44:10 +01:00
Jan Ciolek	5e5e4c5323	cql-pytest/test_lwt: test LWT update with empty partition range Add a test case which performs an LWT UPDATE, but the partition key has 0 possible values, because it's supposed to be equal to two different values. Such queries used to cause problems in the past. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-03-09 15:43:24 +01:00
Anna Stuchlik	6aff78ded2	doc: Remove Enterprise content from OSS docs Related: https://github.com/scylladb/scylladb/issues/13119 This commit removes the pages that describe Enterprise only features from the Open Source documentation: - Encryption at Rest - Workload Prioritization - LDAP Authorization - LDAP Authentication - Audit In addition, it removes most of the information about Incremental Compaction Strategy (ICS), which is replaced with links to the Enterprise documentation. The changes above required additional updates introduced with this commit: - The links to Enterprise-only features are replaced with the corresponding links in the Enterprise documentation. - The redirections are added for the removed pages to be redirected to the corresponding pages in the Enterprise documentation. This commit must be reverted in the scylla-enterprise repository to avoid deleting the Enterprise-only content from the Enterprise docs. Closes #13123	2023-03-09 15:40:43 +02:00
Botond Dénes	11dde4b80b	reader_permit: add wait_for_execution state Used while the permit is in the _ready_list, waiting for the execution loop to pick it up. This is just acknowledging the existence of this wait-state. This state will now show up in permit diagnostics printouts and we can now determine whether a permit is waiting for execution, without checking which queue it is in.	2023-03-09 07:11:51 -05:00
Botond Dénes	6229f8b1a6	reader_concurrency_semaphore: make wait lists intrusive Instead of using expiring_fifo to store queued permits, use the same intrusive list mechanism we use to keep track of all permits. Permits are now moved between the _permit_list and the wait queues, depending on which state they are in. This means _permit_list is now not the definitive list containing all permits, instead it is the list containing all permits that are not in a more specialized queue at the moment. Code wishing to iterate over all permits should now use foreach_permit(). For outside code, this was already the only way and internal users are already patched. Making the wait lists intrusive allows us to dequeue a permit from any position, with nothing but a permit reference at hand. It also means the wait queues don't have any additional memory requirements, other than the memory for the permit itself. Timeout while being queued is now handled by the permit's on_timeout() callback.	2023-03-09 07:11:49 -05:00
Benny Halevy	0f07a24889	storage_service: node_ops_signal_abort: print a warning when signaling abort Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-09 14:10:10 +02:00
Benny Halevy	2a1015dced	storage_service: s/node_ops_singal_abort/node_ops_signal_abort/ Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-09 14:09:09 +02:00
Benny Halevy	6394e9acf7	storage_service: node_ops_abort: add log messages So we can correlate the respective messages on the node_ops coordinator side. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-09 14:04:56 +02:00
Benny Halevy	3652025062	storage_service: wire node_ops_ctl for node operations Use the node_ops_ctl methods for the basic flow of: start, start_heartbeat_updater, prepare, send_to_all, done\|abort As well for querying pending ops for decommission. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-09 14:02:31 +02:00
Botond Dénes	9ea9a48dbc	reader_concurrency_semaphore: move most wait_queue methods out-of-line They will soon depend on the definition of the reader_permit::impl, which is only available in the .cc file.	2023-03-09 06:53:11 -05:00
Botond Dénes	1d27dd8f0e	reader_concurrency_semaphore: store permits directly in queues Instead of the `entry` wrapper. In _wait_list and _ready_list, that is. Data stored in the `entry` wrapper is moved to a new `reader_permit::auxiliary_data` type. This makes the reader permit self-sufficient. This in turn prepares the ground for the ability to de-queue a permit from any queue, with nothing but a permit reference at hand: no need to have back pointer to wrappers and/or iterators.	2023-03-09 06:53:11 -05:00
Botond Dénes	bcfb8715f9	reader_permit: introduce (private) operator * and -> Currently the reader_permit has some private methods that only the semaphore's internals call. But this method of communication is not consistent: other times the semaphore accesses the permit impl directly, calling methods on that. This commit introduces operator * and -> for reader_permit. With this, the semaphore internals always call the reader_permit::impl methods directly, either via a direct reference, or via the above operators. This makes the permit interface a little narrower and reduces boilerplate code.	2023-03-09 06:53:11 -05:00
Botond Dénes	f5b80fdfd8	reader_concurrency_semaphore: remove redundant waiters() member There is now a field in stats with the same information, use that.	2023-03-09 06:53:11 -05:00
Botond Dénes	74a5981dbe	reader_concurrency_semaphore: add waiters counter Use it to keep track of all permits that are currently waiting on something: admission, memory or execution. Currently we keep track of size, by adding up the result of size() of the various queues. In future patches we are going to change the queues such that they will not have constant time size anymore, move to an explicit counter in preperation to that. Another change this commit makes is to also include ready list entries in this counter. Permits in the ready list are also waiters, they wait to be executed. Soon we will have a separate wait state for this too.	2023-03-09 06:53:11 -05:00
Botond Dénes	2694aa1078	reader_permit: use check_abort() for timeout Instead of having callers use get_timeout(), then compare it against the current time, set up a timeout timer in the permit, which assigned a new `_ex` member (a `std::exception_ptr`) to the appropriate exception type when it fires. Callers can now just poll check_abort() which will throw when `_ex` is not null. This is more natural and allows for more general reasons for aborting reads in the future. This prepares the ground for timeouts being managed inside the permit, instead of by the semaphore. Including timing out while in a wait queue.	2023-03-09 06:53:09 -05:00
Benny Halevy	d322bbf6ff	storage_service: add node_ops_ctl class to formalize all node_ops flow All node operations we currently support go through similar basic flow and may add some op-specific logic around it. 1. Select the nodes to sync with (this is op specific). 2. heartbeat updater 3. send prepare req 4. perform the body of the node operation 5. send done -- on any error: send abort node_ops_ctl formalizes all those steps and makes sure errors are handled in all steps, and the error causing abort is not masked by errors in the abort processing, and is propagated upstream. Some of the printouts repeat the node operation description to remain backward compatible so as not to break dtests that wait for them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-09 13:48:34 +02:00
Wojciech Mitros	2fd6d495fa	wasm: move compilation to an alien thread The compilation of wasm UDFs is performed by a call to a foreign function, which cannot be divided with yielding points and, as a result, causes long reactor stalls for big UDFs. We avoid them by submitting the compilation task to a non-seastar std::thread, and retrieving the result using seastar::alien. The thread is created at the start of the program. It executes tasks from a queue in an infinite loop. All seastar shards reference the thread through a std::shared_ptr to an `alien_thread_runner`. Considering that the compilation takes a long time anyway, the alien_thread_runner is implemented with focus on simplicity more than on performance. The tasks are stored in an std::queue; reading from and writing to it is synchronized using an std::mutex, and an std::condition_variable waits until the queue has elements. When the destructor of the alien runner is called, an std::nullopt sentinel is pushed to the queue, and after all remaining tasks are finished and the sentinel is read, the thread finishes.	2023-03-09 11:54:38 +01:00
Botond Dénes	23f4e250c2	reader_concurrency_semaphore: maybe_dump_permit_diagnostics(): remove permit list param This param is from a time when _permit_list was not accessible from the outside, so it was passed alongside the semaphore instance to avoid making the diagnostics methods friends. To allow the semaphore freedom in how permits are stored, the diagnostics code is now made to use foreach_permit() instead of accessing the underlying list directly. As the diagnostics code wants reader_permit::impl& directly, a new variant of foreach_permit() passing impl references is introduced.	2023-03-09 05:19:59 -05:00
Botond Dénes	59dc15682b	reader_concurrency_semaphroe: make foreach_permit() const It already is conceptually, as it passes const references to the permits it iterates over. The only reason it wasn't const before is a technical issue which is solved here with a const_cast.	2023-03-09 05:19:59 -05:00
Botond Dénes	c86136c853	reader_permit: add get_schema() and get_op_name() accessors	2023-03-09 05:19:59 -05:00
Botond Dénes	9dd2cd07ef	reader_concurrency_semaphore: mark maybe_dump_permit_diagnostics as noexcept It is in fact noexcept, and is expected to be, so document this.	2023-03-09 05:19:59 -05:00
Benny Halevy	f3d6868738	repair: node_ops_cmd_request: add print function Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-09 11:42:03 +02:00
Benny Halevy	130d6faa06	repair: do_decommission_removenode_with_repair: log ignore_nodes Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-09 11:42:03 +02:00
Benny Halevy	ac13e1f432	repair: replace_with_repair: get ignore_nodes as unordered_set Prepare for following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-09 11:42:03 +02:00
Benny Halevy	78b0222842	gossiper: get_generation_for_nodes: get nodes as unordered_set Prepare for following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-09 11:42:03 +02:00
Benny Halevy	28eb11553b	storage_service: don't let node_ops abort failures mask the real error Currently failing to abort a node operation will throw and mask the original failure handled in the catch block. See #12333 for example. Fixes #12798 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-03-09 11:42:03 +02:00
Botond Dénes	49e0d0402d	Merge 'sstables: remove unused function add more constness' from Kefu Chai - sstables: remove unused function - sstables: mark param of sstable::*_from_sstring() const - sstables: mark param of reverse_map() const - sstables: mark static lookup table const Closes #13115 * github.com:scylladb/scylladb: sstables: mark static lookup table const sstables: mark param of reverse_map() const sstables: mark param of sstable::*_from_sstring() const sstables: remove unused function	2023-03-09 11:29:28 +02:00
Pavel Emelyanov	47df084363	test,sstables: Remove path from make_sstable_easy() The method in question is only called with env's tempdir, so there's no point in explicitly passing it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	8297ac0082	test,lib: Remove wrapper over reusable_sst and move the comment There's a wonderful comment describing what the reusable_sst is for near one of its wrappers. It's better to drop the wrapper and move the comment to where it belongs. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	27d45df35f	test: Make "compact" test case use env dir Same as most of the previous work -- remove the explicit capturing of env's tempdir over the test. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	fdff97a294	test,compaction: Use env tempdir in some more cases Both already do so, but get the tempdir explicitly. It's possible to make them much shorter by not carrying this variable over the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	19ef07b059	test,compaction: Make check_compacted_sstables() use env's dir It's in fact using it already via argument. Next patch will do the same with another call, but having this change separately makes the next patch shorter and easier to review. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	ef8928f2cc	test: Relax making sstable with sequential generation Many test cases populate sstable with a factory that at the same time serves as a stable maintainer of a monotonic generation. Those can be greatly relaxed by re-using the recently introduced generation from the test_env. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	be7f4ff53a	test/sstable::test_env: Keep track of auto-incrementing generation Lots of test cases make sstables with monotonically incrementing generation values. In Scylla code this counter is maintained in class table, but sstable tests do not always have it. To mimic this behavior, the test_env can keep track of the generation, so that callers just don't mess with it (next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	bc20879971	test/lib: Add sstable maker helper without factory There's a make_sstable_containing() helper that creates sstable and populates it with mutations (and makes some post validation). The helper accepts a factory function that should make sstable for it. This patch shuffles this helper a bit by introducing an overload that populates (and validates) the already existing sstable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	2bbc59dd58	test: Remove last occurrence of test_env::do_with(rval, ...) There's the lonely test case that uses the mentioned template to carry its own instance of tempdir over its lifetime. Patch the case to re-use the already existing env's tempdir and drop the template. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	4bd79dc900	test,sstables: Dont mess with tempdir where possible Beneficiary of the previous patch -- those cases that make sstables in env's tempdir can now enjoy not mentioning this explicitly and letting the env specify the sstable making path itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	dfcfe0a355	test/sstable::test_env: Add dir-less sstables making helpers Lots of (most of) test cases out there generate sstables inside env's temporary directory. This patch adds some sugar to env that will allow test cases to omit the explicit env.tempdir() call. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	d28589a2f7	test,sstables: Use sstables::test_env's tempdir with sweeper Continuation of the previous patch. Some test cases are sensitive to having the temp directory clean, so patch them similarly, but equip with the sweeper on entry instead of their own tempdir instance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:48 +03:00
Pavel Emelyanov	904853cd7b	test,sstables: Use sstables::test_env's tempdir The one is maintained by the env throughout its lifetime. For many test cases there's no point in generating tempdir on their own, so just switch to using env's one. The code gets longer lines, but this is going to change really soon. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:47 +03:00
Pavel Emelyanov	21e70e7edd	test/lib: Add tempdir sweeper This is a RAII-ish helper that cleans the temp directory on destruction. To be used in cases when a test needs to do several checks over a clean temporary directory (future patches). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:47 +03:00
Pavel Emelyanov	090e007e30	test/lib: Open-code make_sstabl_easy into make_sstable The former helper is going to get rid of the fs::path& dir argument, but the latter cannot yet live without it. The simplest solution is to open-code the helper until better times. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:47 +03:00
Pavel Emelyanov	8d727701a4	test: Remove vector of mutation interposer from test_key_count_estimation The test generates a vector of mutations to be later passed into the make_sstable() helper, which just applies them to a memtable. The test case can generate the memtable directly. This makes it possible to stop using the local tempdir in this test case by future patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-09 08:21:47 +03:00
Kefu Chai	87a6cb5925	sstables: mark static lookup table const these tables are mappings from symbolic names to their string representation. we don't mutate them. so mark them const. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-09 12:40:37 +08:00
Kefu Chai	c18709d4a1	sstables: mark param of reverse_map() const it does not mutate the map in which the value is looked up, so let's mark map const. also, take this opportunity to use structured binding for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-09 12:40:37 +08:00
Kefu Chai	4128ab2029	sstables: mark param of sstable::*_from_sstring() const neither of the changed functions mutates its parameter. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-09 12:40:37 +08:00
Kefu Chai	c211b272f7	sstables: remove unused function `sstable::version_from_sstring()` is used nowhere, so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-09 12:40:37 +08:00
Avi Kivity	25cf325674	Merge 'api: s/request/http::request/' from Kefu Chai - api: reference httpd::* symbols like 'httpd::*' - alternator: using chrono_literals before using it - api: s/request/http::request/ the last two commits were inspired by Pavel's comment of > It looks like api/ code was caught by some using namespace seastar::httpd shortcut. they should be landed before we merge and include https://github.com/scylladb/seastar/pull/1536 in Scylla. Closes #13095 * github.com:scylladb/scylladb: api: reference httpd::* symbols like 'httpd::*' alternator: using chrono_literals before using it api: s/request/http::request/	2023-03-08 18:08:21 +02:00
Avi Kivity	a96fcdaac6	Merge 'distributed_loader: print log without using fmt::format() and fix of typo' from Kefu Chai - distributed_loader: print log without using fmt::format() - distributed_loader: correct a typo in comment Closes #13108 * github.com:scylladb/scylladb: distributed_loader: correct a typo in comment distributed_loader: print log without using fmt::format()	2023-03-08 17:55:25 +02:00
Kefu Chai	3488b68413	build: cmake: link Boost::regex against ICU::uc Boost::regex references icu_67::Locale::Locale, so let's fix this. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 22:53:42 +08:00
Kefu Chai	51ff2907b8	build: cmake: link sstables against libdeflate sstables is the only place where libdeflate is used. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 22:53:42 +08:00
Kefu Chai	2a18d470cc	build: cmake: add missing sources to test-lib Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 22:53:42 +08:00
Kefu Chai	0b3d25ab1b	build: cmake: add missing linkages these dependencies were found when trying to compile `user_function_test`. whenever a library libfoo references another one, say, libbar, the corresponding linkage from libfoo to libbar is added. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 22:53:42 +08:00
Kefu Chai	21a7c439bb	build: cmake: find Snappy before using it Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 22:53:42 +08:00
Kefu Chai	c8f762b6d0	build: cmake: extract scylla-main out so tests and other libraries can link against it. also, drop the unused abseil library linkages. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 22:53:42 +08:00
Kefu Chai	d07adcbe74	build: cmake: extract index, repair and data_dictionary out Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 22:53:42 +08:00
Kefu Chai	b1484a2a5f	build: cmake: document add_scylla_test() this change reuses part of Botond Dénes's work to add a full-blown CMakeLists.txt to build scylla. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 22:26:30 +08:00
Kefu Chai	b0433bf82b	build: cmake: remove test which does not exist yet it was an oversight in `11124ee972`, which added a test not yet included in master HEAD. so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 22:26:30 +08:00
Nadav Har'El	a4a318f394	cql: USING TTL 0 means unlimited, not default TTL Our documentation states that writing an item with "USING TTL 0" means it should never expire. This should be true even if the table has a default TTL. But Scylla mistakenly handled "USING TTL 0" exactly like having no USING TTL at all (i.e., it took the default TTL, instead of unlimited). We had two xfailing tests demonstrating that Scylla's behavior in this is different from Cassandra. Scylla's behavior in this case was also undocumented. By the way, Cassandra used to have the same bug (CASSANDRA-11207) but it was fixed already in 2016 (Cassandra 3.6). So in this patch we fix Scylla's "USING TTL 0" behavior to match the documentation and Cassandra's behavior since 2016. One xfailing test starts to pass and the second test gets past this bug and fails on a different one. This patch also adds a third test for "USING TTL ?" with UNSET_VALUE - it behaves, on both Scylla and Cassandra, like a missing "USING TTL". The origin of this bug was that after parsing the statement, we saved the USING TTL in an integer, and used 0 for the case of no USING TTL given. This meant that we couldn't tell if we have USING TTL 0 or no USING TTL at all. This patch uses an std::optional so we can tell the case of a missing USING TTL from the case of USING TTL 0. Fixes #6447 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13079	2023-03-08 16:18:23 +02:00
Kefu Chai	43b6f7d8d3	distributed_loader: correct a typo in comment s/to many/too many/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 18:17:43 +08:00
Kefu Chai	b6991f5056	distributed_loader: print log without using fmt::format() logger.info() is able to format the given arguments with the format string, so let's just let it do its job. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 18:17:43 +08:00
Alejo Sanchez	f55e91d797	gms, service: live endpoint copy method Move replication logic for live endpoint across shards to a separate method. This will be used by API get alive nodes. As this is now in a method and outside gossiper::run(), assert it's called from shard 0. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-03-08 10:45:35 +01:00
Nadav Har'El	beb9a8a9fd	docs/alternator: recommend to disable auto_snapshot In issue #5283 we noted that the auto_snapshot option is not useful in Alternator (as we don't offer any API to restore the snapshot...), and suggested that we should automatically disable this option for Alternator tables. However, this issue has been open for more than three years, and we never changed this default. So until we solve that issue - if we ever do - let's add a paragraph in docs/alternator/alternator.md recommending to the user to disable this option in the configuration themselves. The text explains why, and also provides a link to the issue. Refs #5283 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13103	2023-03-08 10:50:59 +02:00
Jan Ciolek	0417c48bdc	cql-pytest: test unset value in UPDATE and LWT UPDATE Add a test which performs an UPDATE and tries to pass an UNSET_VALUE as a value for the primary key. There is also an LWT variant of this test that tries to set an UNSET_VALUE in the IF condition. These two tests are analogous to test_insert_update_where and test_insert_update_where_lwt, but use an UPDATE instead of INSERT. It's useful to test UPDATE as well as INSERT. When I was developing a fix for #13001 I initially added the condition for unset value inside insert_statement, but this didn't handle update statements. These two tests allowed me to see that UPDATE still causes a crash. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #13058	2023-03-08 10:39:26 +02:00
Raphael S. Carvalho	3fae46203d	replica: Fix undefined behavior in table::generate_and_propagate_view_updates() Undefined behavior because the evaluation order is undefined. With GCC, where evaluation is right-to-left, schema will be moved once it's forwarded to make_flat_mutation_reader_from_mutations_v2(). The consequence is that memory tracking of mutation_fragment_v2 (for tracking only permit used by view update), which uses the schema, can be incorrect. However, it's more likely that Scylla will crash when estimating memory usage for row, which accesses schema column information using schema::column_at(), which in turn asserts that the requested column does really exist. Fixes #13093. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13092	2023-03-08 07:38:55 +02:00
Nadav Har'El	ef50e4022c	test: drop our "pytest" wrapper script When Fedora 37 came out, we discovered that its "pytest" script started to run Python with the "-s" option, which caused problems for packages installed personally via pip. We fixed this by adding our own wrapper script test/pytest. But this bug (https://bugzilla.redhat.com/show_bug.cgi?id=2152171) was already fixed in Fedora 37, and the new version already reached our dbuild. So we no longer need this wrapper script. Let's remove it. Fixes #12412 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13083	2023-03-08 07:31:37 +02:00
Jan Ciolek	63a7235017	prepare_expr: improve readability in cast_prepare_expression cast_prepare_expression takes care of preparing expr::cast, which is responsible for CQL C-style casts. At first glance it can be hard to figure out what exactly it does, so I added some comments to make things clearer. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-03-08 03:24:17 +01:00
Jan Ciolek	03d37bdc14	cql-pytest: test expr::cast in test_cast.py CQL supports C-style casts with the destination type specified inside parentheses, e.g. `blob_column = (blob)funcThatReturnsInt()`. These casts can be used to convert values of types that have compatible binary representation, or as a type hint to specify the type where the situation is ambiguous. I didn't find any cql-pytest tests for this feature, so I added some. It looks like the feature works, but only partially. Doing things like this works: `blob_column = (blob)funcThatReturnsInt()` But trying to do something a bit more complex fails: `blob_column = (blob)(int)1234` This is the case in both Cassandra and Scylla, the tests introduced in this commit pass on both of them. In future commits I will extend this feature to support the more complex cases as well, then some tests will have to be marked scylla_only. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-03-08 03:24:13 +01:00
Nadav Har'El	cdedc79050	cql: add configurable restriction of minimum RF We have seen users unintentionally use RF=1 or RF=2 for a keyspace. We would like to have an option for a minimal RF that is allowed. Cassandra recently added, in Cassandra 4.1 (see apache/cassandra@5fdadb2 and https://issues.apache.org/jira/browse/CASSANDRA-14557), exactly such an option, called "minimum_keyspace_rf" - so we chose to use the same option name in Scylla too. This means that unlike the previous "safe mode" options, the name of this option doesn't start with "restrict_". The value of the minimum_keyspace_rf option is a number, and lower replication factors are rejected with an error like: cqlsh> CREATE KEYSPACE x WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': 2 }; ConfigurationException: Replication factor replication_factor=2 is forbidden by the current configuration setting of minimum_keyspace_rf=3. Please increase replication factor, or lower minimum_keyspace_rf set in the configuration. This restriction applies to both CREATE KEYSPACE and ALTER KEYSPACE operations. It applies to both SimpleStrategy and NetworkTopologyStrategy, for all DCs or a specific DC. However, a replication factor of zero (0) is not forbidden - this is the way to explicitly request not to replicate (at all, or in a specific DC). For the time being, minimum_keyspace_rf=0 is still the default, which means that any replication factor is allowed, as before. We can easily change this default in a followup patch. Note that in the current implementation, trying to use RF below minimum_keyspace_rf is always an error - we don't have a syntax to make it just a warning. In any case the error message explains exactly which configuration option is responsible for this restriction. Fixes #8891. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #9830	2023-03-07 19:04:06 +02:00
Kamil Braun	2b44631ded	Merge 'storage_service: Make node operations safer by detecting asymmetric abort' from Tomasz Grabiec This patch fixes a problem which affects decommission and removenode which may lead to data consistency problems under conditions which lead one of the nodes to unilaterally decide to abort the node operation without the coordinator noticing. If this happens during streaming, the node operation coordinator would proceed to make a change in the gossiper, and only later detect that one of the nodes aborted during sending of decommission_done or removenode_done command. That's too late, because the operation will be finalized by all the nodes once gossip propagates. It's unsafe to finalize the operation while another node aborted. The other node reverted to the old topology, with which they were running for some time, without considering the pending replica when handling requests. As a result, we may end up with consistency issues. Writes made by those coordinators may not be replicated to CL replicas in the new topology. Streaming may have missed to replicate those writes depending on timing. It's possible that some node aborts but streaming succeeds if the abort is not due to network problems, or if the network problems are transient and/or localized and affect only heartbeats. There is no way to revert after we commit the node operation to the gossiper, so it's ok to close node_ops sessions before making the change to the gossiper, and thus detect aborts and prevent later aborts after the change in the gossiper is made. This is already done during bootstrap (RBNO enabled) and replacenode. This patch changes removenode to also take this approach by moving sending of remove_done earlier. We cannot take this approach with decommission easily, because decommission_done command includes a wait for the node to leave the ring, which won't happen before the change to the gossiper is made. 
Separating this from decommission_done would require protocol changes. This patch adds a second-best solution, which is to check if sessions are still there right before making a change to the gossiper, leaving decommission_done where it was. The race can still happen, but the time window is now much smaller. The PR also lays down infrastructure which enables testing the scenarios. It makes node ops watchdog periods configurable, and adds error injections. Fixes #12989 Refs #12969 Closes #13028 * github.com:scylladb/scylladb: storage_service: node ops: Extract node_ops_insert() to reduce code duplication storage_service: Make node operations safer by detecting asymmetric abort storage_service: node ops: Add error injections service: node_ops: Make watchdog and heartbeat intervals configurable	2023-03-07 17:36:51 +01:00
Nadav Har'El	e69c9069d6	Merge 'build: enable more warnings' from Kefu Chai when comparing the disabled warnings specified by `configure.py` and the ones specified by `cmake/mode.common.cmake`, it turns out we are now able to enable more warning options. so let's enable them. the change was tested using Clang-17 and GCC-13. there are many errors from GCC-13, like: ``` /home/kefu/dev/scylladb/db/view/view.hh:114:17: error: declaration of ‘column_kind db::view::clustering_or_static_row::column_kind() const’ changes meaning of ‘column_kind’ [-fpermissive] 114 | column_kind column_kind() const { | ^~~~~~~~~~~ ``` so the build with GCC failed. and with this change, Clang-17 is able to build the tree without warnings. Closes #13096 * github.com:scylladb/scylladb: build: enable more warnings test: do not initialize plain number with {} test: do not initialize a time_t with braces	2023-03-07 17:37:54 +02:00
Wojciech Mitros	4609a45ce3	wasm: convert compilation to a future After we move the compilation to an alien thread, the completion of the compilation will be signaled by fulfilling a seastar promise. As a result, the `precompile` function will return a future, and because of that, other functions that use the `precompile` function will also return futures. We can do all the necessary adjustments beforehand, so that the actual patch that moves the compilation will contain less irrelevant changes.	2023-03-07 14:27:38 +01:00
Avi Kivity	6aa91c13c5	Merge 'Optimize topology::compare_endpoints' from Benny Halevy The code for compare_endpoints originates at the dawn of time (`bc034aeaec`) and is called on the fast path from storage_proxy via `sort_by_proximity`. This series considerably reduces the function's footprint by: 1. carefully coding the many comparisons in the function so as to reduce the number of conditional branches (apparently the compiler isn't doing a good enough job at optimizing it in this case) 2. avoid sstring copy in topology::get_{datacenter,rack} Closes #12761 * github.com:scylladb/scylladb: topology: optimize compare_endpoints to_string: add print operators for std::{weak,partial}_ordering utils: to_sstring: deinline std::strong_ordering print operator move to_string.hh to utils/ test: network_topology: add test_topology_compare_endpoints	2023-03-07 15:17:19 +02:00
Kamil Braun	fe14d14ce9	Merge 'Eliminate extraneous copies of dht::token_range_vector' from Benny Halevy In several places we copy token range vectors where we could move them and eliminate unnecessary memory copies. Ref #11005 Closes #12344 * github.com:scylladb/scylladb: dht/range_streamer: stream_async: move ranges_to_stream to do_streaming streaming: stream_session: maybe_yield streaming: stream_session: prepare: move token ranges to add_transfer_ranges streaming: stream_plan: transfer_ranges: move token ranges towards add_transfer_ranges dht/range_streamer: stream_async: do_streaming: move ranges downstream dht/range_streamer: add_ranges: clear_gently ranges_for_keyspace dht/range_streamer: get_range_fetch_map: reduce copies dht/range_streamer: add_ranges: move ranges down-stream dht/boot_strapper: move ranges to add_ranges dht/range_streamer: stream_async: incrementally update _nr_ranges_remaining dht/range_streamer: stream_async: erase from range_vec only after do_streaming success	2023-03-07 13:46:33 +01:00
Nadav Har'El	f05ea80fb5	test/cql-pytest: remove unused async marker One test in test/cql-pytest/test_batch.py accidentally had the asyncio marker, despite not using any async features. Remove it. The test still runs fine. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13002	2023-03-07 14:33:34 +02:00
Botond Dénes	3f0ace0114	Merge 'cmake: sync with `configure.py` (10/n)' from Kefu Chai - build: cmake: use different names for output of check_cxx_compiler_flag - build: cmake: only add supported warning flags to CMAKE_CXX_FLAGS - build: cmake: limit the number of link job Closes #13098 * github.com:scylladb/scylladb: build: cmake: limit the number of link job build: cmake: only add supported warning flags to CMAKE_CXX_FLAGS build: cmake: use different names for output of check_cxx_compiler_flag	2023-03-07 14:24:26 +02:00
Kefu Chai	063b3be8a7	api: reference httpd::* symbols like 'httpd::*' it turns out we have `using namespace httpd;` in seastar's `request_parser.rl`, and we should not rely on this statement to expose the symbols in `seastar::httpd` to `seastar` namespace. in this change, * api/*.hh: all httpd symbols are referenced by `httpd::` instead of being referenced as if they are in `seastar`. * api/*.cc: add `using namespace seastar::httpd`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-07 18:21:03 +08:00
Kefu Chai	a37610f66a	alternator: using chrono_literals before using it we should not assume that some included header does this for us. we'd have the following compilation failure if seastar's src/http/request_parser.rl does not `using namespace httpd;` anymore. ``` /home/kefu/dev/scylladb/alternator/streams.cc:433:55: error: no matching literal operator for call to 'operator""h' with argument of type 'unsigned long long' or 'const char *', and no matching literal operator template static constexpr auto dynamodb_streams_max_window = 24h; ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-07 18:20:36 +08:00
Vlad Zolotarov	ae6724f155	transport: refactor CQL metrics This patch reorganizes and extends CQL related metrics. Before this patch we only had counters for specific CQL requests. However, many times we need to reason about the size of CQL queries: corresponding requests and response sizes. This patch adds corresponding metrics: - Arranges all 3 per-opcode statistics counters in a single struct. - Defines a vector of such structs for each CQL opcode. - Adjusts statistics updates accordingly - the code is much simpler now. - Removes old metrics that were accounting some CQL opcodes. - Adds new per-opcode metrics for requests number, request and response sizes: - New metrics are of a derived kind - rate() should be applied to them. - There are 3 new metrics names: - 'cql_requests_count' - 'cql_request_bytes' - 'cql_response_bytes' - New metrics have a per-opcode label - 'kind'. For example: A number of response bytes for an EXECUTE opcode on shard 0 looks as follows: scylla_transport_cql_response_bytes{kind="EXECUTE",shard="0"} Ref #13061 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20230302154816.299721-1-vladz@scylladb.com>	2023-03-07 12:02:34 +02:00
Kefu Chai	577b1c679c	build: enable more warnings when comparing the disabled warnings specified by `configure.py` and the ones specified by `cmake/mode.common.cmake`, it turns out we are now able to enable more warning options. so let's enable them. the change was tested using Clang-17 and GCC-13. there are many errors from GCC-13, like: ``` /home/kefu/dev/scylladb/db/view/view.hh:114:17: error: declaration of ‘column_kind db::view::clustering_or_static_row::column_kind() const’ changes meaning of ‘column_kind’ [-fpermissive] 114 \| column_kind column_kind() const { \| ^~~~~~~~~~~ ``` so the build with GCC failed. and with this change, Clang-17 is able to build the tree without warnings. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-07 17:54:53 +08:00
Kefu Chai	f0659cb1bb	test: do not initialize plain number with {} this silences warnings like: ``` test/boost/secondary_index_test.cc:1578:5: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init] { -7509452495886106294 }, ^~~~~~~~~~~~~~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-07 17:54:53 +08:00
Kefu Chai	7331edbc7a	test: do not initialize a time_t with braces time_t is defined as a "Arithmetic type capable of representing times". so we can just initialize it with 0 without braces. this change should silence warning like: ``` test/boost/aggregate_fcts_test.cc:238:45: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init] auto tp = db_clock::from_time_t({ 0 }) + std::chrono::milliseconds(1); ^~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-07 17:54:53 +08:00
Pavel Emelyanov	a0718d2097	test: Don't populate / with sstables The sstable_compaction_test::simple_backlog_controller_test makes sstables with empty dir argument. Eventually this means that sstables end up in the / directory [1], which is not nice. As a side effect this also makes sstable::storage::prefix() return an empty string which, in turn, confuses the code that tries to analyze the prefix contents (refs: #13090) [1] See, e.g. logs from https://jenkins.scylladb.com/job/releng/job/Scylla-CI/4757/consoleText ``` INFO 2023-03-06 21:23:04,536 [shard 0] compaction - [Compact ks.cf 51489760-bc54-11ed-a08c-7d3f1d77e2e4] Compacting [/la-1-big-Data.db:level=0:origin=] ``` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13094	2023-03-07 11:44:33 +02:00
Kefu Chai	4da82b4117	data_dictionary: mark dtor of user_types_storage `virtual` we have another solution, to mark db_user_types_storage `final`. as we don't destruct `db_user_types_storage` with a pointer to any of its base classes. but it'd be much simpler to just mark the dtor virtual of the first base class which has virtual method(s). it's more idiomatic this way, and less error-prone. this change should silence the following warning: ``` /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/stl_construct.h:88:2: error: destructor called on non-final 'replica::db_user_types_storage' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] __location->~_Tp(); ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/stl_construct.h:149:12: note: in instantiation of function template specialization 'std::destroy_at<replica::db_user_types_storage>' requested here std::destroy_at(__pointer); ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/alloc_traits.h:674:9: note: in instantiation of function template specialization 'std::_Destroy<replica::db_user_types_storage>' requested here { std::_Destroy(__p); } ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr_base.h:613:28: note: in instantiation of function template specialization 'std::allocator_traits<std::allocator<void>>::destroy<replica::db_user_types_storage>' requested here allocator_traits<_Alloc>::destroy(_M_impl._M_alloc(), _M_ptr()); ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr_base.h:599:2: note: in instantiation of member function 'std::_Sp_counted_ptr_inplace<replica::db_user_types_storage, std::allocator<void>, __gnu_cxx::_S_atomic>::_M_dispose' requested here _Sp_counted_ptr_inplace(_Alloc __a, _Args&&... 
__args) ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr_base.h:972:6: note: in instantiation of function template specialization 'std::_Sp_counted_ptr_inplace<replica::db_user_types_storage, std::allocator<void>, __gnu_cxx::_S_atomic>::_Sp_counted_ptr_inplace<replica::database &>' requested here _Sp_cp_type(__a._M_a, std::forward<_Args>(__args)...); ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr_base.h:1712:14: note: in instantiation of function template specialization 'std::__shared_count<>::__shared_count<replica::db_user_types_storage, std::allocator<void>, replica::database &>' requested here : _M_ptr(), _M_refcount(_M_ptr, __tag, std::forward<_Args>(__args)...) ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr.h:464:4: note: in instantiation of function template specialization 'std::__shared_ptr<replica::db_user_types_storage>::__shared_ptr<std::allocator<void>, replica::database &>' requested here : __shared_ptr<_Tp>(__tag, std::forward<_Args>(__args)...) ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/shared_ptr.h:1009:14: note: in instantiation of function template specialization 'std::shared_ptr<replica::db_user_types_storage>::shared_ptr<std::allocator<void>, replica::database &>' requested here return shared_ptr<_Tp>(_Sp_alloc_shared_tag<_Alloc>{__a}, ^ /home/kefu/dev/scylladb/replica/database.cc:313:24: note: in instantiation of function template specialization 'std::make_shared<replica::db_user_types_storage, replica::database &>' requested here , _user_types(std::make_shared<db_user_types_storage>(*this)) ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13062	2023-03-07 10:36:03 +02:00
Wojciech Mitros	d4851ccae7	treewide: rename the "xwasm" UDF language to "wasm" When the WASM UDFs were first introduced, the LANGUAGE required in the CQL statements to use them was "xwasm", because the ABI for the UDFs was still not specified and changes to it could be backwards incompatible. Now, the ABI is stabilized, but if backwards incompatible changes are made in the future, we will add a new ABI version for them, so the name "xwasm" is no longer needed and we can finally change it to "wasm". Closes #13089	2023-03-07 10:21:11 +02:00
Botond Dénes	d1619eb38a	Merge 'Remove qctx from helpers that retrieve truncation record' from Pavel Emelyanov There are two places that do it -- commitlog and batchlog replayers. Both can have local system-keyspace reference and use system-keyspace local query-processor for it. The peering save_truncation_record() is not that simple and is not patched by this PR Closes #13087 * github.com:scylladb/scylladb: system_keyspace: Unstatic get_truncation_record() system_keyspace: Unstatic get_truncated_at() batchlog_manager: Add system_keyspace dependency main: Swap batchlog manager and system keyspace starts system_keyspace: Unstatic get_truncated_position() system_keyspace: Remove unused method commitlog: Create commitlog_replayer with system keyspace test: Make cql_test_env::get_system_keyspace() return sharded commiltlog: Line-up field definitions	2023-03-07 10:19:55 +02:00
Nadav Har'El	e7f9e57d64	docs/alternator: link to issue about too many stream shards docs/alternator/compatibility.md mentions a known problem that Alternator Streams are divided into too many "shards". This patch adds a link to a github issue to track our work on this issue - like we did for most other differences mentioned in compatibility.md. Refs #13080 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13081	2023-03-07 10:04:13 +02:00
Kefu Chai	b25a6d5a9c	build: cmake: limit the number of link job this mirrors the settings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-07 15:34:12 +08:00
Kefu Chai	5e38845057	build: cmake: only add supported warning flags to CMAKE_CXX_FLAGS Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-07 15:24:02 +08:00
Kefu Chai	2b23de31ca	build: cmake: use different names for output of check_cxx_compiler_flag * use the value of disabled_warnings, not the variable name for warning options, otherwise we'd be checking options like `-Wno-disabled_warnings`. * use different names for the output of check_cxx_compiler_flag() calls. as the output variable of check_cxx_compiler_flag(..) call is cached, we cannot reuse it for checking different warning options. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-07 15:24:02 +08:00
Kefu Chai	5522080f80	api: s/request/http::request/ seastar::httpd::request was deprecated in favor of `seastar::http::request` since bdd5d929891d2cb821eca25896e25ed4ff658b7a. so let's use the latter. this change also silences the warning of: ``` /home/kefu/dev/scylladb/api/authorization_cache.cc: In function ‘void api::set_authorization_cache(http_context&, seastar::httpd::routes&, seastar::sharded<auth::service>&)’: /home/kefu/dev/scylladb/api/authorization_cache.cc:19:104: error: ‘using seastar::httpd::request = struct seastar::http::request’ is deprecated: Use http::request instead [-Werror=deprecated-declarations] 19 \| httpd::authorization_cache_json::authorization_cache_reset.set(r, [&auth_service] (std::unique_ptr<request> req) -> future<json::json_return_type> { \| ^~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-07 14:03:42 +08:00
Botond Dénes	2f4a793457	reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict() Instead of open-coding the same, in an incomplete way. clear_inactive_reads() does incomplete eviction in several ways: * it doesn't decrement _stats.inactive_reads * it doesn't set the permit to evicted state * it doesn't cancel the ttl timer (if any) * it doesn't call the eviction notifier on the permit (if there is one) The list goes on. We already have an evict() method that does all this correctly, use that instead of the current badly open-coded alternative. This patch also enhances the existing test for clear_inactive_reads() and adds a new one specifically for `stop()` being called while having inactive reads. Fixes: #13048 Closes #13049	2023-03-07 08:45:04 +03:00
Kefu Chai	cee597560a	build: enable `-Wdefaulted-function-deleted` warning in general, the more static analysis the merrier. with the updated Seastar, which includes the commit of "core/sstring: define <=> operator for sstring", all defaulted '<=> operator' which previously relied on sstring's operator<=> will not be deleted anymore, so we can enable `-Wdefaulted-function-deleted` now. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12861	2023-03-06 18:41:44 +02:00
Kefu Chai	020483aa59	Update seastar submodule and main this change also includes change to main, to make this commit compile. see below: * seastar 9b6e181e42...9cbc1fe889 (46): > Merge 'Make io-tester jobs share sched classes' from Pavel Emelyanov > io_tester.md: Update the `rps` configuration option description > io_tester: Add option to limit total number of requests sent > Merge 'Keep outgoing queue all cancellable while negotiating (again)' from Pavel Emelyanov > io_tester: Add option to share classes between jobs > rpc: Abort connection if send_entry() fails > Merge 'build: build dpdk with `-fPIC` if BUILD_SHARED_LIBS' from Kefu Chai > build: cooking.sh: use the same BUILD_SHARED_LIBS when building ingredients > build: cooking.sh: use the same generator when building ingredients > core/memory: handle `strerror_r` returning static string > Merge 'build, rpc: lz4 related cleanups' from Kefu Chai > build, rpc: do not support lz4 < 1.7.3 > build: set the correct version when finding lz4 > build: include CheckSymbolExists > rpc: do not include lz4.h in header > build: set CMP0135 for Cooking.cmake > docs: drop building-.md > Merge 'seastar-addr2line: cleanups' from Kefu Chai > seastar-addr2line: refactor tests using unittest > seastar-addr2line: extract do_test() and main() > seastar-addr2line: do not import unused modules > scheduling: add a `rename` callback to scheduling_group_key_config > reactor: syscall thread: wakeup up reactor with finer granularity > build: build dpdk with `-fPIC` if BUILD_SHARED_LIBS > build: extract dpdk_extra_cflags out > core/sstring: remove a temporary variable > Merge 'treewide: include what we use, and add a checkheaders target' from Kefu Chai > perftune.py: auto-select the same number of IRQ cores on each NUMA > prometheus: remove unused headers > core/sstring: define <=> operator for sstring > Merge 'core: s/reserve_additional_memory/reserve_additional_memory_per_shard/' from Kefu Chai > include: do not include <concepts> directly 
> coding_style: note on self-contained header requirement > circileci: build checkheaders in addition to default target > build: add checkheaders target > net/toeplitz: s/u_int/unsigned/ > net/tcp-stack: add forward declaration for seastar::socket > core, net, util: include used headers main: set reserved memory for wasm on per-shard basis this change is a follow-up of `f05d612da8` and `4a0134a097`. this change depends on the related change in Seastar to reserve additional memory on a per-shard basis. per Wojciech Mitros's comment: > it should have probably been 50MB per shard in other words, as we always execute the same set of udf on all shards. and since one cannot predict the number of shards, but she could have a rough estimation on the size of memory a regular (set of) udf could use. so a per-shard setting makes more sense. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-06 18:41:34 +02:00
Jan Ciolek	aa604bd935	cql3: preserve binary_operator.order in search_and_replace There was a bug in `expr::search_and_replace`. It doesn't preserve the `order` field of binary_operator. `order` field is used to mark relations created using the SCYLLA_CLUSTERING_BOUND. It is a CQL feature used for internal queries inside Scylla. It means that we should handle the restriction as a raw clustering bound, not as an expression in the CQL language. Losing the SCYLLA_CLUSTERING_BOUND marker could cause issues, the database could end up selecting the wrong clustering ranges. Fixes: #13055 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #13056	2023-03-06 16:28:06 +02:00
Kefu Chai	6b249dd301	utils: UUID: throw marshal_exception when fail to parse uuid * throw marshal_exception if not the whole string is parsed, we should error out if the parsed string contains garbage at the end. before this change, we silently accept a uuid like "ce84997b-6ea2-4468-9f02-8a65abf4wxyz", and parse it as "ce84997b-6ea2-4468-9f02-8a65abf4". this is not correct. * throw marshal_exception if stoull() throws, `stoull()` throws if it fails to parse a string to an unsigned long long, we should translate the exception to `marshal_exception`, so we can handle these exceptions in a consistent manner. test is updated accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13069	2023-03-06 12:59:41 +02:00
Pavel Emelyanov	1be9b0df50	system_keyspace: Unstatic get_truncation_record() Now when both callers of this method are non-static, it can be made non-static too. While at it make two more changes: 1. move the thing to private 2. remove explicit cql3::query_processor::cache_internal::yes argument, the system_keyspace::execute_cql() applies it on its own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:28:40 +03:00
Pavel Emelyanov	109e032f61	system_keyspace: Unstatic get_truncated_at() It's called from batchlog replayer which now has local system keyspace reference and can use it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:28:40 +03:00
Pavel Emelyanov	1907518034	batchlog_manager: Add system_keyspace dependency The manager will need system ks to get truncation record from, so add it explicitly. Start-stop sequence now allows that Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:28:40 +03:00
Pavel Emelyanov	40b762b841	main: Swap batchlog manager and system keyspace starts The former needs the latter to get truncation records from and will thus need it as explicit dependency. In order to have it batchlog needs to start after system ks. This works as starting batchlog manager doesn't do anything that's required by system keyspace. This is indirectly proven by cql-test-env in which batchlog manager starts later than it does in main Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:28:40 +03:00
Pavel Emelyanov	dcbe3e467b	system_keyspace: Unstatic get_truncated_position() It's called from commitlog replayer which has system keyspace instance on board and can use it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:28:40 +03:00
Pavel Emelyanov	2501ba3887	system_keyspace: Remove unused method The get_truncated_position() overload that filters records by shard is nowadays unused. Drop one Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:28:40 +03:00
Pavel Emelyanov	47b61389b5	commitlog: Create commitlog_replayer with system keyspace The replayer code needs system keyspace to fetch truncation records from, thus it needs this explicit dependency. By the time it runs system keyspace is fully initialized already Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:28:36 +03:00
Kefu Chai	ac575d0b0e	auth: use zero initialization instead of passing '0' in the initializer list to do aggregate initialization, just use zero initialization. simpler this way. also, this helps to silence a `-Wmissing-braces` warning, like ``` /home/kefu/dev/scylladb/auth/passwords.cc:21:43: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] static thread_local crypt_data tlcrypt = {0, }; ^ {} ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13060	2023-03-06 12:28:10 +02:00
Kefu Chai	36da27f2e0	sstables: generation_type: do not specialize to_sstring because `seastar::to_sstring()` defaults to `fmt::format_to()`. so any type which is supported by `fmt::formatter()` is also supported by `seastar::to_sstring()`. and the behavior of existing implementation is exactly the same as the defaulted one. so let's drop the specialization and let `fmt::formatter<sstables::generation_type>` do its job. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13070	2023-03-06 12:18:00 +02:00
Pavel Emelyanov	6f9924ff44	test: Make cql_test_env::get_system_keyspace() return sharded It now returns sys_ks.local(), but next patch would need the whole sharded reference Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:17:21 +03:00
Pavel Emelyanov	73ab1bd74b	commiltlog: Line-up field definitions Just a cosmetic change, so that next patch adding a new member to the class looks nice Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:15:27 +03:00
Alejo Sanchez	eaed778f4a	test/cql-pytest: print driver version Print driver version for cql-pytest tests. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #12840	2023-03-06 11:31:26 +02:00
Botond Dénes	4919b2f956	Merge 'cmake: sync with `configure.py` (9/n)' from Kefu Chai - build: cmake: find ANTLR3 before using it - build: cmake: define FMT_DEPRECATED_OSTREAM - build: cmake: add include directory for lua - build: cmake: link redis against db Closes #13071 * github.com:scylladb/scylladb: build: cmake: add more tests build: cmake: find and link against RapidJSON build: cmake: link couple libraries as whole archive build: cmake: find ANTLR3 before using it build: cmake: define FMT_DEPRECATED_OSTREAM build: cmake: add include directory for lua build: cmake: link redis against db	2023-03-06 08:52:13 +02:00
Avi Kivity	97f315cc29	Merge 'build: reenable disabled warnings' from Kefu Chai in general, the more static analysis the merrier. these warnings were previously disabled to silence warnings from Clang and/or GCC, but since we've addressed all of them, let's reenable them to detect potential issues early. Closes #13063 * github.com:scylladb/scylladb: build: reenable disabled warnings test: lib: do not return a local reference dht: incremental_owned_ranges_checker: use lower_bound() types: reimplement in terms of a variable template query_id: extract into new header test/cql-pytest: test for CLUSTERING ORDER BY verification in MV test/cql-pytest: allow "run-cassandra" without building Scylla build: reenable unused-{variable,lambda-capture} warnings test: reader_concurrency_semaphore_test: define target_memory in debug mode flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE make_nonforwardable: test through run_mutation_source_tests make_nonforwardable: next_partition and fast_forward_to when single_partition is true make_forwardable: fix next_partition flat_mutation_reader_v2: drop forward_buffer_to nonforwardable reader: fix indentation nonforwardable reader: refactor, extract reset_partition nonforwardable reader: add more tests nonforwardable reader: no partition_end after fast_forward_to() nonforwardable reader: no partition_end after next_partition() nonforwardable reader: no partition_end for empty reader api::failure_detector: mark set_phi_convict_threshold unimplemented test: memtable_test: mark dummy variable for loop [[maybe_unused]] idl-compiler: mark captured this used raft: reference this explicitly util/result_try: reference this explicitly sstables/sstables: mark dummy variable for loop [[maybe_unused]] treewide: do not define/capture unused variables service: storage_service: clear _node_ops in batch cql-pytest: add tests for sum() aggregate build: cmake: extract mutation,db,replica,streaming out build: cmake: link the whole 
auth build: cmake: extract thrift out build: cmake: expose scylla_gen_build_dir from "interface" build: cmake: find libxcrypt before using it build: cmake: find Thrift before using it build: cmake: support thrift < 0.11.0 test/cql-pytest: move aggregation tests to one file Revert "Revert "storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops"" storage_service: Wait for normal state handler to finish in replace storage_service: Wait for normal state handler to finish in bootstrap row_cache: pass partition_start though nonforwardable reader doc: fix the version in the comment on removing the note doc: specify the versions where Alternator TTL is no longer experimental	2023-03-05 17:37:33 +02:00
Kefu Chai	6742493a94	build: reenable disabled warnings in general, the more static analysis the merrier. these warnings were previously disabled to silence warnings from Clang and/or GCC, but since we've addressed all of them, let's reenable them to detect potential issues early. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-05 17:37:33 +02:00
Kefu Chai	fe80b5e0d0	test: lib: do not return a local reference the type of return value of `get_table_views()` is a reference, so we cannot return a reference to a temporary value. in this change, a member variable is added to hold the _table_schema, so it can outlive the function call. this should silence following warning from Clang: ``` test/lib/expr_test_utils.cc:543:16: error: returning reference to local temporary object [-Werror,-Wreturn-stack-address] return {view_ptr(_table_schema)}; ^~~~~~~~~~~~~~~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-05 17:37:33 +02:00
Kefu Chai	11124ee972	build: cmake: add more tests Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-04 13:11:25 +08:00
Kefu Chai	eeb8553305	build: cmake: find and link against RapidJSON despite that RapidJSON is a header-only library, we still need to find it and "link" against it for adding the include directory. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-04 13:11:25 +08:00
Kefu Chai	c5d1a69859	build: cmake: link couple libraries as whole archive turns out we are using static variables to register entries in global registries, and these variables are not directly referenced, so the linker just drops them when linking the executables or shared libraries. to address this problem, we just link the whole archive. another option would be to create a linker script or pass --undefined=<symbol> to the linker. neither of them is straightforward. a helper function is introduced to do this, as we cannot use CMake 3.24 as yet. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-04 13:11:25 +08:00
Kefu Chai	58f13dfa0a	build: cmake: find ANTLR3 before using it if ANTLR3's header files are not installed into /usr/include or other directories searched by the compiler by default, there is a chance we cannot build the tree. so we have to find it first. as /opt/scylladb is the directory where `scylla-antlr35-c++-dev` is installed on debian derivatives, this directory is added so the find package module can find the header files. ``` In file included from /home/kefu/dev/scylla/db/legacy_schema_migrator.cc:38: In file included from /home/kefu/dev/scylla/cql3/util.hh:21: /home/kefu/dev/scylla/build/cmake/cql3/CqlParser.hpp:55:10: fatal error: 'antlr3.hpp' file not found ^~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-04 13:11:25 +08:00
Kefu Chai	914ba1329d	build: cmake: define FMT_DEPRECATED_OSTREAM otherwise the tree would fail to compile with fmt v9. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-04 13:11:25 +08:00
Kefu Chai	b6a927ce3f	build: cmake: add include directory for lua otherwise there are chances the compiler cannot find the lua header(s). Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-04 13:11:25 +08:00
Kefu Chai	e72321f873	build: cmake: link redis against db otherwise, we'd have ``` In file included from /home/kefu/dev/scylla/redis/keyspace_utils.cc:19: In file included from /home/kefu/dev/scylla/db/query_context.hh:14: In file included from /home/kefu/dev/scylla/cql3/query_processor.hh:24: In file included from /home/kefu/dev/scylla/lang/wasm_instance_cache.hh:19: /home/kefu/dev/scylla/lang/wasm.hh:14:10: fatal error: 'rust/wasmtime_bindings.hh' file not found ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-04 13:11:25 +08:00
Anna Stuchlik	4b71f87594	doc: Update the documentation landing page This commit makes the following changes to the docs landing page: - Adds the ScyllaDB enterprise docs as one of three tiles. - Modifies the three tiles to reflect the three flavors of ScyllaDB. - Moves the "New to ScyllaDB? Start here!" under the page title. - Renames "Our Products" to "Other Products" to list the products other than ScyllaDB itself. In addition, the boxes are enlarged to large-4 to look better. The major purpose of this commit is to expose the ScyllaDB documentation. docs: fix the link Closes #13065	2023-03-03 15:48:30 +02:00
Botond Dénes	fb898d214c	Merge 'Shard major compaction task' from Aleksandra Martyniuk Implementation of task_manager's task that covers major keyspace compaction on one shard. Closes #12662 * github.com:scylladb/scylladb: test: extend major keyspace compaction tasks test compaction: create task manager's task for major keyspace compaction on one shard	2023-03-02 15:06:31 +02:00
Botond Dénes	91d64372db	Merge 'cmake: sync with `configure.py` (8/n)' from Kefu Chai - build: cmake: extract more subsystem out into its own CMakeLists.txt - build: cmake: remove swagger_gen_files - build: cmake: remove stale TODO comments - build: cmake: expose scylla_gen_build_dir - build: cmake: link against cryptopp - build: cmake: add missing source to utils - build: cmake: move lib sources into test-lib - build: cmake: add test/perf Closes #13059 * github.com:scylladb/scylladb: build: cmake: add expr_test test build: cmake: allow test to specify the sources build: cmake: add test/perf build: cmake: move lib sources into test-lib build: cmake: add missing source to utils build: cmake: link against cryptopp build: cmake: expose scylla_gen_build_dir build: cmake: remove stale TODO comments build: cmake: remove swagger_gen_files build: cmake: extract more subsystem out into its own CMakeLists.txt	2023-03-02 14:22:35 +02:00
Botond Dénes	e70be47276	Merge 'commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off' from Calle Wilund Fixes #12810 We did not update total_size_on_disk in commitlog totals when use o_dsync was off. This means we essentially ran with no registered footprint, also causing broken comparisons in delete_segments. Closes #12950 * github.com:scylladb/scylladb: commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off commitlog: change type of stored size	2023-03-02 12:39:11 +02:00
Botond Dénes	1b5f8916d6	Merge 'Generalize sstable::move_to_new_dir() method' from Pavel Emelyanov This method requires callers to remember that the sstable is the collection of files on a filesystem and to know what exact directory they are all in. That's not going to work for object storage, instead, sstable should be moved between more abstract states. This PR replaces move_to_new_dir() call with the change_state() one that accepts target sub-directory string and moves files around. Currently supported state changes: * staging -> normal * upload -> normal \| staging * any -> quarantine All are pretty straightforward and move files between table basedir subdirectories with the exception that upload -> quarantine should move into upload/quarantine subdirectory. Another thing to keep in mind, that normal state doesn't have its subdir but maps directory to table's base directory. Closes #12648 * github.com:scylladb/scylladb: sstable: Remove explicit quarantization call test: Move move_to_new_dir() method from sstable class sstable, dist.-loader: Introduce and use pick_up_from_upload() method sstables, code: Introduce and use change_state() call distributed_loader: Let make_sstables_available choose target directory	2023-03-02 09:22:14 +02:00
Kefu Chai	1fe180ffbe	build: cmake: add expr_test test Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 14:26:55 +08:00
Kefu Chai	29dc4b0da5	build: cmake: allow test to specify the sources some tests are compiled from more source files, so add an extra parameter, so they can customize the sources. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 14:26:55 +08:00
Kefu Chai	78773c2ebd	build: cmake: add test/perf due to circular dependency: the .cc files under the root of project references the symbols defined by the source files under subdirectories, but the source files under subdirectories also reference the symbols defined by the .cc files under the root of project, the targets in test/perf do not compile. but the general structure is created. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Kefu Chai	a51c928e69	build: cmake: move lib sources into test-lib less convoluted this way, so each target only includes the sources in its own directory. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Kefu Chai	40fb6ff728	build: cmake: add missing source to utils Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Kefu Chai	074281c450	build: cmake: link against cryptopp since we include cryptopp/ headers, we need to find it and link against it explicitly, instead of relying on seastar to do this. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Kefu Chai	167d018ca7	build: cmake: expose scylla_gen_build_dir should have exposed the base directory of generated headers, not the one with "rust" component. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Kefu Chai	47a06e76a2	build: cmake: remove stale TODO comments they have been addressed already. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Kefu Chai	1e040e0e12	build: cmake: remove swagger_gen_files which has been moved into api/CMakeLists.txt Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Kefu Chai	563fbb2d11	build: cmake: extract more subsystem out into its own CMakeLists.txt namely, cdc, compaction, dht, gms, lang, locator, mutation_writer, raft, readers, replica, service, tools, tracing and transport. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Aleksandra Martyniuk	24edcd27d4	test: extend major keyspace compaction tasks test	2023-03-01 18:56:31 +01:00
Aleksandra Martyniuk	b188060535	compaction: create task manager's task for major keyspace compaction on one shard Implementation of task_manager's task that covers major keyspace compaction on one shard.	2023-03-01 18:56:26 +01:00
Tomasz Grabiec	2d935e255a	storage_service: node ops: Extract node_ops_insert() to reduce code duplication	2023-03-01 18:43:13 +01:00
Tomasz Grabiec	d5021d5a1b	storage_service: Make node operations safer by detecting asymmetric abort This patch fixes a problem which affects decommission and removenode which may lead to data consistency problems under conditions which lead one of the nodes to unilaterally decide to abort the node operation without the coordinator noticing. If this happens during streaming, the node operation coordinator would proceed to make a change in the gossiper, and only later detect that one of the nodes aborted during sending of decommission_done or removenode_done command. That's too late, because the operation will be finalized by all the nodes once gossip propagates. It's unsafe to finalize the operation while another node aborted. The other node reverted to the old topology, with which they were running for some time, without considering the pending replica when handling requests. As a result, we may end up with consistency issues. Writes made by those coordinators may not be replicated to CL replicas in the new topology. Streaming may have missed to replicate those writes depending on timing. It's possible that some node aborts but streaming succeeds if the abort is not due to network problems, or if the network problems are transient and/or localized and affect only heartbeats. There is no way to revert after we commit the node operation to the gossiper, so it's ok to close node_ops sessions before making the change to the gossiper, and thus detect aborts and prevent later aborts after the change in the gossiper is made. This is already done during bootstrap (RBNO enabled) and replacenode. This patch changes removenode to also take this approach by moving sending of remove_done earlier. We cannot take this approach with decommission easily, because decommission_done command includes a wait for the node to leave the ring, which won't happen before the change to the gossiper is made. Separating this from decommission_done would require protocol changes. 
This patch adds a second-best solution, which is to check if sessions are still there right before making a change to the gossiper, leaving decommission_done where it was. The race can still happen, but the time window is now much smaller. Fixes #12989 Refs #12969	2023-03-01 18:43:13 +01:00
Kefu Chai	d85af3dca4	dht: incremental_owned_ranges_checker: use lower_bound() instead of using a while loop for finding the lower_bound, just use std::lower_bound() for finding if current node owns given token. this has several advantages: * better readability: as lower_bound is exactly what this loop calculates. * lower_bound uses binary search for searching the element, this algorithm should be faster than linear under most circumstances. * lower_bound uses std::advance() and prefix increment operator, this should be more performant than the postfix increment operator. as it does not create a temporary instance of iterator. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13008	2023-03-01 11:29:46 +02:00
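The loop-to-`std::lower_bound()` replacement described in the entry above can be sketched as follows. This is a minimal illustration, not the code from `incremental_owned_ranges_checker`: the `token` alias and both function names are hypothetical, and a plain sorted vector of integers stands in for the real range container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a token; the real type in dht/ is richer.
using token = std::int64_t;

// Hand-written linear scan: advance to the first element not less than `t`.
std::vector<token>::const_iterator
scan_lower_bound(const std::vector<token>& sorted, token t) {
    auto it = sorted.begin();
    while (it != sorted.end() && *it < t) {
        ++it;
    }
    return it;
}

// Same result via binary search; `sorted` must be in ascending order.
std::vector<token>::const_iterator
binary_lower_bound(const std::vector<token>& sorted, token t) {
    return std::lower_bound(sorted.begin(), sorted.end(), t);
}
```

Both functions return the same iterator for any sorted input; the second states the intent directly and does a logarithmic number of comparisons on random-access iterators.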
Avi Kivity	3042deb930	types: reimplement in terms of a variable template data_type_for() is a function template that converts a C++ type to a database dynamic type (data_type object). Instead of implementing a function per type, implement a variable template instance. This is shorter and nicer. Since the original type variables (e.g. long_type) are defined separately, use a reference instead of copying to avoid initialization order problems. To catch misuses of data_type_for the general data_type_for_v variable template maps to some unused tag type which will cause a build error when instantiated. The original motivation for this was to allow for partial specialization of data_type_for() for tuple types, but this isn't really workable since the native type for tuples is std::vector<data_value>, not std::tuple, and I only checked this after getting the work done, so this isn't helping anything; it's just a little nicer. Closes #13043	2023-03-01 11:25:39 +02:00
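The variable-template approach described in the entry above can be sketched roughly like this. Every name here (`type_info_stub`, `type_for_v`, `unsupported_type_tag`) is a hypothetical stand-in, not the actual `data_type_for_v` definition; the sketch only shows the three ideas from the message: reference-typed specializations, separately defined type objects, and a tag type for the general case.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical, simplified stand-in for a dynamic type descriptor.
struct type_info_stub {
    std::string_view name;
};

// Separately defined type objects, analogous to e.g. long_type.
inline const type_info_stub int_type_stub{"int"};
inline const type_info_stub text_type_stub{"text"};

// Tag for the general case. It is not a type_info_stub, so using the
// primary template where a descriptor is expected fails to compile.
struct unsupported_type_tag {};

// Primary variable template: unknown C++ types map to the tag.
template <typename T>
inline constexpr unsupported_type_tag type_for_v{};

// Specializations are references, so the type objects are never copied.
template <>
inline const type_info_stub& type_for_v<int> = int_type_stub;

template <>
inline const type_info_stub& type_for_v<std::string> = text_type_stub;
```

With these definitions, `type_for_v<int>.name` compiles, while `type_for_v<double>.name` is rejected at build time because the tag type has no `name` member.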
Botond Dénes	d5dee43be7	Merge 'doc: specify the versions where Alternator TTL is no longer experimental' from Anna Stuchlik This PR adds a note to the Alternator TTL section to specify in which Open Source and Enterprise versions the feature was promoted from experimental to non-experimental. The challenge here is that OSS and Enterprise are (still) documented together, but they're not in sync in promoting the TTL feature: it's still experimental in 5.1 (released) but no longer experimental in 2022.2 (to be released soon). We can take one of the following approaches: a) Merge this PR with master and ask the 2022.2 users to refer to master. b) Merge this PR with master and then backport to branch-5.1. If we choose this approach, it is necessary to backport https://github.com/scylladb/scylladb/pull/11997 beforehand to avoid conflicts. I'd opt for a) because it makes more sense from the OSS perspective and helps us avoid mess and backporting. Closes #12295 * github.com:scylladb/scylladb: doc: fix the version in the comment on removing the note doc: specify the versions where Alternator TTL is no longer experimental	2023-03-01 11:24:52 +02:00
Botond Dénes	92fde47261	Merge 'test/cql-pytest - aggregation tests' from Nadav Har'El This small series reorganizes the existing functional tests for aggregation (min, max, count) and adds additional tests for sum reproducing the strange (but Cassandra-compatible) behavior described in issue #13027. Closes #13038 * github.com:scylladb/scylladb: cql-pytest: add tests for sum() aggregate test/cql-pytest: move aggregation tests to one file	2023-03-01 11:02:08 +02:00
Avi Kivity	6822e3b88a	query_id: extract into new header query_id currently lives in query-request.hh, a busy place with lots of dependencies. In turn it gets pulled by uuid.idl.hh, which is also very central. This makes test/raft/randomized_nemesis_test.cc which is nominally only dependent on Raft rebuild on random header file changes. Fix by extracting into a new header. Closes #13042	2023-03-01 10:25:25 +02:00
Botond Dénes	46efdfa1a1	Merge 'readers/nonforwarding: don't emit partition_end on next_partition,fast_forward_to' from Gusev Petr The series fixes the `make_nonforwardable` reader, it shouldn't emit `partition_end` for previous partition after `next_partition()` and `fast_forward_to()` Fixes: #12249 Closes #12978 * github.com:scylladb/scylladb: flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE make_nonforwardable: test through run_mutation_source_tests make_nonforwardable: next_partition and fast_forward_to when single_partition is true make_forwardable: fix next_partition flat_mutation_reader_v2: drop forward_buffer_to nonforwardable reader: fix indentation nonforwardable reader: refactor, extract reset_partition nonforwardable reader: add more tests nonforwardable reader: no partition_end after fast_forward_to() nonforwardable reader: no partition_end after next_partition() nonforwardable reader: no partition_end for empty reader row_cache: pass partition_start though nonforwardable reader	2023-03-01 09:58:14 +02:00
Botond Dénes	1c0b47ee9b	Merge 'treewide: remove unused variable and reference used one explicitly' from Kefu Chai - treewide: do not define/capture unused variables - sstables/sstables: mark dummy variable for loop [[maybe_unused]] - util/result_try: reference this explicitly - raft: reference this explicitly - idl-compiler: mark captured this used - build: reenable unused-{variable,lambda-capture} warnings Closes #12915 * github.com:scylladb/scylladb: build: reenable unused-{variable,lambda-capture} warnings test: reader_concurrency_semaphore_test: define target_memory in debug mode api::failure_detector: mark set_phi_convict_threshold unimplemented test: memtable_test: mark dummy variable for loop [[maybe_unused]] idl-compiler: mark captured this used raft: reference this explicitly util/result_try: reference this explicitly sstables/sstables: mark dummy variable for loop [[maybe_unused]] treewide: do not define/capture unused variables service: storage_service: clear _node_ops in batch	2023-03-01 09:44:37 +02:00
Nadav Har'El	363f326d49	test/cql-pytest: test for CLUSTERING ORDER BY verification in MV Since commit `73e258fc34`, Scylla has partial verification for the CLUSTERING ORDER BY clause in CREATE MATERIALIZED VIEW. Specifically, invalid column names are rejected. But for reasons explained in issue #12936 and in the test in this patch, Cassandra demands that if CLUSTERING ORDER BY appears it must list all the clustering columns, with no duplicates, and do so in the right order. This patch replaces an existing test which suggested it is fine (an extension over Cassandra) to accept a partial list of clustering columns, by a test that verifies that such a partial list, or an incorrectly-ordered list, or list with duplicates, should be rejected. The new test fails on Scylla, and passes on Cassandra, so marked as xfail. Refs #12936. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12938	2023-03-01 08:02:39 +02:00
Botond Dénes	84e26ed9c3	Merge 'Enable RBNO by default' from Asias He This pr fixes the seastar::rpc::closed_error error in the test_topology suite and enables RBNO by default. Closes #12970 * github.com:scylladb/scylladb: Revert "Revert "storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops"" storage_service: Wait for normal state handler to finish in replace storage_service: Wait for normal state handler to finish in bootstrap	2023-03-01 07:55:46 +02:00
Nadav Har'El	7dc54771e1	test/cql-pytest: allow "run-cassandra" without building Scylla Before this patch, all scripts which use test/cql-pytest/run.py looked for the Scylla executable as their first step. This is usually the right thing to do, except in two cases where Scylla is not needed: 1. The script test/cql-pytest/run-cassandra. 2. The script test/alternator/run with the "--aws" option. So in this patch we change run.py to only look for Scylla when actually needed (the find_scylla() function is called). In both cases mentioned above, find_scylla() will never get called and the script can work even if Scylla was never built. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13010	2023-03-01 07:54:19 +02:00
Botond Dénes	eb10623dd2	Merge 'build: cmake: sync with `configure.py` (7/n)' from Kefu Chai - build: cmake: support thrift < 0.11.0 - build: cmake: find Thrift before using it - build: cmake: find libxcrypt before using it - build: cmake: expose scylla_gen_build_dir from "interface" - build: cmake: extract thrift out - build: cmake: link the whole auth - build: cmake: extract mutation,db,replica,streaming out Closes #12990 * github.com:scylladb/scylladb: build: cmake: extract mutation,db,replica,streaming out build: cmake: link the whole auth build: cmake: extract thrift out build: cmake: expose scylla_gen_build_dir from "interface" build: cmake: find libxcrypt before using it build: cmake: find Thrift before using it build: cmake: support thrift < 0.11.0	2023-03-01 07:35:21 +02:00
Kefu Chai	f59542a01a	build: reenable unused-{variable,lambda-capture} warnings now that all -Wunused-{variable,lambda-capture} warnings are taken care of. let's reenable these warnings so they can help us to identify potential issues. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-01 10:45:18 +08:00
Kefu Chai	efe96e7fc6	test: reader_concurrency_semaphore_test: define target_memory in debug mode otherwise we'd have following warning ``` test/boost/reader_concurrency_semaphore_test.cc:1380:20: error: unused variable 'target_memory' [-Werror,-Wunused-const-variable] constexpr uint64_t target_memory = uint64_t(1) << 28; // 256MB ^ 1 error generated. ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-01 10:45:18 +08:00
Kefu Chai	ffffcdb48a	cql3: mark cf_name final as `cf_name` is not derived from any class, it's viable to mark it `final`. this change is created to to silence the warning from Clang, like: ``` /home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_LOCALE -DFMT_SHARED -DHAVE_LZ4_COMPRESS_DEFAULT -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=6 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/build/cmake -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -Wall -Werror -Wno-mismatched-tags -Wno-missing-braces -Wno-c++11-narrowing -O0 -g -gz -std=gnu++20 -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT CMakeFiles/scylla.dir/data_dictionary/data_dictionary.cc.o -MF CMakeFiles/scylla.dir/data_dictionary/data_dictionary.cc.o.d -o CMakeFiles/scylla.dir/data_dictionary/data_dictionary.cc.o -c /home/kefu/dev/scylladb/data_dictionary/data_dictionary.cc In file included from /home/kefu/dev/scylladb/data_dictionary/data_dictionary.cc:9: In file included from /home/kefu/dev/scylladb/data_dictionary/data_dictionary.hh:11: /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:287:2: error: destructor called on non-final 'cql3::cf_name' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] _M_payload._M_value.~_Stored_type(); ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:318:4: note: in instantiation of member function 'std::_Optional_payload_base<cql3::cf_name>::_M_destroy' requested 
here _M_destroy(); ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:439:57: note: in instantiation of member function 'std::_Optional_payload_base<cql3::cf_name>::_M_reset' requested here _GLIBCXX20_CONSTEXPR ~_Optional_payload() { this->_M_reset(); } ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:514:17: note: in instantiation of member function 'std::_Optional_payload<cql3::cf_name>::~_Optional_payload' requested here constexpr _Optional_base() = default; ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:739:17: note: in defaulted default constructor for 'std::_Optional_base<cql3::cf_name>' first required here constexpr optional(nullopt_t) noexcept { } ^ /home/kefu/dev/scylladb/cql3/statements/raw/batch_statement.hh:37:28: note: in instantiation of member function 'std::optional<cql3::cf_name>::optional' requested here : cf_statement(std::nullopt) ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/optional:287:23: note: qualify call to silence this warning _M_payload._M_value.~_Stored_type(); ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13039	2023-02-28 22:26:43 +02:00
Petr Gusev	1709a17c38	flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE	2023-02-28 23:42:44 +04:00
Petr Gusev	992ccb6255	make_nonforwardable: test through run_mutation_source_tests	2023-02-28 23:42:43 +04:00
Petr Gusev	989ef9d358	make_nonforwardable: next_partition and fast_forward_to when single_partition is true This flag designates that we should consume only one partition from the underlying reader. This means that attempts to move to another partition should cause an EOS.	2023-02-28 23:42:34 +04:00
Petr Gusev	a67776b750	make_forwardable: fix next_partition When next_partition is called, the buffer could contain partition_start and possibly static_row. In this case clear_buffer_to_next_partition will not remove anything from the buffer and the reader position should not change. Before this patch, however, we used to set _end_of_stream=false, which violated the forwardable-reader contract - the data of the next partition was emitted after the data of the first partition without intermediate EOS. This bug was found when debugging test_make_nonforwardable_from_mutations_as_mutation_source flakiness. A corresponding focused test_make_forwardable_next_partition has been added to exercise this problem.	2023-02-28 23:11:45 +04:00
Petr Gusev	64427b9164	flat_mutation_reader_v2: drop forward_buffer_to This is just a strange method I came across. It effectively does nothing but clear_buffer().	2023-02-28 23:00:02 +04:00
Petr Gusev	a517e1d6ad	nonforwardable reader: fix indentation	2023-02-28 23:00:02 +04:00
Petr Gusev	beeffb899f	nonforwardable reader: refactor, extract reset_partition No observable behaviour changes, just refactor the code.	2023-02-28 23:00:02 +04:00
Petr Gusev	023ed0ad00	nonforwardable reader: add more tests Add more test cases for completeness.	2023-02-28 23:00:02 +04:00
Petr Gusev	88cd1c3700	nonforwardable reader: no partition_end after fast_forward_to() This patch fixes the problem with method fast_forward_to which is similar to the one with next_partition, no partition_end should be injected for the partition if fast_forward_to was called inside it.	2023-02-28 23:00:02 +04:00
Petr Gusev	8ff96e1bce	nonforwardable reader: no partition_end after next_partition() Before the patch, nonforwardable reader injected partition_end unconditionally. This caused problems in case next_partition() was called, the downstream reader might have already injected its own partition_end marker, and the one from nonforwardable reader was a duplicate. Fixes: #12249	2023-02-28 23:00:02 +04:00
Petr Gusev	9c5c380b0b	nonforwardable reader: no partition_end for empty reader The patch introduces the _partition_is_open flag, inject partition_end only if there was some data in the input reader. A simple unit test has been added for the nonforwardable reader which checks this new behaviour.	2023-02-28 22:59:56 +04:00
Wojciech Mitros	6d2e785b5c	docs: update wasm.md The WASM UDF implementation has changed since the last time the docs were written. In particular, the Rust helper library has been released, and using it should be the recommended method. Some decisions that were only experimental at the start, were also "set in stone", so we should refer to them as such. The docs also contain some code examples. This patch adds tests for these examples to make sure that they are not wrong and misleading. Closes #12941	2023-02-28 20:59:25 +02:00
Kefu Chai	2434a4d345	utils: small_vector: define operator<=> small_vector should be feature-wise compatible with std::vector<>, let's add operator<=> for it. also, there is no need to define operator!=() explicitly, C++20 defines this for us if operator==() is defined, so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13032	2023-02-28 20:04:22 +02:00
Benny Halevy	06a0902708	dht/range_streamer: stream_async: move ranges_to_stream to do_streaming Currently the ranges_to_stream variable lives on the caller state, and do_streaming() moves its contents down to request_ranges/transfer_ranges and then calls clear() to make it ready for reuse. This works in principle but it makes it harder for an occasional reader of this code to figure out what is going on. This change transfers control of the ranges_to_stream vector to do_streaming, by calling it with (std::exchange(ranges_to_stream, {})). With that, the moved vector doesn't need to be cleared by do_streaming, and the caller is responsible for readying the variable for reuse in its for loop. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 17:38:34 +02:00
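The ownership hand-off described in the entry above can be illustrated with a small sketch. It is not the `range_streamer` code: `consume_batch`, `stream_in_batches`, and the use of `int` items are assumptions made for the example. The point is only that `std::exchange` moves the old contents to the callee and leaves the caller's variable empty and ready for reuse.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical consumer: takes ownership of the batch and reports its size.
// It has no need to clear anything on behalf of the caller.
std::size_t consume_batch(std::vector<int> batch) {
    return batch.size();
}

// Hypothetical caller: accumulates items and hands over each full batch.
// std::exchange moves the old contents out and leaves `pending` as a fresh,
// empty vector for the next loop iteration.
std::size_t stream_in_batches(const std::vector<int>& items, std::size_t batch_size) {
    std::vector<int> pending;
    std::size_t streamed = 0;
    for (int item : items) {
        pending.push_back(item);
        if (pending.size() == batch_size) {
            streamed += consume_batch(std::exchange(pending, {}));
        }
    }
    if (!pending.empty()) {
        streamed += consume_batch(std::exchange(pending, {}));
    }
    return streamed;
}
```

Compared with `std::move` followed by `clear()` in the callee, this keeps the responsibility for resetting the variable next to the loop that reuses it.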
Benny Halevy	1392c7e1cf	streaming: stream_session: maybe_yield To prevent reactor stalls when freeing many/long token range vectors. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 17:32:44 +02:00
Avi Kivity	20e1908c55	Merge 'treewide: use (defaulted) operator<=> when appropriate' from Kefu Chai - db/view: use operator<=> to define comparison operators - utils: UUID: use defaulted operator<=> - db: schema_tables: use defaulted operator<=> - cdc: generation: schema_tables: use defaulted operator<=> - db::commitlog::replay_position: use defaulted operator<=> Closes #13033 * github.com:scylladb/scylladb: db::commitlog::replay_position: use defaulted operator<=> cdc: generation: schema_tables: use defaulted operator<=> db: schema_tables: use defaulted operator<=> utils: UUID: use defaulted operator<=> db/view: use operator<=> to define comparison operators	2023-02-28 17:05:45 +02:00
Benny Halevy	c4836ab9e9	streaming: stream_session: prepare: move token ranges to add_transfer_ranges Reduce copies on the path to calling add_transfer_ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 17:04:47 +02:00
Benny Halevy	12eb3d210f	streaming: stream_plan: transfer_ranges: move token ranges towards add_transfer_ranges Rather than copying the ranges vector. Note that add_transfer_ranges itself cannot simply move the ranges since it copies them for multiple tables. While at it, move also the keyspace and column_family strings. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 17:03:51 +02:00
Benny Halevy	775c6b9697	dht/range_streamer: stream_async: do_streaming: move ranges downstream The ranges can be moved rather than copied to both `request_ranges` and `transfer_ranges` as they are only cleared after this point. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 16:56:55 +02:00
Benny Halevy	3cd8838a09	dht/range_streamer: add_ranges: clear_gently ranges_for_keyspace After calling get_range_fetch_map, ranges_for_keyspace is not used anymore. Synchronously destroying it may potentially stall in large clusters so use utils::clear_gently to gently clear the map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 16:52:30 +02:00
Benny Halevy	a80c2d16dd	dht/range_streamer: get_range_fetch_map: reduce copies Use const& to refer to the input ranges and endpoints rather than copying them individually along the way more than needed to. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 16:52:30 +02:00
Benny Halevy	9d6e5d50d1	dht/range_streamer: add_ranges: move ranges down-stream Eliminate extraneous copy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 16:52:27 +02:00
Benny Halevy	c61f058aa5	dht/boot_strapper: move ranges to add_ranges Eliminate extraneous copy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 16:50:40 +02:00
Benny Halevy	27b382dcce	dht/range_streamer: stream_async: incrementally update _nr_ranges_remaining Rather than calling nr_ranges_to_stream() inside `do_streaming`. As nr_ranges_to_stream depends on the `_to_stream` that will be updated only later on after the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 16:50:40 +02:00
Benny Halevy	c3c7efffb1	dht/range_streamer: stream_async: erase from range_vec only after do_streaming success range_vec is used for calculating nr_ranges_to_stream. Currently, the ranges_to_stream that were moved out of range_vec are pushed back on exception, but this isn't safe, since they may have moved already to request_ranges or transfer_ranges. Instead, erase the ranges we pass to do_streaming only after it succeeds so on exception, range_vec will not need adjusting. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-28 16:50:40 +02:00
Kefu Chai	7de2d1c714	api::failure_detector: mark set_phi_convict_threshold unimplemented let it throw if "set_phi_convict_threshold" is called, as we never populate the specified \Phi. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:56:55 +08:00
Kefu Chai	60eac12db6	test: memtable_test: mark dummy variable for loop [[maybe_unused]] without C++23 `std::ranges::repeat_view`, it'd be cumbersome to implement a loop without dummy variable. this change helps to silence following warning: ``` test/boost/memtable_test.cc:1135:26: error: unused variable 'value' [-Werror,-Wunused-variable] for (int value : boost::irange<int>(0, num_flushes)) { ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:56:55 +08:00
Kefu Chai	2caf9b4e1c	idl-compiler: mark captured this used sometimes the captured `this` is used in the generated C++ code, while sometimes it is not. to reenable `-Wunused-lambda-capture` warning, let's mark this `this` as used. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:56:55 +08:00
Kefu Chai	b926105eae	raft: reference this explicitly Clang complains that the captured `this` is not used, like ``` /home/kefu/dev/scylladb/raft/fsm.hh:644:21: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] auto visitor = [this, from, msg = std::move(msg)](const auto& state) mutable { ^ /home/kefu/dev/scylladb/raft/server.cc:738:11: note: in instantiation of function template specialization 'raft::fsm::step<raft::append_request>' requested here _fsm->step(from, std::move(append_request)); ^ ``` but `step(..)` is a non-static member function of `fsm`, so `this` is actually used. to silence Clang's warning, let's just reference it explicitly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:56:55 +08:00
Kefu Chai	5e7c8cc4b7	util/result_try: reference this explicitly quote from Avi's comment > It's supposed to be illegal to call handle(...) without this->, > because handle() is a dependent name (but many compilers don't > insist, gcc is stricter here). So two error messages competed, > and "unused this capture" won. without this change, Clang complains that `this` is not used with `-Wunused-lambda-capture`. in this change, `this` is explicitly referenced to silence Clang's warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:56:55 +08:00
Kefu Chai	1171c326a9	sstables/sstables: mark dummy variable for loop [[maybe_unused]] without C++23 `std::ranges::repeat_view`, it'd be cumbersome to implement a loop without dummy variable ``` /home/kefu/dev/scylladb/sstables/sstables.cc:484:15: error: unused variable '_' [-Werror,-Wunused-variable] for (auto _ : boost::irange<key_type>(0, nr_elements)) { ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:56:55 +08:00
Kefu Chai	3ae11de204	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:56:53 +08:00
Kefu Chai	be47874a42	service: storage_service: clear _node_ops in batch before this change, _node_ops are cleared one after another in `storage_service::node_ops_abort()` when `ops_uuid` is not specified. but this * is not efficient * is not quite readable * introduces an unused variable so, in this change, we just clear it in batch. this should silence a `-Wunused-variable` warning from Clang. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:52:25 +08:00
Nadav Har'El	130c090251	cql-pytest: add tests for sum() aggregate This patch adds regression tests for the strange (but Cassandra-compatible) behavior described in issue #13027 - that sum of no results returns 0 (not null or nothing), and if also asking for p, we get a null there too. Refs #13027. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-02-28 15:35:21 +02:00
Botond Dénes	6b72f4a6fa	Merge 'main: display descriptions of all tools' from Kefu Chai - main: expose tools as a vector<> - main: use a struct for representing tool - main: track tools descriptin in tool struct - main: add missing descriptions for tools - main: move get_tools() into main() Fixes #13026 Closes #13030 * github.com:scylladb/scylladb: main: move get_tools() into main() main: add missing descriptions for tools main: track tools descriptin in tool struct main: use a struct for representing tool main: expose tools as a vector<>	2023-02-28 15:32:11 +02:00
Kefu Chai	af3968bf6e	build: cmake: extract mutation,db,replica,streaming out Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:28:46 +08:00
Kefu Chai	6f3a44cde9	build: cmake: link the whole auth without this change, linker would like to remove the .o which is not referenced by other translation units. but we do use static variables to, for instance, register classes to a global registry. so, let's force the linker to include the whole archive. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:28:46 +08:00
Kefu Chai	3e75df6917	build: cmake: extract thrift out also, move "interface" linkage from scylla to "thrift", because it is "thrift" who is using "interface". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:28:46 +08:00
Kefu Chai	4bb0134f1d	build: cmake: expose scylla_gen_build_dir from "interface" as it builds headers like "gen/Cassandra.h", and the target uses "interface" via these headers, so "interface" is obliged to expose this include directory. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:28:46 +08:00
Kefu Chai	1aafeac023	build: cmake: find libxcrypt before using it we should find libxcrypt library before using it. in this change, Findlibxcrypt.cmake is added to find libxcrypt library. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:28:46 +08:00
Kefu Chai	607858db51	build: cmake: find Thrift before using it we should find Thrift library before using it. in this change, FindThrift.cmake is added to find Thrift library. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:28:46 +08:00
Kefu Chai	f30e7f9da1	build: cmake: support thrift < 0.11.0 define THRIFT_USES_BOOST if thrift < 0.11.0, see also #4538 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 21:28:46 +08:00
Nadav Har'El	e1f97715eb	test/cql-pytest: move aggregation tests to one file We had separate test files test_minmax.py and test_count.py but the separation was artificial (and test_count.py even had one test using min()). Now that I want to add another test for sum(), I don't know where to put it. So in this patch I combine test_minmax.py and test_count.py into one test file - test_aggregate.py, and we can later add sum() tests in the same file. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-02-28 14:39:04 +02:00
Kefu Chai	67b334385c	dist/redhat: specify version in `Obsoletes:` to silence the warning from rpmbuild, like ``` RPM build warnings: line 202: It's not recommended to have unversioned Obsoletes: Obsoletes: tuned ``` more specific this way. quote from the commit message of `303865d979` for the version number: > tuned 2.11.0-9 and later writes to kerned.sched_wakeup_granularity_ns > and other sysctl tunables that we so laboriously tuned, dropping > performance by a factor of 5 (due to increased latency). Fix by > obsoleting tuned during install (in effect, we are a better tuned, > at least for us). with this change, it'd be easier to identify potential issues when building / packaging. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12721	2023-02-28 13:55:04 +02:00
Marcin Maliszkiewicz	bd7caefccf	docs: link general repairs page to RBNO page Information was duplicated before and the version on this page was outdated - RBNO is enabled for replace operation already. Closes #12984	2023-02-28 13:04:32 +02:00
Tomasz Grabiec	fddd93da4e	storage_service: node ops: Add error injections	2023-02-28 11:32:18 +01:00
Tomasz Grabiec	5c8ad2db3c	service: node_ops: Make watchdog and heartbeat intervals configurable Will be useful for writing tests which trigger failures, and for workarounds in production.	2023-02-28 11:31:55 +01:00
Kefu Chai	5bf6e9ba97	db::commitlog::replay_position: use defaulted operator<=> the default generated operator<=> is exactly the same as the handcrafted one. so let compiler do its job. also, since operator<=> is defaulted, there is no need to define operator== anymore, so drop it as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 17:25:30 +08:00
Kefu Chai	aed681fa3c	cdc: generation: schema_tables: use defaulted operator<=> the default generated operator<=> is exactly the same as the handcrafted one. so let compiler do its job. also, since operator<=> is defaulted, there is no need to define operator== anymore, so drop it as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 17:25:30 +08:00
Kefu Chai	56c9c9d29e	db: schema_tables: use defaulted operator<=> the default generated operator<=> is exactly the same as the handcrafted one. so let compiler do its job. also, since operator<=> is defaulted, there is no need to define operator== anymore, so drop it as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 17:25:30 +08:00
Kefu Chai	9ec8b4844b	utils: UUID: use defaulted operator<=> the default generated operator<=> is exactly the same as the handcrafted one. so let compiler do its job. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 17:25:30 +08:00
Kefu Chai	ab5d772d63	db/view: use operator<=> to define comparison operators also, there is no need to define operator!=() if operator==() is defined, so drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 17:25:30 +08:00
Kefu Chai	7550be1fc6	main: move get_tools() into main() there is no need to have a dedicated function which is only consumed by `main()`. so let's move the body of `get_tools()` into `main`. and with this change, a plain C array would suffice. so just use a plain array for tools. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 17:09:46 +08:00
Kefu Chai	128dbebb76	main: add missing descriptions for tools Fixes #13026 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 17:09:46 +08:00
Kefu Chai	ef0dfeb2fa	main: track tools description in tool struct so we can manage the tools in a more structured way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 17:09:46 +08:00
Kefu Chai	ffbbd59486	main: use a struct for representing tool so we can encapsulate the description of a certain tool in this struct with a more readable field name in comparison with a tuple<>, if we want to track all tools in this vector. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 17:09:46 +08:00
Kefu Chai	73cf62469b	main: expose tools as a vector<> so, in addition to looking up a tool by the name in it, we will be able to list all tools in this vector. this change paves the road to a more general solution to handle `--list-tools`. in this change * `lookup_main_func()` is replaced by `get_tools()`. * instead of checking `main_func` out of the if block, check it in the `if` block. as we already know if we have a matched tool in the `if` block, and we can early return right there. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-28 17:09:46 +08:00
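The `main:` series above ends with a struct per tool kept in a plain array. A rough sketch of that shape, assuming hypothetical tool names, descriptions, and entry points (none of them are taken from the actual source):

```cpp
#include <cassert>
#include <string_view>

using main_func_type = int (*)(int, char**);

// Hypothetical entry points standing in for the real tool mains.
inline int server_main(int, char**) { return 0; }
inline int types_main(int, char**) { return 1; }

// One record per tool: named fields instead of a tuple<>.
struct tool {
    std::string_view name;
    main_func_type func;
    std::string_view desc;
};

// A plain array is enough; it supports both lookup by name and listing.
inline constexpr tool tools[] = {
    {"server", server_main, "start the server"},
    {"types", types_main, "inspect serialized values"},
};

// Returns the matching tool, or nullptr if the name is unknown.
inline const tool* find_tool(std::string_view name) {
    for (const auto& t : tools) {
        if (t.name == name) {
            return &t;
        }
    }
    return nullptr;
}
```

Listing all tools is then a loop over `tools` printing `name` and `desc`.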
Kefu Chai	991379bdb3	raft: broadcast_tables: remove unused asyncio mark test_broadcast_kv_store does not use await or yield at all, so there is no need to mark it with "asyncio" mark. tested using ``` SCYLLA_HOME=$HOME/scylla build/cmake/scylla --overprovisioned --developer-mode=yes --consistent-cluster-management=true --experimental-features=broadcast-tables ... pytest broadcast_tables/test_broadcast_tables.py ``` the test still passes. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13006	2023-02-28 11:05:15 +02:00
Asias He	8fb786997a	Revert "Revert "storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops"" This reverts commit `fd4ee4878a`.	2023-02-28 09:00:13 +08:00
Asias He	5856e69462	storage_service: Wait for normal state handler to finish in replace Similar to "storage_service: Wait for normal state handler to finish in bootstrap", this patch enables the check on the replace procedure.	2023-02-28 09:00:13 +08:00
Asias He	53636167ca	storage_service: Wait for normal state handler to finish in bootstrap In storage_service::handle_state_normal, storage_service::notify_joined will be called, which drops the rpc connections to the node that becomes normal. This causes rpc calls with that node to fail with seastar::rpc::closed_error error. Consider this: - n1 in the cluster - n2 is added to join the cluster - n2 sees n1 is in normal status - n2 starts bootstrap process - notify_joined on n2 closes rpc connection to n1 in the middle of bootstrap - n2 fails to bootstrap For example, during bootstrap with RBNO, we saw repair failed in a test that sets ring_delay to zero and does not wait for gossip to settle. repair - repair[9cd0dbf8-4bca-48fc-9b1c-d9e80d0313a2]: sync data for keyspace=system_distributed_everywhere, status=failed: std::runtime_error ({shard 0: seastar::rpc::closed_error (connection is closed)}) This patch fixes the race by waiting for the handle_state_normal handler to finish before the bootstrap process. Fixes #12764 Fixes #12956	2023-02-28 09:00:13 +08:00
Kefu Chai	b6e4275511	configure.py: build and use libseastar.so in debug and dev modes now that Seastar can be built as shared libraries, we can use it for faster development iteration with less disk usage. in this change * configure.py: - 'build_seastar_shared_libs' is added as yet another mode value, so different modes have its own setting. 'debug' and 'dev' have this enabled, while other modes disable it. - link scylla with rpath specified, so it can find `libseastar.so` in build directory. * install.sh: remove the rpath as the rpath in the elf image will not be available after the relocatable package is installed, also rpmbuild will error out when it uses check-rpaths to verify the elf images (executables and shared libraries), as the rpath encoded in them are not known ones. patchelf() will take care of the shared libraries linked by the executables. so we don't need to worry about libseastar.so or libseastar_testing.so. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12801	2023-02-27 21:08:34 +02:00
Kefu Chai	4f3bc915a6	cql-pytest: remove duplicated words in README.md Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13005	2023-02-27 17:28:32 +02:00
Nadav Har'El	3b32440993	test/cql-pytest: add regression test for UNSET key in insert Recently, we overhauled the error handling of UNSET_VALUE in various places where it is not allowed. This patch adds two more regression tests for this error handling. Both tests pass on Scylla today, pass on Cassandra, but fail on earlier Scylla (e.g., I tested 5.1.5): The first test does INSERT into clustering key UNSET_VALUE. An UNSET_VALUE is designed to skip part of the write - not an entire write - so this attempt should fail - not silently be skipped. The write indeed fails with an error on Cassandra, and on recent Scylla, but silently did nothing in older Scylla which leads this test to fail there. The second test does the same thing with LWT (an "IF NOT EXISTS" added to the insert). Scylla's failure here was even more spectacular - it crashed (as reported in issue #13001) instead of silently skipping the write. The test passes on Scylla today and on Cassandra, which both report the failure cleanly. Refs #13001. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13007	2023-02-27 17:20:22 +02:00
Petr Gusev	a46df5af63	row_cache: pass partition_start through nonforwardable reader Now the nonforwardable reader unconditionally produces a partition_end, even if the input reader was empty. This is strange in itself, but it also hinders properly fixing its next_partition() method, which is our ultimate goal. So we are going to change this and produce partition_end only if there was some data in the stream. However, this creates a problem: now we pop partition_start from the underlying reader in autoupdating_underlying_reader::move_to_next_partition and manually push it back to downstream readers bypassing nonforwardable reader. This means if we change the logic in nonforwardable reader as described we will end up with partition_start without partition_end in the downstream readers. This patch rectifies this by making sure that nonforwardable will see the initial partition_start. We inject this partition_start just before the nonforwardable reader, into delegating_reader. This also makes the result type of range_populating_reader::operator() a bit simpler, we don't need to pass partition_start anymore.	2023-02-27 18:46:31 +04:00
Nadav Har'El	73e258fc34	materialized views: verify CLUSTERING ORDER BY clause Cassandra is very strict in the CLUSTERING ORDER BY clause which it allows when creating a materialized view - if it appears, it must list all the clustering columns of the view. Scylla is less strict - a subset of the clustering columns may be specified. But Scylla was too lenient - a user could specify non-clustering columns and even non-existent columns and Scylla would not fail the MV creation. This patch fixes that - with it MV creation fails if anything besides clustering columns is listed in CLUSTERING ORDER BY. An xfailing test we had for this case no longer fails after this patch so its xfail mark is removed. We also add a few more corner cases to the tests. This patch also fixes one C++ test which had exactly the error that this patch detects - the test author tried to use the partition key, instead of the clustering key, in CLUSTERING ORDER BY (this error had no effect because the specified order, "asc", was the default anyway). Fixes #10767 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12885	2023-02-27 15:09:42 +02:00
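The rule described in the materialized-view commit above (a subset of the clustering columns is accepted, anything else is rejected) can be sketched as a small check. The function name, the column representation, and the error wording are assumptions made for illustration; the real validation lives in the CQL statement code and may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Returns an error message if `order_by` names a column that is not one of
// the view's clustering columns, and std::nullopt otherwise. A subset of the
// clustering columns (including the empty list) is accepted.
inline std::optional<std::string> check_clustering_order(
        const std::vector<std::string>& clustering_columns,
        const std::vector<std::string>& order_by) {
    for (const auto& col : order_by) {
        const bool known = std::find(clustering_columns.begin(),
                                     clustering_columns.end(),
                                     col) != clustering_columns.end();
        if (!known) {
            // Hypothetical wording; covers partition-key, regular,
            // and non-existent columns alike.
            return "column " + col + " is not a clustering column";
        }
    }
    return std::nullopt;
}
```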
Kefu Chai	7fd303044e	tools/schema_loader: drop unused functions `load_one_schema()` and `load_schemas_from_file()` are dropped, as they are neither used by `scylla-sstable` or tested by `schema_loader_test.cc` . the latter tests `load_schemas()`, which is quite the same as `load_one_schema_from_file()`, but is more permissive in the sense that it allows zero schema or more than one schema in the specified path. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13003	2023-02-27 13:03:05 +02:00
Avi Kivity	6f88dc8009	Merge 'Fix memory leaks caused by throwing reader_concurrency_semaphore::consume()' from Botond Dénes Said method can now throw `std::bad_alloc` since `aab5954`. All call-sites should have been adapted in the series introducing the throw, but some managed to slip through because the oom unit test didn't run in debug mode. This series fixes the remaining unpatched call-sites and makes sure the test runs in debug mode too, so leaks like this are detected. Fixes: #12767 Closes #12756 * github.com:scylladb/scylladb: test/boost/reader_concurreny_semaphore_test: run oom protection tests in debug mode treewide: adapt to throwing reader_concurrency_semaphore::consume()	2023-02-27 12:27:30 +02:00
Anna Stuchlik	91b611209f	doc: fixes https://github.com/scylladb/scylladb/issues/12954 , adds the minimal version from which the 2021.1-to-2022.1 upgrade is supported for Ubuntu, Debian, and image Closes #12974	2023-02-27 12:15:49 +02:00
David Garcia	20bff2bd10	docs: Update ScyllaDB Enterprise link Closes #12985	2023-02-27 08:39:50 +02:00
Anna Stuchlik	95ce2e8980	doc: fix the option name LWT_OPTIMIZATION_META_BIT_MASK Fixes #12940. Closes #12982 [avi: move fixes tag out of subject]	2023-02-26 19:51:20 +02:00
Avi Kivity	c863186dc5	Merge 'Fixes for docs/dev/building.md' from Kamil Braun Closes #12071 * github.com:scylladb/scylladb: docs/dev: building.md: mention node-exporter packages docs/dev: building.md: replace `dev` with `<mode>` in list of debs	2023-02-26 19:27:33 +02:00
Kefu Chai	410035f03d	abstract_replication_strategy: remove unnecessary `virtual` specifier `effective_replication_map` is not a base class of any other class. so there is no need to mark any of its member functions as `virtual`. this change should address the following warning from Clang: ``` /home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:205:9: error: delete called on non-final 'locator::effective_replication_map' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] delete value_ptr; ^ /home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:202:9: note: in instantiation of member function 'seastar::internal::lw_shared_ptr_accessors_esft<locator::effective_replication_map>::dispose' requested here dispose(static_cast<T*>(counter)); ^ /home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:317:27: note: in instantiation of member function 'seastar::internal::lw_shared_ptr_accessors_esft<locator::effective_replication_map>::dispose' requested here accessors<T>::dispose(_p); ^ /home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:263:12: note: in instantiation of member function 'seastar::lw_shared_ptr<locator::effective_replication_map>::~lw_shared_ptr' requested here return make_lw_shared<effective_replication_map>(std::move(rs), std::move(tmptr), std::move(replication_map), replication_factor); ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12992	2023-02-26 19:16:28 +02:00
Kefu Chai	79d2eb1607	cql3: functions: validate arguments for 'token()' also since "token()" computes the token for a given partition key, if we pass the key of the wrong type, it should reject. in this change, * we validate the keys before returning the "token()" function. * drop the "xfail" decorator from two of the tests. they pass now after this fix. * change the tests which previously passed the wrong number of arguments containing null to "token()" and expect it to return null, so they verify that "token()" should reject these arguments with the expected error message. Fixes #10448 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12991	2023-02-26 19:01:58 +02:00
Gleb Natapov	1ce7ad1ee6	lwt: do not destroy capture in upgrade_if_needed lambda since the lambda is used more than once If on the first call the capture is destroyed the second call may crash. Fixes: #12958 Message-Id: <Y/sks73Sb35F+PsC@scylladb.com>	2023-02-26 16:13:16 +02:00
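The hazard named in the lwt commit above is generic to any lambda that is invoked more than once. The sketch below is not the `upgrade_if_needed` code; it only illustrates the pattern with a hypothetical `std::unique_ptr<int>` capture, and it returns -1 where real code would dereference an emptied capture.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Unsafe shape: the first call moves the captured pointer out, so the
// capture is empty by the time the lambda is called again.
inline auto make_one_shot() {
    return [p = std::make_unique<int>(7)]() mutable -> int {
        auto taken = std::move(p);   // consumes the capture
        return taken ? *taken : -1;  // -1 marks "capture already consumed"
    };
}

// Safe shape for a lambda that is called repeatedly: only read the capture.
inline auto make_reusable() {
    return [p = std::make_unique<int>(7)]() -> int {
        return *p;
    };
}
```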
Kefu Chai	f3e6c9168c	sstables: generation_type: define fmt::formatter for generation_type turns out what we need is a fmt::formatter<sstables::generation_type> not operator<<(ostream&, sstables::generation_type), as its only use case is the formatter used by seastar::format(). to specialize fmt::formatter<sstables::generation_type> * allows us to be one step closer to drop `FMT_DEPRECATED_OSTREAM` * allows us to customize the way how generation_type is printed by customizing the format specifier. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12983	2023-02-26 15:38:10 +02:00
Avi Kivity	8a0a784131	Merge 'utils: UUID: use default generated comparison operators' from Kefu Chai - utils: UUID: define operator<=> for UUID - utils: UUID: define operator==() only Closes #12981 * github.com:scylladb/scylladb: utils: UUID: define operator==() only utils: UUID: define operator<=> for UUID	2023-02-26 15:31:46 +02:00
Piotr Smaroń	c1760af26c	cql3: adding missing privileged on cache size eviction metric Fixes #10463 Closes #12865	2023-02-26 14:33:46 +02:00
Kefu Chai	1c71151eda	utils: UUID: define operator==() only as, in C++20, compiler is able to generate the operator==() for us, and the default generated one is identical to what we have now. also, in C++20, operator!=() is generated by compiler if operator==() is defined, so we can dispense with the former. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-25 09:36:11 +08:00
Kefu Chai	300e0b1d1c	utils: UUID: define operator<=> for UUID instead of the family of comparison operators, just define <=>. as in C++20, compiler will define all six comparison operators for us. in this change, the operator<=> is defined, so we can have more compact code. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-25 09:36:11 +08:00
Asias He	ba919aa88a	storage_service: Send heartbeat earlier for node ops Node ops has the following procedure: 1 for node in sync_nodes send prepare cmd to node 2 for node in sync_nodes send heartbeat cmd to node If any of the prepare cmds in step 1 takes longer than the heartbeat watchdog timeout, the heartbeat in step 2 will be too late to update the watchdog; as a result, the watchdog will abort the operation. To prevent a slow prepare cmd from killing the node operation, we can start the heartbeat earlier in the procedure. Fixes #11011 Fixes #12969 Closes #12980	2023-02-24 22:31:40 +01:00
Botond Dénes	61e67b865a	Merge 'service:forward_service: use long type instead of counter in function mocking' from Michał Jadwiszczak Aggregation query on counter column is failing because forward_service is looking for function with counter as an argument and such function doesn't exist. Instead the long type should be used. Fixes: #12939 Closes #12963 * github.com:scylladb/scylladb: test:boost: counter column parallelized aggregation test service:forward_service: use long type when column is counter	2023-02-24 15:25:10 +02:00
Raphael S. Carvalho	d73ffe7220	sstables: Temporarily disable loading of first and last position metadata It's known that reading large cells in reverse causes large allocations. Source: https://github.com/scylladb/scylladb/issues/11642 The loading is preliminary work for splitting large partitions into fragments composing a run and then being able to later read such a run in an efficient way using the position metadata. The splitting is not turned on yet, anywhere. Therefore, we can temporarily disable the loading, as a way to avoid regressions in stable versions. Large allocations can cause stalls due to foreground memory eviction kicking in. The default values for position metadata say that first and last position include all clustering rows, but they aren't used anywhere other than by sstable_run to determine if a run is disjoint at clustering level, but given that no splitting is done yet, it does not really matter. Unit tests relying on position metadata were adjusted to enable the loading, such that they can still pass. Fixes #11642. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12979	2023-02-24 12:14:18 +02:00
Michał Jadwiszczak	4c6675bf1a	test:boost: counter column parallelized aggregation test	2023-02-24 10:24:23 +01:00
Michał Jadwiszczak	68d2e1fff8	service:forward_service: use long type when column is counter Previously aggregations on counter columns were failing because function mocking was looking for a function with a counter argument, which doesn't exist.	2023-02-24 10:24:16 +01:00
Botond Dénes	be232ff024	Merge 'Shard of shard repair task impl' from Aleksandra Martyniuk Shard id is logged twice in repair (once explicitly, once added by logger). Redundant occurrence is deleted. shard_repair_task_impl::id (which contains global repair shard) is renamed to avoid further confusion. Fixes: #12955 Closes #12959 * github.com:scylladb/scylladb: repair: rename shard_repair_task_impl::id repair: delete redundant shard id from logs	2023-02-24 08:43:54 +02:00
Botond Dénes	80f653d65e	Merge 'Major keyspace compaction task' from Aleksandra Martyniuk Task manager task implementation that covers the major keyspace compaction which can be started through /storage_service/keyspace_compaction/ api. Closes #12661 * github.com:scylladb/scylladb: test: add test for major keyspace compaction tasks compaction: create task manager's task for major keyspace compaction compaction: copy run_on_existing_tables to task_manager_module.cc compaction: add major_compaction_task_impl compaction: add pure virtual compaction_task_impl compaction: add compaction module getter to compaction manager	2023-02-24 07:08:06 +02:00
Guy Shtub	c47b7c4cb2	Replacing user-group with community forum, added link to U. lesson on Spring Boot Fixed author/email details Closes #12748	2023-02-23 19:05:26 +02:00
Aleksandra Martyniuk	e9f01c7cce	test: add test for major keyspace compaction tasks	2023-02-23 15:48:25 +01:00
Aleksandra Martyniuk	159e603ac4	compaction: create task manager's task for major keyspace compaction Implementation of task_manager's task covering major keyspace compaction that can be started through storage_service api.	2023-02-23 15:48:05 +01:00
Aleksandra Martyniuk	6b1d7f5979	compaction: copy run_on_existing_tables to task_manager_module.cc Copy run_on_existing_tables from api/storage_service.cc to compaction/task_manager_module.cc	2023-02-23 15:31:59 +01:00
Anna Stuchlik	4dd1659d0b	doc: fixes https://github.com/scylladb/scylladb/issues/12964 , removes the information that the CDC options are experimental Closes #12973	2023-02-23 15:06:53 +02:00
Kefu Chai	412953fdd5	compress, transport: do not detect LZ4_compress_default() `LZ4_compress_default()` was introduced in liblz4 v1.7.3, despite that the release note (https://github.com/lz4/lz4/releases/tag/v1.7.3) of v1.7.3 didn't mention this. if we check the commit which added this API, we can find all releases including it: see ``` $ git tag --contains 1b17bf2ab8cf66dd2b740eca376e2d46f7ad7041 lz4-r130 r129 r130 r131 rc129v0 v1.7.3 v1.7.4 v1.7.4.2 v1.7.5 v1.8.0 v1.8.1 v1.8.1.2 v1.8.2 v1.8.3 v1.9.0 v1.9.1 v1.9.2 v1.9.3 v1.9.4 ``` and v1.7.3 was released in Nov 17, 2016. some popular distros releases also package new enough liblz4: - fedora 35 ships lz4-devel 1.9.3, - CentOS 7 ships lz4-devel 1.8.3 - debian 10 ships liblz4-dev 1.8.3 - ubuntu 18.04 ships liblz4-dev r131 so, in this change, we drop the support of liblz4 < 1.7.3 for better code readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12971	2023-02-23 14:39:20 +02:00
Pavel Emelyanov	0959739216	sstables: Remove always-false sstable_writer_config::leave_unsealed It was used in sstables streaming code up until `e5be3352` (database, streaming, messaging: drop streaming memtables) or nearby, then the whole feature was reworked. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12967	2023-02-23 12:50:06 +01:00
Botond Dénes	624d176b3b	Merge 'Refine usage of sstable_test_env::reusable_sst() method' from Pavel Emelyanov Some test cases can be made a bit more compact by using the sugar provided by the aforementioned method Closes #12965 * github.com:scylladb/scylladb: test: Make use of reusable_sst default format tests: Use reusable_sst() where applicable	2023-02-23 12:50:06 +01:00
Botond Dénes	a5979c0662	Merge 'treewide: remove invalid defaulted move ctor' from Kefu Chai - test/boost/chunked_vector_test: remove defaulted exception_safe_class's move ctor - tools/scylla-sstable: remove defaulted move ctor - sstables/mx/partition_reversing_data_source: remove defaulted move ctor - cql3/statements/truncate_statement: remove defaulted move ctor Closes #12914 * github.com:scylladb/scylladb: test/boost/chunked_vector_test: remove defaulted exception_safe_class's move ctor tools/scylla-sstable: remove defaulted move ctor sstables/mx/partition_reversing_data_source: remove defaulted move ctor cql3/statements/truncate_statement: remove defaulted move ctor	2023-02-23 12:50:05 +01:00
Avi Kivity	665429d85b	cql3: remove assignment_testable::test_all Was replaced with cql3::expr::test_assignment_all(). Closes #12951	2023-02-23 12:50:05 +01:00
Botond Dénes	0c756af137	Merge 'build: cmake: sync with `configure.py` (6/n)' from Kefu Chai - build: cmake: correct linker flags - build: cmake: enable boost tests only if BUILD_TESTING - build: cmake: reuse test-lib library - build: cmake: extract redis out Closes #12961 * github.com:scylladb/scylladb: build: cmake: extract interface out build: cmake: extract redis out build: cmake: reuse test-lib library build: cmake: enable boost tests only if BUILD_TESTING build: cmake: correct linker flags	2023-02-23 12:50:05 +01:00
Aleksandra Martyniuk	d889a599e8	repair: rename shard_repair_task_impl::id shard_repair_task_impl::id stores global repair id. To avoid confusion with the task id, the field is renamed to global_repair_id.	2023-02-23 11:29:00 +01:00
Aleksandra Martyniuk	f7c88edec5	repair: delete redundant shard id from logs In repair, the shard id is logged twice. Delete the repeated occurrence.	2023-02-23 11:25:18 +01:00
Pavel Emelyanov	5b311bb724	test: Make use of reusable_sst default format The sstable_test_env::reusable_sst() has default value for the format argument. Patch the test cases that don't use one while at it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-22 17:04:10 +03:00
Pavel Emelyanov	7aabffff19	tests: Use reusable_sst() where applicable The reusable_sst() is intended to be used to load the pre-existing sstables from the test/resources directory and .load() them. Some test cases, however, still do it "by hand". Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-22 17:03:15 +03:00
Kefu Chai	5b3fd57c25	build: cmake: extract interface out Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-22 18:35:11 +08:00
Kefu Chai	64879fb6f7	build: cmake: extract redis out and move `redis/protocol_parser.rl` related rules into `redis`, as it is a file used for the implementation of redis. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-22 18:35:11 +08:00
Kefu Chai	43d9055b89	build: cmake: reuse test-lib library it already includes the necessary bits used by test-perf, so let's just link the latter to the former. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-22 18:35:11 +08:00
Kefu Chai	d07b649791	build: cmake: enable boost tests only if BUILD_TESTING BUILD_TESTING is an option exposed by CTest module, so let's include CTest module, and check if BUILD_TESTING is enabled before include boost based tests. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-22 18:35:11 +08:00
Kefu Chai	59698cc495	build: cmake: correct linker flags s/sha/sha1/. turns out `867b58c62c` failed to include the latest change. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-22 18:35:11 +08:00
Aleksandra Martyniuk	b908369e85	compaction: add major_compaction_task_impl All major compaction tasks will share some methods like type or abort. The common part of the tasks should be inherited from major_compaction_task_impl.	2023-02-22 09:52:04 +01:00
Aleksandra Martyniuk	be101078a0	compaction: add pure virtual compaction_task_impl Add compaction_task_impl that is a pure virtual class from which all compaction task implementations will inherit.	2023-02-22 09:51:57 +01:00
Pavel Emelyanov	f51762c72a	headers: Refine view_update_generator.hh and around The initial intent was to reduce the fanout of shared_sstable.hh through v.u.g.hh -> cql_test_env.hh chain, but it also resulted in some shots around v.u.g.hh -> database.hh inclusion. By and large: - v.u.g.hh doesn't need database.hh - cql_test_env.hh doesn't need v.u.g.hh (and thus -- the shared_sstable.hh) but needs database.hh instead - few other .cc files need v.u.g.hh directly as they pulled it via cql_test_env.hh before - add forward declarations in few other places Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12952	2023-02-22 09:32:30 +02:00
Botond Dénes	e183dc4345	Merge 'Wrap sstable directory scan state in components_lister' from Pavel Emelyanov The sstable_directory now combines two activities: * scans the list of files in /var/lib/data and generates sstable-s object from it * maintains the found sstable-s throughout necessary processing (populate/reshard/reshape) The former part is in fact storage-specific. If sstables are on a filesystem, then it should be scanned with listdir, there can be dangling files, like temp-TOC, pending deletion log and components not belonging to any TOCs. If sstables are on some other storage, then this part should work some other way. Said that, the sstable_directory is to be split into two pieces -- lister and "processing state". The latter would (may?) require renaming the sstable_directory into something more relevant, but that's a huge and intrusive change. For now, just collect the lister stuff in one place. Closes #12843 * github.com:scylladb/scylladb: sstable_directory: Keep lister internals private sstable_directory: Move most of .commit_directory_changes() on lister sstable_directory: Remove temporary aliases sstable_directory: Move most of .process_sstable_dir() on lister sstable_directory: Move .handle_component() to components_lister sstable_directory: Keep files_for_removal on scan_state sstable_directory: Keep components_lister aboard sstable_directory: Keep scan_state on components_lister	2023-02-22 08:10:04 +02:00
Calle Wilund	97881091d3	commitlog: Fix updating of total_size_on_disk on segment alloc when o_dsync is off Fixes #12810 We did not update total_size_on_disk in commitlog totals when use o_dsync was off. This means we essentially ran with no registered footprint, also causing broken comparisons in delete_segments.	2023-02-21 16:35:23 +00:00
Calle Wilund	64102780fe	commitlog: Use static (reused) regex for (left over) descriptor parse Refs #11710 Allows reusing regex for segment matching (for opening left-over segments after crash). Should remove any stalls caused by commitlog replay preparation. v2: Add unit test for descriptor parsing Closes #12112	2023-02-21 18:34:04 +02:00
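The idea in the commitlog commit above, compiling the regex once and reusing it for every descriptor, can be sketched with a function-local static. The file-name pattern, the `descriptor` fields, and the function name below are hypothetical; the real commitlog naming scheme is not reproduced here.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical parsed form of a segment file name.
struct descriptor {
    unsigned long version;
    unsigned long id;
};

// Parses names of the (assumed) form "Log-<version>-<id>.log".
inline std::optional<descriptor> parse_descriptor(const std::string& filename) {
    // Constructed on the first call only, then reused by later calls,
    // instead of recompiling the pattern for each left-over segment.
    static const std::regex re(R"(Log-(\d+)-(\d+)\.log)");
    std::smatch m;
    if (!std::regex_match(filename, m, re)) {
        return std::nullopt;
    }
    return descriptor{std::stoul(m[1].str()), std::stoul(m[2].str())};
}
```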
Botond Dénes	ef548e654d	types: unserialize_value for multiprecision_int,bool: don't read uninitialized memory Check the first fragment before dereferencing it, the fragment might be empty, in which case move to the next one. Found by running range scan tests with random schema and random data. Fixes: #12821 Fixes: #12823 Fixes: #12708 Closes #12824	2023-02-21 17:39:18 +02:00
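The fix described above, checking a fragment before dereferencing it and moving on when it is empty, can be sketched on a simplified fragmented buffer. The `std::vector<std::string_view>` representation and the function name are assumptions for illustration, not the actual fragmented-view types.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <vector>

// Returns the first byte of a value stored as a list of contiguous
// fragments, any of which may be empty; std::nullopt if there are no bytes.
inline std::optional<char> first_byte(const std::vector<std::string_view>& fragments) {
    for (std::string_view frag : fragments) {
        if (!frag.empty()) {
            return frag.front();  // safe: this fragment has at least one byte
        }
        // Empty fragment: skip it rather than reading uninitialized memory.
    }
    return std::nullopt;
}
```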
Tomasz Grabiec	c8e2bf1596	db: schema_tables: Optimize schema merge Currently, applying a schema change on a replica works like this: Collect all affected keyspaces from incoming mutations Read current state of schema Apply the mutations Read new state of schema The "Read ... state of schema" step reads all kinds of schema objects. In particular, to read the "table" objects, it does the following: for every affected keyspace k: read all mutations from system_schema.tables for k extract all existing table names from those mutations for every existing table: read mutations from {tables, columns, indexes, view_virtual_columns, ...} for that table As you can see, the number of reads performed is O(nr tables in a keyspace), not O(nr tables in a change). This means that making a sequence of schema changes, like adding a table, is quadratic. Another aspect which magnifies this is that we don't read those tables using a single scan, but issue individual queries for each table separately. This patch optimizes this by considering only affected tables when reading schema for the purpose of diff calculation. When mutations contain multi-table deletions, we still read the set of tables, like before. This could be optimized by looking at the database to get the list, but it's not part of the patch. I tested this using a test case provided by Kamil (kbr-scylla@53fe154) ./test.py --mode debug test_many_schema_changes -s The test bootstraps a cluster and then creates about 40 schema changes. Then a new node is bootstrapped and replays those changes via group0. In debug mode, each change takes roughly 2s to process before the patch, and 0.5s after the patch. The whole replay is reduced to 56% of what was before: Before (1m19s) : INFO 2023-01-20 19:44:35,848 [shard 0] raft_group0 - setup_group0: ensuring that the cluster has fully upgraded to use Raft... INFO 2023-01-20 19:45:54,844 [shard 0] raft_group0 - setup_group0: waiting for peers to synchronize state... 
After (45s): INFO 2023-01-20 22:02:51,869 [shard 0] raft_group0 - setup_group0: ensuring that the cluster has fully upgraded to use Raft... INFO 2023-01-20 22:03:36,834 [shard 0] raft_group0 - setup_group0: waiting for peers to synchronize state... Closes #12592	2023-02-21 17:26:57 +02:00
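The read-count argument in the schema-merge commit above can be illustrated with a toy model. This is a minimal sketch, not ScyllaDB code: `reads_before`, `reads_after` and `total_reads` are hypothetical names, and one "read" stands for fetching the schema mutations of a single table (the real code reads the schema state both before and after applying the mutations, which this sketch collapses into one pass).

```python
# Toy model of the schema-merge read cost (illustration only, not ScyllaDB code).
# Assumption: one "read" per table whose schema mutations are fetched.

def reads_before(existing_tables, changed_tables):
    """Old behaviour: every table in the affected keyspace is read."""
    return len(existing_tables)


def reads_after(existing_tables, changed_tables):
    """New behaviour: only the tables named in the change are read."""
    return len(changed_tables)


def total_reads(n_changes, cost):
    """Add n_changes tables one at a time and sum the reads per change."""
    tables = []
    total = 0
    for i in range(n_changes):
        name = f"t{i}"
        tables.append(name)  # the table exists once the mutations are applied
        total += cost(tables, [name])
    return total


if __name__ == "__main__":
    # Quadratic growth versus linear growth for 40 single-table changes.
    print(total_reads(40, reads_before))
    print(total_reads(40, reads_after))
```

With 40 single-table changes, the old cost is the sum 1 + 2 + ... + 40, while the new cost is one read per change.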
Calle Wilund	6f972ee68b	commitlog: change type of stored size known_size() is technically not a size_t.	2023-02-21 15:26:02 +00:00
Pavel Emelyanov	abab4d446d	sstable: Remove explicit quarantization call Now all callers are patched to use new change_state() call, so it can be removed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 17:44:55 +03:00
Pavel Emelyanov	bbf192e775	test: Move move_to_new_dir() method from sstable class There's a bunch of test cases that check how moving sstables files around the filesystem works. These need the generic move_to_new_dir() method from sstable, so move it there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 17:42:18 +03:00
Pavel Emelyanov	bb0140531e	sstable, dist.-loader: Introduce and use pick_up_from_upload() method When "uploading" an sstable scylla uses a short-cut -- the sstable's files are to be put into upload/ subdir by the caller, then scylla just pulls them in in the cheapest way possible -- by relinking the files. When this happens sstable also changes its generation, which is the only place where this happens at all. For object storage uploading is not going to be _that_ simple, so for now add an fs-specific method to pick up an sstable from upload dir with the intent to generalize it (if possible) when object-storage uploading appears. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 17:40:00 +03:00
Pavel Emelyanov	8a061bd862	sstables, code: Introduce and use change_state() call The call moves the sstable to the specified state. The state change is translated into a storage driver state change, which for today's filesystem storage means moving between directories. The "normal" state maps to the base dir of the table; there's no dedicated subdir for this state, and this brings some trouble into play. The thing is that in order to check whether an sstable is already in "normal" state, it's impossible to compare the filename of its path to any pre-defined values, as tables' basedirs are dynamic. To overcome this, the change-state call checks whether the sstable is in one of the "known" sub-states, and assumes that it's in normal state otherwise. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 17:39:34 +03:00
Pavel Emelyanov	e67751ee92	distributed_loader: Let make_sstables_available choose target directory When sstables are loaded from upload/ subdir, the final step is to move them from this directory into base or staging one. The uploading code evaluates the target directory, then pushes it down the stack towards make_sstables_available() method. This patch replaces the path argument with a bool to_staging one. The goal is to remove the knowledge of exact sstable location (nowadays -- its files' path) from the distributed loader and keep it in sstable object itself. Next patches will make full use of this change. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 17:23:59 +03:00
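The state detection described in the change_state() commit above can be sketched as follows. This is a minimal illustration, not the ScyllaDB implementation: `detect_state` and the `KNOWN_SUBSTATES` set are hypothetical, and the sub-directory names are assumptions picked for the example.

```python
from pathlib import PurePosixPath

# Hypothetical sub-state directory names, assumed for illustration only.
KNOWN_SUBSTATES = {"staging", "quarantine", "upload"}


def detect_state(sstable_dir: str) -> str:
    """Return the sub-state named by the last path component, else "normal".

    The table's base directory has a dynamic name, so "normal" cannot be
    matched against a fixed value; it is the fallback when no known
    sub-state directory is recognised.
    """
    leaf = PurePosixPath(sstable_dir).name
    return leaf if leaf in KNOWN_SUBSTATES else "normal"


if __name__ == "__main__":
    print(detect_state("/data/ks/cf-1234/staging"))
    print(detect_state("/data/ks/cf-1234"))
```

The fallback branch is the design point: any directory that is not a recognised sub-state is treated as the base directory.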
Botond Dénes	763fe54637	Merge 'build: cmake: sync with `configure.py` (5/n) ' from Kefu Chai - build: cmake: build release.cc as a library - build: cmake: link alternator against cql3 - build: cmake: link scylla against xxHash::xxhash - build: cmake: use lld or gold as linker if available Closes #12942 * github.com:scylladb/scylladb: build: cmake: use lld or gold as linker if available build: cmake: link scylla against xxHash::xxhash build: cmake: link alternator against cql3 build: cmake: build release.cc as a library	2023-02-21 16:19:24 +02:00
Pavel Emelyanov	41d65daa29	sstables: Remove dangling ready future from .close_files() Was left unnoticed while `7c7eb81a` ('Encapsulate filesystem access by sstable into filesystem_storage subsclass') Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12946	2023-02-21 15:47:55 +02:00
Pavel Emelyanov	398f7704dc	sstable_directory: Keep lister internals private Now the lister provides a two-call API to the user -- process and commit. The rest can and should be marked as private. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 16:44:50 +03:00
Pavel Emelyanov	e6941d0baa	sstable_directory: Move most of .commit_directory_changes() on lister Committing any changes made while scanning the storage is storage-specific. Just like .process() was moved on lister, the .commit() now does the same. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 16:44:49 +03:00
Pavel Emelyanov	70d6bfc109	sstable_directory: Remove temporary aliases Previous patches created a bunch of local aliases-references in components_lister::process(). This patch just removes those aliases, no functional changes are made here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 16:42:24 +03:00
Pavel Emelyanov	c4037270a3	sstable_directory: Move most of .process_sstable_dir() on lister Processing storage with sstable files/objects is storage-specific. The components_lister is the right component to handle it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 16:42:24 +03:00
Pavel Emelyanov	4c4aeba9b6	sstable_directory: Move .handle_component() to components_lister This method is in charge of collecting a found file on scan_state, it logically belongs to the components_lister and its internals. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 16:42:24 +03:00
Pavel Emelyanov	58f4076117	sstable_directory: Keep files_for_removal on scan_state This list is the list of on-disk files, which is the property of filesystem scan state. When committing directory changes (read: removing those files) the list can be moved-from the state. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 16:42:23 +03:00
Pavel Emelyanov	df5384cb1e	sstable_directory: Keep components_lister aboard The lister is supposed to be alive throughout .process_sstable_dir() and can die after .commit_directory_changes(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 16:32:06 +03:00
Pavel Emelyanov	5d98e34c16	sstable_directory: Keep scan_state on components_lister The scan_state keeps the state of listing directory with sstables. It now lives on the .process_sstable_dir() stack, but it can as well live on the lister itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 16:32:06 +03:00
Kamil Braun	318f1f64c2	docs: update pygments dependency version Closes #12949	2023-02-21 13:06:39 +02:00
Botond Dénes	372ac57c96	Merge 'doc: remove the incorrect information about IPs from the Restore page' from Anna Stuchlik Fixes https://github.com/scylladb/scylladb/issues/12945 This PR removes the incorrect information and updates the link to the relevant page in the Manager docs. Closes #12947 * github.com:scylladb/scylladb: doc: update the link to the Restore page in the ScyllaDB Manager documentation doc: remove the wrong info about IPs from the note on the Restore page	2023-02-21 12:30:31 +02:00
Kamil Braun	d56c060b4e	Merge 'various raft fixes' from Gleb Natapov The series fixes a race in case of a leader change while add_entry_on_leader is sleeping and an abort during raft shutdown. * '12863-fix-v1' of github.com:scylladb/scylla-dev: raft: abort applier fiber when a state machine aborts raft: fix race in add_entry_on_leader that may cause incorrect log length accounting	2023-02-21 10:57:04 +01:00
Anna Stuchlik	d743146313	doc: update the link to the Restore page in the ScyllaDB Manager documentation	2023-02-21 10:30:02 +01:00
Anna Stuchlik	1e85df776f	doc: remove the wrong info about IPs from the note on the Restore page	2023-02-21 10:24:06 +01:00
Pavel Emelyanov	3f88d3af62	Merge 'test_shed_too_large_request fix: disable compression' from Gusev Petr The test relies on exact request size, this doesn't work if compression is applied. The driver enables compression only if both the server and the client agree on the codec to use. If compression package (e.g. lz4) is not installed, the compression is not used. The trick with locally_supported_compressions is needed since I couldn't find any standard means to disable compression other than the compression flag on the cluster object, which seemed too broad. fixes: #12836 Closes #12854 * github.com:scylladb/scylladb: test_shed_too_large_request: clarify the comments test_shed_too_large_request: use smaller test string test_shed_too_large_request fix: disable compression	2023-02-21 10:35:59 +03:00
Kefu Chai	867b58c62c	build: cmake: use lld or gold as linker if available Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-21 14:24:18 +08:00
Kefu Chai	69b1e7651e	build: cmake: link scylla against xxHash::xxhash instead of adding `XXH_PRIVATE_API` to compile definitions, link scylla against xxHash::xxhash, which provides this definition for us. also move the comment on `XXH_PRIVATE_API` into `FindxxHash.cmake`, where this definition is added. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-21 14:24:18 +08:00
Kefu Chai	0fffd34be8	build: cmake: link alternator against cql3 otherwise we'd have ``` In file included from /home/kefu/dev/scylladb/alternator/executor.cc:37: /home/kefu/dev/scylladb/cql3/util.hh:21:10: fatal error: 'cql3/CqlParser.hpp' file not found ^~~~~~~~~~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-21 14:24:18 +08:00
Kefu Chai	957403663f	build: cmake: build release.cc as a library so we can attach compiling definitions in a simpler way. this change is based on Botond Dénes's change which gives an overhaul to the existing CMake building system. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-21 14:23:04 +08:00
Botond Dénes	d7b6cf045f	Merge 'build: cmake: sync with `configure.py` (4/n)' from Kefu Chai - build: cmake: link cql3 against wasmtime_bindings - build: cmake: output rust binding headers in expected dir - build: cmake: link auth against cql3 Closes #12927 * github.com:scylladb/scylladb: build: cmake: link auth against cql3 build: cmake: output rust binding headers in expected dir build: cmake: link cql3 against wasmtime_bindings	2023-02-20 12:46:15 +01:00
Botond Dénes	3c30531202	Merge 'test: mutation_test: Fix sporadic failure due to continuity mismatch' from Tomasz Grabiec In test_v2_apply_monotonically_is_monotonic_on_alloc_failures we generate mutations with non-full continuity, so we should pass is_evictable::yes to apply_monotonically(). Otherwise, it will assume fully-continuous versions and not try to maintain continuity by inserting sentinels. This manifested in sporadic failures on continuity check. Fixes #12882 Closes #12921 * github.com:scylladb/scylladb: test: mutation_test: Fix sporadic failure due to continuity mismatch test: mutation_test: Fix copy-paste mistake in trace-level logging	2023-02-20 12:46:15 +01:00
Pavel Emelyanov	273999b9fa	sstable: Mark version and format members const These two are indeed immutable throughout the object lifetime Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12918	2023-02-20 12:46:15 +01:00
Kefu Chai	adbcc3db8f	dist/debian: drop unused Makefile variable this change was previously reverted by `cbc005c6f5` . it turns out this change was not the offending change. so let's resurrect it. `job` was introduced back in `782ebcece4`, so we could consume the option specified in DEB_BUILD_OPTIONS environmental variable. but now that we always repackage the artifacts prebuilt in the relocatable package. we don't build them anymore when packaging debian packages. see `9388f3d626` . and `job` is not passed to `ninja` anymore. so, in this change, `job` is removed from debian/rules as well, as it is not used. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12924	2023-02-20 12:46:15 +01:00
Nadav Har'El	328cdb2124	cql-pytest: translate Cassandra's tests for compact tables This is a translation of Cassandra's CQL unit test source file validation/operations/CompactStorageTest.java into our cql-pytest framework. This very large test file includes 86 tests for various types of operations and corner cases of WITH COMPACT STORAGE tables. All 86 tests pass on Cassandra (except one using a deprecated feature that needs to be specially enabled). 30 of the tests fail on Scylla reproducing 7 already-known Scylla issues and 7 previously-unknown issues: Already known issues: Refs #3882: Support "ALTER TABLE DROP COMPACT STORAGE" Refs #4244: Add support for mixing token, multi- and single-column restrictions Refs #5361: LIMIT doesn't work when using GROUP BY Refs #5362: LIMIT is not doing it right when using GROUP BY Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax Refs #8627: Cleanly reject updates with indexed values where value > 64k New issues: Refs #12471: Range deletions on COMPACT STORAGE is not supported Refs #12474: DELETE prints misleading error message suggesting ALLOW FILTERING would work Refs #12477: Combination of COUNT with GROUP BY is different from Cassandra in case of no matches Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column Refs #12526: Support filtering on COMPACT tables Refs #12749: Unsupported empty clustering key in COMPACT table Refs #12815: Hidden column "value" in compact table isn't completely hidden Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12816	2023-02-20 12:46:15 +01:00
Raphael S. Carvalho	fbeee8b65d	Optimize load-and-stream load-and-stream implements no policy when deciding which SSTables will go in each streaming round (batch of 16 SSTables), meaning the choice is random. It can take advantage of the fact that the LSM-tree layout, with ICS and LCS, is a set of SSTable runs, where each run is composed of SSTables that are disjoint in their key range. By sorting SSTables to be streamed by their first key, the effect is that SSTable runs will be incrementally streamed (in token order). SSTable runs in the same replica group (or in the same node) will have their content deduplicated, reducing significantly the amount of data we need to put on the wire. The improvement is proportional to the space amplification in the table, which again, depends on the compaction strategy used. Another important benefit is that the destination nodes will receive SSTables in token order, allowing off-strategy compaction to be more efficient. This is how I tested it: 1) Generated a 5GB dataset to a ICS table. 2) Started a fresh 2-node cluster. RF=2. 3) Ran load-and-stream against one of the replicas. BEFORE: $ time curl -X POST "http://127.0.0.1:10000/storage_service/sstables/keyspace1?cf=standard1&load_and_stream=true" real 4m40.613s user 0m0.005s sys 0m0.007s AFTER: $ time curl -X POST "http://127.0.0.1:10000/storage_service/sstables/keyspace1?cf=standard1&load_and_stream=true" real 2m39.271s user 0m0.005s sys 0m0.004s That's ~1.76x faster. 
That's explained by deduplication: BEFORE: INFO 2023-02-17 22:59:01,100 [shard 0] stream_session - [Stream #79d3ce7a-ea47-4b6e-9214-930610a18ccd] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3445376, received_partitions=2755835 INFO 2023-02-17 22:59:41,491 [shard 0] stream_session - [Stream #bc6bad99-4438-4e1e-92db-b2cb394039c8] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3308288, received_partitions=2836491 INFO 2023-02-17 23:00:20,585 [shard 0] stream_session - [Stream #e95c4f49-0a2f-47ea-b41f-d900dd87ead5] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3129088, received_partitions=2734029 INFO 2023-02-17 23:00:49,297 [shard 0] stream_session - [Stream #255cba95-a099-4fec-a72c-f87d5cac2b1d] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2544128, received_partitions=1959370 INFO 2023-02-17 23:01:33,110 [shard 0] stream_session - [Stream #96b5737e-30c7-4af8-a8b8-96fecbcbcbd0] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3624576, received_partitions=3085681 INFO 2023-02-17 23:02:20,909 [shard 0] stream_session - [Stream #3185a48b-fb9e-4190-88f4-5c7a386bc9bd] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3505024, received_partitions=3079345 INFO 2023-02-17 23:03:02,039 [shard 0] stream_session - [Stream #0d2964dc-d5e3-4775-825c-97f736d14713] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2808192, received_partitions=2655811 AFTER: INFO 2023-02-17 23:12:49,155 [shard 0] stream_session - [Stream #bf00963c-3334-4035-b1a9-4b3ceb7a188a] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2965376, received_partitions=1006535 INFO 2023-02-17 23:13:13,365 [shard 0] stream_session - [Stream #1cd2e3ac-a68b-4cb5-8a06-707e91cf59db] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3543936, received_partitions=1406157 INFO 2023-02-17 23:13:37,474 [shard 0] stream_session - 
[Stream #5a278230-6b4b-461f-8396-c15df7092d03] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3639936, received_partitions=1371298 INFO 2023-02-17 23:14:02,132 [shard 0] stream_session - [Stream #19f40dc3-e02a-4321-a917-a6590d99dd03] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3638912, received_partitions=1435386 INFO 2023-02-17 23:14:26,673 [shard 0] stream_session - [Stream #d47507eb-2067-4e8f-a4f7-c82d5fbd4228] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3561600, received_partitions=1423024 INFO 2023-02-17 23:14:49,307 [shard 0] stream_session - [Stream #d42ee911-253a-4de6-ac89-6a3c05b88d66] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2382592, received_partitions=1452656 INFO 2023-02-17 23:15:10,067 [shard 0] stream_session - [Stream #1f78c1bf-8e20-41bd-95de-16de3fc5f86c] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2632320, received_partitions=1252298 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20230219191924.37070-1-raphaelsc@scylladb.com>	2023-02-20 12:46:14 +01:00
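The ordering policy described in the load-and-stream commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ScyllaDB implementation: the `SSTable` class, its integer token fields and `streaming_rounds` are hypothetical; only the batch size of 16 and the "sort by first key" idea come from the commit message.

```python
from dataclasses import dataclass

BATCH_SIZE = 16  # SSTables per streaming round, as stated in the commit message


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an SSTable; tokens are plain integers here."""
    name: str
    first_token: int
    last_token: int


def streaming_rounds(sstables, batch_size=BATCH_SIZE):
    """Sort SSTables by first key, then cut the sorted list into rounds.

    Sorting places SSTables covering the same token range into the same
    round, which is what lets overlapping data be deduplicated before it
    goes on the wire.
    """
    ordered = sorted(sstables, key=lambda s: s.first_token)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


if __name__ == "__main__":
    # 40 SSTables whose first tokens arrive in a scrambled order.
    tables = [SSTable(f"sst-{i}", (i * 7) % 40, (i * 7) % 40 + 5) for i in range(40)]
    for n, batch in enumerate(streaming_rounds(tables)):
        print(n, [s.first_token for s in batch])
```

Each round then covers a contiguous slice of the token space, and the destination receives data in token order.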
guy9	917e085919	Update manager-monitoring-integration.rst Changing default manager from 56090 to 5090 @amnonh please review @annastuchlik please change if other locations in Docs require this change Closes #12682	2023-02-20 12:46:14 +01:00
Avi Kivity	6d5c242651	Update tools/java submodule (hdrhistogram failure with Java 11) * tools/java f0bab7af66...ab0a613fdc (1): > Fix cassandra-stress -log hdrfile=... with java 11	2023-02-20 12:46:14 +01:00
Aleksandra Martyniuk	4f67c0c36a	compaction: add compaction module getter to compaction manager	2023-02-20 11:19:29 +01:00
Kefu Chai	df63e2ba27	types: move types.{cc,hh} into types they are part of the CQL type system, and are "closer" to types. let's move them into "types" directory. the building systems are updated accordingly. the source files referencing `types.hh` were updated using following command: ``` find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} + ``` the source files under sstables include "types.hh", which is indeed the one located under "sstables", so include "sstables/types.hh" instead, so it's more explicit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12926	2023-02-19 21:05:45 +02:00
Tzach Livyatan	f97a23a9e3	Add a warning: altering a service level timeout doesn't affect existing connections Closes #12928 Refs #12923	2023-02-19 14:49:23 +02:00
Kefu Chai	ee97c332d9	test/boost/chunked_vector_test: remove defaulted exception_safe_class's move ctor because it has a member variable whose type is a reference. and a reference cannot be reassigned. this silences following warning from Clang: ``` /home/kefu/dev/scylladb/test/boost/chunked_vector_test.cc:152:27: error: explicitly defaulted move assignment operator is implicitly deleted [-Werror,-Wdefaulted-function-deleted] exception_safe_class& operator=(exception_safe_class&&) = default; ^ /home/kefu/dev/scylladb/test/boost/chunked_vector_test.cc:132:31: note: move assignment operator of 'exception_safe_class' is implicitly deleted because field '_esc' is of reference type 'exception_safety_checker &' exception_safety_checker& _esc; ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-19 12:58:22 +08:00
Kefu Chai	2bb61b8c18	tools/scylla-sstable: remove defaulted move ctor ``` /home/kefu/dev/scylladb/tools/scylla-sstable.cc:2301:9: error: explicitly defaulted move constructor is implicitly deleted [-Werror,-Wdefaulted-function-deleted] impl(impl&&) = default; ^ /home/kefu/dev/scylladb/tools/scylla-sstable.cc:2291:16: note: move constructor of 'impl' is implicitly deleted because field '_reader' has an inaccessible move constructor reader _reader; ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-19 12:57:40 +08:00
Kefu Chai	cca9b7c4cd	sstables/mx/partition_reversing_data_source: remove defaulted move ctor as partition_reversing_data_source_impl has indirectly a member variable which has a member of reference type. this should address following warning from Clang: ``` /home/kefu/dev/scylladb/sstables/mx/partition_reversing_data_source.cc:476:43: error: explicitly defaulted move assignment operator is implicitly deleted [-Werror,-Wdefaulted-function-deleted] partition_reversing_data_source_impl& operator=(partition_reversing_data_source_impl&&) noexcept = default; ^ /home/kefu/dev/scylladb/sstables/mx/partition_reversing_data_source.cc:365:19: note: move assignment operator of 'partition_reversing_data_source_impl' is implicitly deleted because field '_schema' is of reference type 'const schema &' const schema& _schema; ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-19 12:57:40 +08:00
Kefu Chai	958f8bf79f	cql3/statements/truncate_statement: remove defaulted move ctor ``` /home/kefu/dev/scylladb/cql3/statements/truncate_statement.hh:29:5: error: explicitly defaulted move constructor is implicitly deleted [-Werror,-Wdefaulted-function-deleted] truncate_statement(truncate_statement&&) = default; ^ /home/kefu/dev/scylladb/cql3/statements/truncate_statement.hh:25:39: note: move constructor of 'truncate_statement' is implicitly deleted because field '_attrs' has a deleted move constructor const std::unique_ptr<attributes> _attrs; ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:523:7: note: 'unique_ptr' has been explicitly marked deleted here unique_ptr(const unique_ptr&) = delete; ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-19 12:57:40 +08:00
Kefu Chai	6803f38a7a	build: cmake: link auth against cql3 as auth headers reference cql3 ``` In file included from /home/kefu/dev/scylladb/auth/authenticator.cc:16: In file included from /home/kefu/dev/scylladb/cql3/query_processor.hh:24: /home/kefu/dev/scylladb/lang/wasm_instance_cache.hh:20:10: fatal error: 'rust/cxx.h' file not found ^~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-19 12:46:51 +08:00
Kefu Chai	a2668f8ba8	build: cmake: output rust binding headers in expected dir we include rust binding headers like `rust/wasmtime_bindings.hh`. so they should be located in directory named "rust". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-19 12:46:51 +08:00
Kefu Chai	494ed41a54	build: cmake: link cql3 against wasmtime_bindings as it references headers provided by wasmtime_bindings: ``` In file included from /home/kefu/dev/scylladb/cql3/functions/user_function.cc:9: In file included from /home/kefu/dev/scylladb/cql3/functions/user_function.hh:16: /home/kefu/dev/scylladb/lang/wasm.hh:14:10: fatal error: 'rust/wasmtime_bindings.hh' file not found ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-19 12:46:51 +08:00
Gleb Natapov	941407b905	database: fix do_apply_many() to handle empty array of mutations Currently the code will assert because cl pointer will be null and it will be null because there are no mutations to initialize it from. Message-Id: <20230212144837.2276080-3-gleb@scylladb.com>	2023-02-17 22:58:22 +01:00
Yaron Kaikov	a4e08ee48a	Revert "dist/debian: bump up debhelper compatibility level to 10" This reverts commit `75eaee040b`, since it's causing a regression that prevents the Scylla service from starting on deb OS Fixes: #12738 Closes #12897	2023-02-17 17:34:12 +02:00
Michał Chojnowski	e88f590eda	sstables: partition_index_cache: clean up an unused type alias `list_ptr` is a type alias that isn't used in any meaningful way. Remove it. Closes #10978	2023-02-17 17:58:26 +03:00
Tomasz Grabiec	2ae8f74cec	test: mutation_test: Fix sporadic failure due to continuity mismatch In test_v2_apply_monotonically_is_monotonic_on_alloc_failures we generate mutations with non-full continuity, so we should pass is_evictable::yes to apply_monotonically(). Otherwise, it will assume fully-continuous versions and not try to maintain continuity by inserting sentinels. This manifested in sporadic failures on continuity check. Fixes #12882	2023-02-17 14:43:32 +01:00
Tomasz Grabiec	22063713d7	test: mutation_test: Fix copy-paste mistake in trace-level logging	2023-02-17 14:42:47 +01:00
Botond Dénes	f62e62f151	Merge 'build: cmake: sync with `configure.py` (3/n)' from Kefu Chai * build: cmake: add test * build: cmake: expose the bridged rust library * build: cmake: correct library path * build: cmake: add missing source files * build: cmake: put generated sources into ${scylla_gen_build_dir} * build: cmake: silence -Wuninitialized warning * build: cmake: extract idl library out * build: cmake: ignore -Wparentheses-equality Closes #12893 * github.com:scylladb/scylladb: build: cmake: add unit tests build: cmake: extract sstables out build: cmake: extract auth and schema build: utils: extract utils out build: cmake: link Boost::regex with ICU::i18n build: cmake: add test build: cmake: expose the bridged rust library build: cmake: correct library path build: cmake: add missing source files build: cmake: put generated sources into ${scylla_gen_build_dir} build: cmake: silence -Wuninitialized warning build: cmake: extract idl library out build: cmake: ignore -Wparentheses-equality	2023-02-17 13:13:01 +02:00
Kefu Chai	05ecc3f1c9	build: cmake: add unit tests this change is based on Botond Dénes's change which gave an overhaul to the original CMake building system. this change is not enough to build tests with CMake, as we still need to sort out the dependencies. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:41:40 +08:00
Kefu Chai	f76a169025	build: cmake: extract sstables out Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:41:40 +08:00
Kefu Chai	f3714f1706	build: cmake: extract auth and schema Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:41:40 +08:00
Kefu Chai	3e481c9d15	build: utils: extract utils out Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:41:39 +08:00
Kefu Chai	4d7ae07e9e	build: cmake: link Boost::regex with ICU::i18n it turns out Boost::regex references ICU::i18n, but it fails to bring the linkage to its public interface. so let's do this on behalf of it. ``` : && /home/kefu/.local/bin/clang++ -Wall -Werror -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-missing-braces -Wno-overloaded-virtual -Wno-parentheses-equality -Wno-unsupported-friend -march=westmere -O0 -g -gz CMakeFiles/scylla.dir/absl-flat_hash_map.cc.o CMakeFiles/$ ld.lld: error: undefined symbol: icu_67::Collator::createInstance(icu_67::Locale const&, UErrorCode&) >>> referenced by icu.hpp:56 (/usr/include/boost/regex/icu.hpp:56) >>> CMakeFiles/scylla.dir/utils/like_matcher.cc.o:(boost::re_detail_107500::icu_regex_traits_implementation::icu_regex_traits_implementation(icu_67::Locale const&)) >>> referenced by icu.hpp:61 (/usr/include/boost/regex/icu.hpp:61) >>> CMakeFiles/scylla.dir/utils/like_matcher.cc.o:(boost::re_detail_107500::icu_regex_traits_implementation::icu_regex_traits_implementation(icu_67::Locale const&)) ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:39:44 +08:00
Kefu Chai	02de9f1833	build: cmake: add test Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:39:44 +08:00
Kefu Chai	f5750859f7	build: cmake: expose the bridged rust library so that scylla can be linked against it when it is linked with wasmtime_bindings. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:39:44 +08:00
Kefu Chai	7569424d86	build: cmake: correct library path it encodes the profile in it. so, in this change, the used profile is added in the path. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:39:44 +08:00
Kefu Chai	affebc35be	build: cmake: add missing source files Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:39:43 +08:00
Kefu Chai	c0824c6c25	build: cmake: put generated sources into ${scylla_gen_build_dir} to be aligned with the convention of configure.py Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:38:44 +08:00
Kefu Chai	db8a2c15fa	build: cmake: silence -Wuninitialized warning Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:38:44 +08:00
Kefu Chai	7b431748a8	build: cmake: extract idl library out Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:38:44 +08:00
Kefu Chai	d89602c6a2	build: cmake: ignore -Wparentheses-equality antlr3 generates code like `((foo == bar))`. but Clang does not like it. let's disable this warning. and explore other options later. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-17 18:38:44 +08:00
Avi Kivity	7fc7cbd3bf	build: nix: switch to non-static zstd When we added zstd (`f14e6e73bb`), we used the static library as we used some experimental APIs. However, now the dynamic library works, so apparently the experimental API is now standard. Switch to the dynamic library. It doesn't improve anything, but it aligns with how we do things. Closes #12902	2023-02-17 10:29:34 +02:00
Avi Kivity	ae3489382e	build: nix: update clang Clang 15 is now packaged by Nix, so use it. Closes #12901	2023-02-17 10:26:44 +02:00
Kefu Chai	50f68fe475	test/perf: do not brace integer with {} `int_range::make_singular()` accepts a single `int` as its parameter, so there is no need to brace the parameter with `{}`. this helps to silence the warning from Clang, like: ``` /home/kefu/dev/scylladb/test/perf/perf_fast_forward.cc:1396:63: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init] check_no_disk_reads(test(int_range::make_singular({100}))), ^~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12903	2023-02-17 10:24:24 +02:00
Botond Dénes	2b1f10a41c	Merge 'doc: add a KB about the new tombstones compaction process in ICS' from Anna Stuchlik Fixes https://github.com/scylladb/scylla-docs/issues/4140 This PR adds a new Knowledge Base article about improved garbage collection in ICS. It's based on the document created by @raphaelsc https://docs.google.com/document/d/1fA7uBcN9tgxeHwCbWftPJz071dlhucoOYO1-KJeOG8I/edit?usp=sharing. @raphaelsc Could you review it? I've made some improvements to the language and text organization, but I didn't add or remove any content, so it should be a quick review. @tzach requested a diagram, but we can add it later. It would be great to have this content published asap. Closes #12792 * github.com:scylladb/scylladb: doc: add the new KB to the list of topics doc: add a new KB article about timbstone garbage collection in ICS	2023-02-17 10:20:01 +02:00
Aleksandra Martyniuk	5d826f13e7	api: move get_and_update_ttl to task manager api Task ttl can be set with task manager test api, which is disabled in release mode. Move get_and_update_ttl from task manager test api to task manager api, so that it can be used in release mode. Closes #12894	2023-02-17 10:19:06 +02:00
Piotr Smaroń	d2bfe124ad	doc: fix command invoking tests The developer documentation from `building.md` suggested to run unit tests with `./tools/toolchain/dbuild test` command, however this command only invokes `test` bash tool, which immediately returns with status `1`: ``` [piotrs@new-host scylladb]$ ./tools/toolchain/dbuild test [piotrs@new-host scylladb]$ echo $? 1 ``` This was probably unintended mistake and what author really meant was invoking `dbuild ninja test`. Closes #12890	2023-02-17 10:16:33 +02:00
Botond Dénes	0961a3f79b	test/boost/reader_concurreny_semaphore_test: run oom protection tests in debug mode Said tests need to be run with a limited amount of memory to be really useful. When the memory amount is unexpected, they silently exit. Which is exactly what they did in debug mode too, where the amount of memory available cannot be controlled. Disable the check in debug mode.	2023-02-17 00:46:56 -05:00
Botond Dénes	1a9fdebb49	treewide: adapt to throwing reader_concurrency_semaphore::consume() Said method can now throw `std::bad_alloc` since `aab5954`. All call-sites should have been adapted in the series introducing the throw, but some managed to slip through because the oom unit test didn't run in debug mode. In this commit the remaining unpatched call-sites are fixed.	2023-02-17 00:46:56 -05:00
Avi Kivity	e2f6e0b848	utils: move hashing related files to utils/ module Closes #12884	2023-02-17 07:19:52 +02:00
Kefu Chai	2f0cb9e68f	db/virtual_table: mark the dtor of base class `virtual` as `my_result_collector` has virtual function, and its dtor is not marked virtual, Clang complains. let's mark its base class virtual to be on the safe side. ``` /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on non-final 'my_result_collector' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] delete __ptr; ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<my_result_collector>::operator()' requested here get_deleter()(std::move(__ptr)); ^ /home/kefu/dev/scylladb/db/virtual_table.cc:134:25: note: in instantiation of member function 'std::unique_ptr<my_result_collector>::~unique_ptr' requested here auto consumer = std::make_unique<my_result_collector>(s, permit, &pr, std::move(reader_and_handle.second)); ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12879	2023-02-17 07:11:18 +02:00
Botond Dénes	79bf347e04	Merge 'Remove sstables::test_setup in favor of sstables::test_env' from Pavel Emelyanov The former is a convenience wrapper over the latter. There's no real benefit in using it, but having two test_env-s is worse than just one. Closes #12794 * github.com:scylladb/scylladb: sstable_utils: Move the test_setup to perf/ sstable_utils: Remove unused wrappers over test_env sstable_test: Open-code do_with_cloned_tmp_directory sstable_test: Asynchronize statistics_rewrite case tests: Replace test_setup::do_with_tmp_directory with test_env::do_with(_async)?	2023-02-17 07:09:34 +02:00
Anna Stuchlik	bcca706ff5	doc: fixes https://github.com/scylladb/scylladb/issues/12754 , document the metric update in 5.2 Closes #12891	2023-02-16 19:05:48 +02:00
Nadav Har'El	02682aa40d	test/cql-pytest: add reproducer for ALLOW FILTERING bug This patch adds a reproducer for the bug described in issue #7964 - The restriction `where k=1 and c=2` (when k,c are the key columns) returns (at most) a single row so doesn't need ALLOW FILTERING, but if we add a third restriction, say `v=2`, this still processes at most a single row so doesn't need ALLOW FILTERING - and both Scylla and Cassandra get it wrong - so it's marked with both xfail and cassandra_bug. The patch also adds another test that for longer partition slices, e.g., `where k=1 and c>2`, although the slice itself doesn't need filtering, if we add `v=2` here we suddenly do need ALLOW FILTERING, because the slice itself may be a large number of rows, and adding `v=2` may restrict it to just a few results. This test passes on both Scylla and Cassandra. Issue #7964 mentioned these scenarios and even had some example code, but we never added it to the test suite, so we finally do it now. Refs #7964 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12850	2023-02-16 19:05:48 +02:00
Botond Dénes	dc3d47b1e4	Merge 'Get compaction history without using qctx' from Pavel Emelyanov There are two methods to mess with compaction history -- update and get. The former had been patched to use local system-keyspace instance by `907fd2d3` (system_keyspace: De-static compaction history update) now it's time for the latter (spoiler: it's only used by the API handler) Closes #12889 * github.com:scylladb/scylladb: system_keyspace; Make get_compaction_history non static and drop qctx api, compaction_manager: Get compaction history via manager system_keyspace: Move compaction_history_entry to namespace scope	2023-02-16 19:05:48 +02:00
Anna Stuchlik	826f67a298	doc: related https://github.com/scylladb/scylladb/issues/12658 , fix the service name in the upgrade guide from 2022.1 to 2022.2 Closes #12698	2023-02-16 19:05:48 +02:00
Botond Dénes	87f7ac920e	Merge 'Add task manager utils for tests' from Aleksandra Martyniuk Tests of each module that is integrated with task manager use calls to task manager api. Boilerplate to call, check status, and get result may be reduced using functions. task_manager_utils.py contains wrappers for task manager api calls and helpers that may be reused by different tests. Closes #12844 * github.com:scylladb/scylladb: test: use functions from task_manager_utils.py in test_task_manager.py test: add task_manager_utils.py	2023-02-16 19:05:48 +02:00
Kefu Chai	fcdea9f950	test/perf: mark output_writer::~output_writer() as virtual as an abstract base class `output_writer` is inherited by both `json_output_writer` and `text_output_writer`. and `output_manager` manages the lifecycles of used writers using `std::unique_ptr<output_writer>`. before this change, the dtor of `output_writer` is not marked as virtual, so when its dtor is invoked, what gets called is the base class's dtor. but the dtor of `json_output_writer` is non-trivial in the sense that this class is aggregated by a bunch of member variables. if we don't invoke its dtor when destroying this object, leakage is expected. so, in this change, the dtor of `output_writer` is marked as virtual, this makes all of its derived classes' dtor virtual. and the right dtor is always called. test/perf is only designed for testing, and not used in production, also, this feature was recently integrated into scylla executable in `228ccdc1c7`. so there is no need to backport this change. change should also silence the warning from Clang 17: ``` /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on 'output_writer' that is abstract but has non-virtual destructor [-Werror,-Wdelete-abstract-non-virtual-dtor] delete __ptr; ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<output_writer>::operator()' requested here get_deleter()(std::move(__ptr)); ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/stl_construct.h:88:15: note: in instantiation of member function 'std::unique_ptr<output_writer>::~unique_ptr' requested here __location->~_Tp(); ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12888	2023-02-16 19:05:48 +02:00
Nadav Har'El	27ea908c69	test/cql-pytest: regression test for old secondary-index bug This patch adds a cql-pytest test for an old secondary-index bug that was described three years ago in issue #5823. cql-pytest makes it easy to run the same test against different versions of Scylla, and it was used to check that the bug existed in Scylla 2.3.0 but was gone by 2.3.5, and also not present in master or in 2021.1. A bit about the bug itself: A secondary index is useful for equality restrictions (a=2) but can't be used for inequality restrictions (a>=2). In Scylla 3.2.0 we used to have a bug that because the restriction a>=2 couldn't be used through the index, it was ignored completely. This is of course a mistake. Refs #5823 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12856	2023-02-16 19:05:48 +02:00
Alejo Sanchez	16d92b7042	test/topology: pytest driver version use print... instead of log Use print instead of logging. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #12846	2023-02-16 19:05:48 +02:00
Kefu Chai	9520acb1a1	logalloc: mark segment_store_backend's dtor virtual before this change, `seastar_memory_segment_store_backend` is a class with a virtual method, but it does not have a virtual dtor. but we do use a unique_ptr<segment_store_backend> to manage the lifecycle of an instance of its derived class. to enable the compiler to call the right dtor, we should mark the base class's dtor as virtual. this should address following warnings from Clang-17: ``` /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on non-final 'logalloc::seastar_memory_segment_store_backend' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor] delete __ptr; ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<logalloc::seastar_memory_segment_store_backend>::operator()' requested here get_deleter()(std::move(__ptr)); ^ /home/kefu/dev/scylladb/utils/logalloc.cc:812:20: note: in instantiation of member function 'std::unique_ptr<logalloc::seastar_memory_segment_store_backend>::~unique_ptr' requested here : _backend(std::make_unique<seastar_memory_segment_store_backend>()) ^ ``` and ``` /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:100:2: error: delete called on 'logalloc::segment_store_backend' that is abstract but has non-virtual destructor [-Werror,-Wdelete-abstract-non-virtual-dtor] delete __ptr; ^ /home/kefu/.local/bin/../lib/gcc/x86_64-pc-linux-gnu/13.0.1/../../../../include/c++/13.0.1/bits/unique_ptr.h:405:4: note: in instantiation of member function 'std::default_delete<logalloc::segment_store_backend>::operator()' requested here get_deleter()(std::move(__ptr)); ^ /home/kefu/dev/scylladb/utils/logalloc.cc:811:5: note: in instantiation of member function 
'std::unique_ptr<logalloc::segment_store_backend>::~unique_ptr' requested here contiguous_memory_segment_store() ^ ``` Fixes #12872 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12873	2023-02-16 19:05:48 +02:00
Avi Kivity	abe157a873	Drop intrusive_set_external_comparator Since `5c0f9a8180` ("mutation_partition: Switch cache of rows onto B-tree") it's no longer in use, except in some performance test, so remove it. Although scylla-gdb.py is sometimes used with older releases, it's so outdated we can remove it from there too. Closes #12868	2023-02-16 19:05:48 +02:00
Kefu Chai	6eab8720c4	tools/schema_loader: do not return ref to a local variable we should never return a reference to local variable. so in this change, a reference to a static variable is returned instead. this should address following warning from Clang 17: ``` /home/kefu/dev/scylladb/tools/schema_loader.cc:146:16: error: returning reference to local temporary object [-Werror,-Wreturn-stack-address] return {}; ^~ ``` Fixes #12875 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12876	2023-02-16 12:15:14 +02:00
Pavel Emelyanov	e234726123	system_keyspace; Make get_compaction_history non static and drop qctx Now the call is done via the system_keyspace instance, so it can be unmarked static and can use the local query processor instead of global qctx. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-16 11:28:04 +03:00
Pavel Emelyanov	52f69643b6	api, compaction_manager: Get compaction history via manager Right now the API handler directly calls static method from system keyspace. Patching it to call compaction manager instead will let the latter use on-board plugged system keyspace for that. If the system keyspace is not plugged, it means early boot or late shutdown, not a good time to get compaction history anyway. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-16 11:27:38 +03:00
Pavel Emelyanov	d0e47ace16	system_keyspace: Move compaction_history_entry to namespace scope It's now a sub-class and it makes forward-declaration in another unit impossible Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-16 11:24:23 +03:00
Takuya ASADA	bf27fdeaa2	scylla_coredump_setup: fix coredump timeout settings We currently configure only TimeoutStartSec, but probably it's not enough to prevent coredump timeout, since TimeoutStartSec is maximum waiting time for service startup, and there is another directive to specify maximum service running time (RuntimeMaxSec). To fix the problem, we should specify RuntimeMaxSec and TimeoutSec (it configures both TimeoutStartSec and TimeoutStopSec). Fixes #5430 Closes #12757	2023-02-16 10:23:20 +02:00
Botond Dénes	e9258018d9	Merge 'date: cleanups to silence warnings from clang' from Kefu Chai - date: drop implicitly generated ctor - date: use std::in_range() to check for invalid year Closes #12878 * github.com:scylladb/scylladb: date: use std::in_range() to check for invalid year date: drop implicitly generated ctor	2023-02-16 10:15:36 +02:00
Botond Dénes	ef50170120	Merge 'build: cmake: sync with configure (2/n)' from Kefu Chai * build: cmake: extract idl out * build: cmake: link cql3 against xxHash * build: cmake: correct the check in Findlibdeflate.cmake * build: cmake find_package(libdeflate) earlier * build: cmake: set more properties to alternator library * build: cmake: include generate_cql_grammar * build: cmake: find xxHash package * build: cmake: add build mode support Closes #12866 * github.com:scylladb/scylladb: build: cmake: correct generate_cql_grammar build: cmake: extract idl out build: cmake: link cql3 against xxHash build: cmake: correct the check in Findlibdeflate.cmake build: cmake: find_package(libdeflate) earlier build: cmake: set more properties to alternator library build: cmake: include generate_cql_grammar build: cmake: find xxHash package build: cmake: add build mode support	2023-02-16 07:11:26 +02:00
Pavel Emelyanov	737f4acc10	features: Enable persisted features on all shards Commit `1365e2f13e` (gms: feature_service: re-enable features on node startup) re-enabled features on feature service very early, so that on boot a node sees its "correct" features state before it starts loading system tables and replaying commitlog. However, checking features happens on all shards independently, so re-enabling should also happen on all shards. One problem we faced is in extract_scylla_specific_keyspace_info(). This helper is used when loading non-system keyspace to read scylla-specific keyspace options. The helper is called on all shards and on all-but-zero it evaluates the checked SCYLLA_KEYSPACES feature to false leaving the specific data empty. As a result, different shards have different views of keyspaces' configuration. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12881	2023-02-16 00:52:05 +01:00
Kefu Chai	45f0449ccf	sstables: mx/writer: remove defaulted move ctor because its base class of `writer_impl` has a member variable `_validator`, which has its copy ctor deleted. let's just drop the defaulted move ctor, as compiler is not able to generate one for us. ``` /home/kefu/dev/scylladb/sstables/mx/writer.cc:805:5: error: explicitly defaulted move constructor is implicitly deleted [-Werror,-Wdefaulted-function-deleted] writer(writer&& o) = default; ^ /home/kefu/dev/scylladb/sstables/mx/writer.cc:528:16: note: move constructor of 'writer' is implicitly deleted because base class 'sstable_writer::writer_impl' has a deleted move constructor class writer : public sstable_writer::writer_impl { ^ /home/kefu/dev/scylladb/sstables/writer_impl.hh:29:48: note: copy constructor of 'writer_impl' is implicitly deleted because field '_validator' has a deleted copy constructor mutation_fragment_stream_validating_filter _validator; ^ /home/kefu/dev/scylladb/mutation/mutation_fragment_stream_validator.hh:188:5: note: 'mutation_fragment_stream_validating_filter' has been explicitly marked deleted here mutation_fragment_stream_validating_filter(const mutation_fragment_stream_validating_filter&) = delete; ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12877	2023-02-15 23:06:10 +02:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Avi Kivity	ac2a69aab4	Merge 'Move population code into table_population_metadata' from Pavel Emelyanov There's the distributed_loader::populate_column_family() helper that manages sstables on their way towards table on boot. The method naturally belongs to the table_population_metadata -- a helper class that in fact prepares the ground for the method in question. This PR moves the method into metadata class and removes a whole lot of extra alias-references and private-fields exporting methods from it. Also it keeps start_subdir and populate_c._f. logic close to each other and relaxes several excessive checks from them. Closes #12847 * github.com:scylladb/scylladb: distributed_loader: Rename table_population_metadata distributed_loader: Don't check for directory presence twice distributed_loader: Move populate calls into metadata.start() distributed_loader: Remove local aliases and exporters distributed_loader: Move populate_column_family() into population meta	2023-02-15 22:55:48 +02:00
Yaron Kaikov	cbc005c6f5	Revert "dist/debian: drop unused Makefile variable" This reverts commit `d2e3a60428`, since it causes a regression that prevents the Scylla service from starting on deb-based OSes. Fixes: #12738 Closes #12857	2023-02-15 22:29:24 +02:00
Pavel Emelyanov	0c7efe38e1	distributed_loader: Rename table_population_metadata It used to be just metadata by providing the meta for population, now it does the population by itself, so rename it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-15 20:15:04 +03:00
Pavel Emelyanov	15926b22f4	distributed_loader: Don't check for directory presence twice Both start_subdir() and populate_subdir() check for the directory to exist with explicit file_exists() check. That's excessive, if the directory wasn't there in the former call, the latter can get this by checking the _sstable_directories map. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-15 20:15:04 +03:00
Pavel Emelyanov	eb477a13ad	distributed_loader: Move populate calls into metadata.start() This makes the metadata class export even shorter API, keeps the three sub-directories scanned in one place and allows removing the zero-shard assertion. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-15 20:15:04 +03:00
Nadav Har'El	ba18c318b9	Merge 'cql3: eliminate column_condition, streamline condition representation' from Avi Kivity column_condition is an LWT-specific boolean expression construct, but recent work allowed it to be re-expressed in terms of generic expressions. This series completes the work and eliminates the column_condition classes and source file. Furthermore, a statement's IF clause is represented as a single expression, rather than a vector of per-column conditions. Closes #12597 * github.com:scylladb/scylladb: cql3: modification_statement: unwrap unnecessary boolean_factors() call cql3: modification_statement: use single expression for conditions cql3: modification_statement: fix lwt null equality rules mangling cql3: broadcast tables: tighten checks on conditions cql3: grammar: communicate LWT IF conditions to AST as a simple expression cql3: column_condition: fold into modification_statement cql3: column_condition: inline column_condition_applies_to into its only caller cql3: column_condition: inline column_condition_collect_marker_specification into its only caller cql3: column_condition: eliminate column_condition class cql3: column_condition: move expression massaging to prepare() cql3: grammar: make columnCondition production return an expression cql3: grammar: eliminate duplication in LWT IF clause "IN (...)" vs "IN ?" cql3: grammar: remove duplication around columnCondition scalar/collection variants cql3: grammar: extract column references into a new production cql3: column_condition: eliminate column_condition::raw	2023-02-15 19:02:56 +02:00
Pavel Emelyanov	123a82adb2	distributed_loader: Remove local aliases and exporters After the previous patch, all local alias references in populate_column_family() are no longer required. Neither are the exporting calls from the table_population_metadata class. Some non-obvious change is capturing 'this' instead of 'global_table' on calls that are cross-shard. That's OK, table_population_metadata is not sharded<> and is designed for cross-shard usage too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-15 19:57:41 +03:00
Pavel Emelyanov	16fca3fa8a	distributed_loader: Move populate_column_family() into population meta This ownership change also requires the auto& = *this alias and extra specification where to call reshard() and reshape() from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-15 19:57:41 +03:00
Kefu Chai	76355c056f	build: cmake: correct generate_cql_grammar should have escaped `&` with `\`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-16 00:07:40 +08:00
Kefu Chai	2718963a2a	build: cmake: extract idl out Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-16 00:07:40 +08:00
Kefu Chai	9416af8b80	build: cmake: link cql3 against xxHash turns out cql3 also indirectly uses the header file(s) which in turn includes xxhash header. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-16 00:07:40 +08:00
Kefu Chai	d6746fc49c	build: cmake: correct the check in Findlibdeflate.cmake otherwise libdeflate is never found. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-16 00:07:40 +08:00
Kefu Chai	1ac5932440	build: cmake: find_package(libdeflate) earlier so it can be linked by scylla Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-16 00:07:37 +08:00
Kefu Chai	bd1ea104fe	build: cmake: set more properties to alternator library alternator headers are exposed to the target which links against it, so let's expose them using the `target_include_directories()`. also, `alternator` uses Seastar library and uses xxHash indirectly. we should fix the latter by exposing the included header instead, but for now, let's just link alternator directly to xxHash. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-16 00:07:37 +08:00
Kefu Chai	a0f3c9ebf9	build: cmake: include generate_cql_grammar we should include "generate_cql_grammar.cmake" for using `generate_cql_grammar()` function. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-16 00:07:37 +08:00
Kefu Chai	b6a8341eef	build: cmake: find xxHash package we use private API in xxHash, it'd be handy to expose it in the form of a library target. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-16 00:07:37 +08:00
Kefu Chai	b234c839e4	build: cmake: add build mode support Scylla uses different build mode to customize the build for different purposes. in this change, instead of having it in a python dictionary, the customized settings are located in their own files, and loaded on demand. we don't support multi-config generator yet. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-16 00:07:37 +08:00
Kefu Chai	55b46ab1a3	date: use std::in_range() to check for invalid year for better readability, and to silence following warning from Clang 17: ``` /home/kefu/dev/scylladb/utils/date.h:5965:25: error: result of comparison of constant 9223372036854775807 with expression of type 'int' is always true [-Werror,-Wtautological-constant-out-of-range-compare] Y <= static_cast<int64_t>(year::max()))) ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/kefu/dev/scylladb/utils/date.h:5964:57: error: result of comparison of constant -9223372036854775808 with expression of type 'int' is always true [-Werror,-Wtautological-constant-out-of-range-compare] if (!(static_cast<int64_t>(year::min()) <= Y && ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:56:49 +08:00
Kefu Chai	90981ebb50	date: drop implicitly generated ctor as one of its member variables does not have a default constructor. this silences following warning from Clang-17: ``` /home/kefu/dev/scylladb/utils/date.h:708:5: error: explicitly defaulted default constructor is implicitly deleted [-Werror,-Wdefaulted-function-deleted] year_month_weekday() = default; ^ /home/kefu/dev/scylladb/utils/date.h:705:27: note: default constructor of 'year_month_weekday' is implicitly deleted because field 'wdi_' has no default constructor date::weekday_indexed wdi_; ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:56:49 +08:00
Gleb Natapov	9bdef9158e	raft: abort applier fiber when a state machine aborts After `5badf20c7a` applier fiber does not stop after it gets abort error from a state machine which may trigger an assertion because previous batch is not applied. Fix it. Fixes #12863	2023-02-15 15:54:19 +02:00
Gleb Natapov	dfcd56736b	raft: fix race in add_entry_on_leader that may cause incorrect log length accounting In add_entry_on_leader, after wait_for_memory_permit() resolves but before the fiber continues to run, the node may stop being the leader and then become the leader again, which leaves the currently held units outdated. Detect this case by checking the term after the preemption.	2023-02-15 15:51:59 +02:00
Petr Gusev	b37eee26e1	test_shed_too_large_request: clarify the comments	2023-02-15 17:18:17 +04:00
Petr Gusev	4328f52242	test_shed_too_large_request: use smaller test string There was a vague comment about CI using larger limits for shedding. This turned out to be false, and the real reason for the different limits is that Scylla handles the -m command line option differently in debug and release builds. Debug builds use the default memory allocator and the value of -m Scylla option is given to each shard. In release builds memory is evenly distributed between shards. To accommodate this we read the current memory limit from Scylla metrics. The helper class ScyllaMetrics was introduced to handle metrics parsing logic. It can potentially be reused for dealing with metrics in other tests.	2023-02-15 17:18:10 +04:00
Avi Kivity	9454844751	cql3: modification_statement: unwrap unnecessary boolean_factors() call for_each_expression() will recurse anyway.	2023-02-15 14:21:26 +02:00
Avi Kivity	1d0854c0bc	cql3: modification_statement: use single expression for conditions Currently, we use two vectors for static and regular column conditions, each element referring to a single column. There's a comment that keeping them separate makes things simpler, but in fact we always treat both equally (except in one case where we look at just the regular columns and check that no static column conditions exist). Simplify by storing just a single expression, which can be a conjunction of multiple column conditions. add_condition() is renamed to analyze_condition(), since it no longer adds to the vectors.	2023-02-15 14:21:26 +02:00
Avi Kivity	5cb7655a9f	cql3: modification_statement: fix lwt null equality rules mangling search_and_replace() needs to return std::nullopt when it doesn't match, or it doesn't recurse properly. Currently it doesn't break anything because we only call the function on a binary_operator, but soon it will.	2023-02-15 14:21:26 +02:00
Avi Kivity	c50c9c86b3	cql3: broadcast tables: tighten checks on conditions We don't support checks on static columns in broadcast tables, so explicitly reject them.	2023-02-15 14:21:26 +02:00
Avi Kivity	4d125bffdf	cql3: grammar: communicate LWT IF conditions to AST as a simple expression Instead of passing a vector of boolean factors, pass a single expression (a conjunction). This prepares the way for more complex expressions, but no grammar changes are made here. The expression is stored as optional, since we'll need a way to indicate whether an IF clause was supplied or not. We could play games with boolean_factors(), but it becomes too tricky. The expressions are broken down back to boolean factors during prepare. We'll later consolidate them too.	2023-02-15 14:21:26 +02:00
Avi Kivity	23bd7d24df	cql3: column_condition: fold into modification_statement Move column_condition_prepare() and its helper function into modification_statement, its only caller. The column_condition.{cc,hh} now become empty, so remove them. This eliminates the column_condition concept, which was just a custom expression, in favor of generic expressions. It still has custom properties due to LWT specialness, but those custom properties are isolated in column_condition_prepare().	2023-02-15 14:21:24 +02:00
Avi Kivity	12be5d4208	cql3: column_condition: inline column_condition_applies_to into its only caller This two-liner can be trivially inlined with no loss of meaning. Indeed it's less confusing, because "applies_to" became less meaningful once we integrated the column_value component into the expression.	2023-02-15 14:19:55 +02:00
Avi Kivity	82fb838a70	cql3: column_condition: inline column_condition_collect_marker_specification into its only caller This one-liner can be trivially inlined with no loss of meaning.	2023-02-15 14:19:55 +02:00
Avi Kivity	e7b9d9dab9	cql3: column_condition: eliminate column_condition class It's become a wrapper around expression, so peel it off. The methods are converted to free functions, with the intent to later inline them into their callers, as they are also mostly just wrappers.	2023-02-15 14:19:55 +02:00
Avi Kivity	4e93cf9ae9	cql3: column_condition: move expression massaging to prepare() Move logic out of the column_condition constructor so it becomes a trivial wrapper, ripe for elimination.	2023-02-15 14:19:55 +02:00
Avi Kivity	31e37ff559	cql3: grammar: make columnCondition production return an expression Instead of appending to a vector, just return an expression. This makes the production self-sufficient and more natural to use.	2023-02-15 14:19:55 +02:00
Avi Kivity	d8d4d0bd72	cql3: grammar: eliminate duplication in LWT IF clause "IN (...)" vs "IN ?" The IN operator recognition is duplicated; de-duplicate it by introducing the (somewhat artificial) singleColumnInValuesOrMarkerExpr production.	2023-02-15 14:19:55 +02:00
Avi Kivity	c47cf9858b	cql3: grammar: remove duplication around columnCondition scalar/collection variants columnCondition duplicates the grammar for scalar relations and subscripted collection relations. Eliminate the duplication by introducing a subscriptExpr production, which encapsulates the differences.	2023-02-15 14:19:55 +02:00
Avi Kivity	74da77f442	cql3: grammar: extract column references into a new production Eliminate repetition by creating a new columnRefExpr and referring to it. Only LWT IF is updated so far. No grammar changes.	2023-02-15 14:19:55 +02:00
Avi Kivity	4d7d3c78a2	cql3: column_condition: eliminate column_condition::raw It's now a thin wrapper around an expression, so peel the wrapper and keep just the expression. A boolean expression is, after all, a condition, and we'll make the condition statement-wide soon rather than apply just to a column.	2023-02-15 14:19:55 +02:00
guy9	4dd14af7d5	Adding ScyllaDB University LIVE Q1 2023 to Docs top banner Closes #12860	2023-02-15 13:15:30 +02:00
Nadav Har'El	2d6c53c047	test/cql-pytest: reproduce bug in static-column index lookup This patch adds a reproducer to a static-column index lookup bug described in issue #12829: The restriction `where pk=0 and s=1 and c=3` where pk,c are the primary key and s is an indexed static column, results in an internal error: "clustering column id 2 >= 2". Unfortunately, because on_internal_error() crashes Scylla in debug mode, we need to mark this failing test with skip instead of xfail. Refs #12829 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12852	2023-02-15 12:23:36 +02:00
Benny Halevy	bb36237cf4	topology: optimize compare_endpoints This function is called on the fast data path from storage_proxy when sorting multiple endpoints by proximity. This change calculates numeric node diff metrics based on each address proximity to a given node (by <dc, rack, same node>) to eliminate logic branches in the function and reduce its footprint. based on objdump -d output, compare_endpoints footprint was reduced by 58.5% (3632 / 8752 bytes) with clang version 15.0.7 (Fedora 15.0.7-1.fc37) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-15 11:48:24 +02:00
Benny Halevy	3ac2df9480	to_string: add print operators for std::{weak,partial}_ordering Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-15 11:09:04 +02:00
Benny Halevy	bd6f88c193	utils: to_sstring: deinline std::strong_ordering print operator Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-15 11:09:04 +02:00
Benny Halevy	25ebc63b82	move to_string.hh to utils/ Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-15 11:09:04 +02:00
Benny Halevy	e7af35a64d	test: network_topology: add test_topology_compare_endpoints Add a regression unit test for topology::compare_endpoint before it's optimized in the following patches. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-15 11:09:02 +02:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Petr Gusev	1f850374fa	test_shed_too_large_request fix: disable compression The test relies on exact request size, this doesn't work if compression is applied. The driver enables compression only if both the server and the client agree on the codec to use. If compression package (e.g. lz4) is not installed, the compression is not used. The trick with locally_supported_compressions is needed since I couldn't find any standard means to disable compression other than the compression flag on the cluster object, which seemed too broad. Fixes #12836	2023-02-15 11:55:49 +04:00
Nadav Har'El	c0114d8b02	test/cql-pytest: test another case of ALLOW FILTERING In issue #12828 it was noted that Scylla requires ALLOW FILTERING for `where b=1 and c=1` where b is an indexed static column and c is a clustering key, and it was suggested that this is a bug. This patch adds a test that confirms that both Scylla and Cassandra require ALLOW FILTERING in this case. We explain in a comment that this requirement is expected (i.e., it's not a bug), as the `b=1` may match a huge number of rows, and the `c=1` may further match just a few of those - i.e., it is filtering. This test is virtually identical to the test we already had for `where a=1 and c=1` - when `a` is an indexed regular column. There too, the ALLOW FILTERING is required. Closes #12828 as "not a bug". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12848	2023-02-15 08:43:19 +02:00
Raphael S. Carvalho	ba022f7218	replica: compaction_group: Use sstable_set::size() More efficient than retrieving size from sstable_set::all() which may involve copy of elements. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12835	2023-02-15 06:53:04 +02:00
Avi Kivity	19edaa9b78	Merge 'build: cmake: sync with configure.py' from Kefu Chai this is the first step to reenable cmake to build scylla, so we can experiment C++20 modules and other changes before porting them to `configure.py` . please note, this changeset alone does not address all issues yet. as this is a low priority project, i want to do this in smaller (or tiny!) steps. * build: cmake: s/Abseil/absl/ * build: cmake: sync with source files compiled in configure.py * build: cmake: do not generate crc_combine_table at build time * build: cmake: use packaged libdeflate Closes #12838 * github.com:scylladb/scylladb: build: cmake: add rust binding build: cmake: extract cql3 and alternator out build: cmake: use packaged libdeflate build: cmake: do not generate crc_combine_table at build time build: cmake: sync with source files compiled in configure.py build: cmake: s/Abseil/absl/	2023-02-14 22:37:10 +02:00
Avi Kivity	df497a5a94	Merge 'treewide: remove implicitly deleted copy ctor and assignment operator' from Kefu Chai clang 17 trunk helped to identify these issues. so let's fix them. Closes #12842 * github.com:scylladb/scylladb: row_cache: drop defaulted move assignment operator utils/histogram: drop defaulted copy ctor and assignment operator range_tombstone_list: remove defaulted move assignment operator query-result: remove implicitly deleted copy ctor	2023-02-14 20:24:26 +02:00
Kefu Chai	95f8b4eab1	build: cmake: add rust binding Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-14 23:54:20 +08:00
Kefu Chai	f8671188c7	build: cmake: extract cql3 and alternator out Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-14 23:54:20 +08:00
Aleksandra Martyniuk	7b5e653fc9	test: use functions from task_manager_utils.py in test_task_manager.py	2023-02-14 13:34:11 +01:00
Aleksandra Martyniuk	02931163ef	test: add task_manager_utils.py The task manager API will be used in many tests. To make this easier, API calls to the task manager are wrapped into functions in task_manager_utils.py. Some helpers that may be reused in other tests are moved there too.	2023-02-14 13:34:04 +01:00
Kefu Chai	9ea8a46dd6	build: cmake: use packaged libdeflate this mirrors the change in `b8b78959fb` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-14 19:25:02 +08:00
Kefu Chai	89542232c9	row_cache: drop defaulted move assignment operator as it has a reference type member variable. and Clang 17 warns at seeing this ``` /home/kefu/dev/scylladb/row_cache.hh:359:16: warning: explicitly defaulted move assignment operator is implicitly deleted [-Wdefaulted-function-deleted] row_cache& operator=(row_cache&&) = default; ^ /home/kefu/dev/scylladb/row_cache.hh:214:20: note: move assignment operator of 'row_cache' is implicitly deleted because field '_tracker' is of reference type 'cache_tracker &' cache_tracker& _tracker; ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-14 19:22:19 +08:00
Kefu Chai	68327123ac	utils/histogram: drop defaulted copy ctor and assignment operator as one of the (indirect) member variables has a user-declared move ctor, this prevents the compiler from generating the default copy ctor or assignment operator for the classes containing `timer`. ``` /home/kefu/dev/scylladb/utils/histogram.hh:440:5: warning: explicitly defaulted copy constructor is implicitly deleted [-Wdefaulted-function-deleted] timed_rate_moving_average_and_histogram(const timed_rate_moving_average_and_histogram&) = default; ^ /home/kefu/dev/scylladb/utils/histogram.hh:437:31: note: copy constructor of 'timed_rate_moving_average_and_histogram' is implicitly deleted because field 'met' has a deleted copy constructor timed_rate_moving_average met; ^ /home/kefu/dev/scylladb/utils/histogram.hh:298:17: note: copy constructor of 'timed_rate_moving_average' is implicitly deleted because field '_timer' has a deleted copy constructor meter_timer _timer; ^ /home/kefu/dev/scylladb/utils/histogram.hh:212:13: note: copy constructor of 'meter_timer' is implicitly deleted because field '_timer' has a deleted copy constructor timer<> _timer; ^ /home/kefu/dev/scylladb/seastar/include/seastar/core/timer.hh:111:5: note: copy constructor is implicitly deleted because 'timer<>' has a user-declared move constructor timer(timer&& t) noexcept : _sg(t._sg), _callback(std::move(t._callback)), _expiry(std::move(t._expiry)), _period(std::move(t._period)), ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-14 19:22:19 +08:00
Kefu Chai	b13caeedda	range_tombstone_list: remove defaulted move assignment operator as `range_tombstone_list::reverter` has a member variable of `const schema& _s`, which cannot be mutated, so it is not allowed to have an assignment operator. this change should address the warning from Clang 17: ``` /home/kefu/dev/scylladb/range_tombstone_list.hh:122:19: warning: explicitly defaulted move assignment operator is implicitly deleted [-Wdefaulted-function-deleted] reverter& operator=(reverter&&) = default; ^ /home/kefu/dev/scylladb/range_tombstone_list.hh:111:23: note: move assignment operator of 'reverter' is implicitly deleted because field '_s' is of reference type 'const schema &' const schema& _s; ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-14 19:22:19 +08:00
Kefu Chai	f36fdff622	query-result: remove implicitly deleted copy ctor as one of the (indirect) member variables of `query::result` is not copyable, the compiler refuses to create a copy ctor or an assignment operator for us, and Clang 17 warns at seeing this. so let's just drop them for better readability and more importantly to preserve the correctness. ``` /home/kefu/dev/scylladb/query-result.hh:385:5: warning: explicitly defaulted copy constructor is implicitly deleted [-Wdefaulted-function-deleted] result(const result&) = default; ^ /home/kefu/dev/scylladb/query-result.hh:321:34: note: copy constructor of 'result' is implicitly deleted because field '_memory_tracker' has a deleted copy constructor query::result_memory_tracker _memory_tracker; ^ /home/kefu/dev/scylladb/query-result.hh:97:23: note: copy constructor of 'result_memory_tracker' is implicitly deleted because field '_units' has a deleted copy constructor semaphore_units<> _units; ^ /home/kefu/dev/scylladb/seastar/include/seastar/core/semaphore.hh:500:5: note: 'semaphore_units' has been explicitly marked deleted here semaphore_units(const semaphore_units&) = delete; ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-14 19:22:19 +08:00
Avi Kivity	4f5a460db9	Update seastar submodule * seastar 943c09f869...9b6e181e42 (34): > semaphore: disallow move after used > Revert "semaphore: assert no outstanding units when moved" > reactor, tests: drop unused include > spawn_test: prolong termination time to be more tolerant. > net: s/offload_info()/get_offload_info()/ > Merge 'Extend http client with keep-alive connections' from Pavel Emelyanov > util/gcc6-concepts.hh: drop gcc6-concepts.hh > treewide: do not inline tls variables in shared library > reactor: Remove --num-io-queues option > build: correct the comment > smp: do not inline function when BUILD_SHARED_LIBS > iostream: always flush _fd in do_flush > thread_pool: prevent missed wakeup when the reactor goes to sleep in parallel with a syscall completion > Merge 'build: do not always build seastar as a static library' from Kefu Chai > Revert "Merge 'Keep outgoing queue all cancellable while negotiating' from Pavel Emelyanov" > Merge 'Keep outgoing queue all cancellable while negotiating' from Pavel Emelyanov > memcached: prolong expiration time to be more tolerant > treewide: add non-seastar "#include"s > Merge 'Allow multiple abort requests' from Aleksandra Martyniuk > app-template: remove duplicated includes > include/seastar: s/SEASTAR_NODISCARD/[[nodiscard]]/ > prometheus: Don't report labels that starts with __ > memory: do not define variable only for assert > reactor: set_shard_field_width() after resource::allocate() > Merge 'reactor, core/resource: clean ups' from Kefu Chai > util/concepts: include <concepts> > build: use target_link_options() to pass options to linker > iostream: add doxygen comment for eof() > Merge 'util/print_safe, reactor: use concept for type constraints and refactory ' from Kefu Chai > Right align the memory diagnostics > Merge 'Add an API for the metrics layer to manipulate metrics dynamically.' from Amnon Heiman > semaphore: assert no outstanding units when moved > build: do not populate package registry by default > build: stop detecting concepts support Closes #12827	2023-02-14 13:04:17 +02:00
Avi Kivity	c5e4bf51bd	Introduce mutation/ module Move mutation-related files to a new mutation/ directory. The names are kept in the global namespace to reduce churn; the names are unambiguous in any case. mutation_reader remains in the readers/ module. mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this patch. This is a step forward towards librarization or modularization of the source base. Closes #12788	2023-02-14 11:19:03 +02:00
Kefu Chai	e2a20a108f	tools: toolchain: dbuild: reindent a "case" block to replace tabs with spaces, for better readability if the editor fails to render tabs with the right tabstop setting. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12839	2023-02-14 10:37:25 +02:00
Raphael S. Carvalho	d6fe99abc4	replica: table: Update stats for newly added SSTables Patch `55a8421e3d` fixed an inefficiency when rebuilding statistics with many compaction groups, but it incorrectly removed the update for newly added SSTables. This patch restores it. When a new SSTable is added to any of the groups, the stats are incrementally updated (as before). On compaction completion, statistics are still rebuilt by simply iterating through each group, which keeps track of its own stats. Unit tests are added to guarantee the stats are correct both after compaction completion and memtable flush. Fixes #12808. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12834	2023-02-14 10:28:53 +02:00
Wojciech Mitros	cab5b08948	git: remove Cargo.lock from .gitignore When rust wasmtime bindings were added, we committed Cargo.lock to make sure a given version of Scylla always builds using the same versions of rust dependencies. Therefore, it should not be present in .gitignore. Closes #12831	2023-02-14 08:51:53 +02:00
Wojciech Mitros	8b756cb73f	rust: update dependencies Wasmtime added some improvements in recent releases - particularly, two security issues were patched in version 2.0.2. There were no breaking changes for our use other than the strategy of returning Traps - all of them are now anyhow::Errors instead, but we can still downcast to them, and read the corresponding error message. The cxx, anyhow and futures dependency versions now match the versions saved in the Cargo.lock. Closes #12830	2023-02-14 08:51:20 +02:00
Nadav Har'El	14cdd034ee	test/alternator: fix flaky test for partition-tombstone scan The test test_scan.py::test_scan_long_partition_tombstone_string checks that a full-table Scan operation ends a page in the middle of a very long string of partition tombstones, and does NOT scan the entire table in one page (if we did that, getting a single page could take an unbounded amount of time). The test is currently flaky, having failed in CI runs three times in the past two months. The reason for the flakiness is that we don't know exactly how long we need to make the sequence of partition tombstones in the test before we can be absolutely sure a single page will not read this entire sequence. For single-partition scans we have the "query_tombstone_page_limit" configuration parameter, which tells us exactly how long we need to make the sequence of row tombstones. But for a full-table scan of partition tombstones, the situation is more complicated - because the scan is done in parallel on several vnodes and each of them needs to read query_tombstone_page_limit tombstones before it stops. In my experiments, using query_tombstone_page_limit * 4 consecutive tombstones was always enough - I ran this test hundreds of times and it didn't fail once. But since it did fail on Jenkins very rarely (3 times in the last two months), maybe the multiplier 4 isn't enough. So this patch doubles it to 8. Hopefully this would be enough for anyone (TM). This makes this test even bigger and slower than it was. To make it faster, I changed this test's write isolation mode from the default always_use_lwt to forbid_rmw (not use LWT). This leaves the test's total run time similar to what it was before this patch - around 0.5 seconds in dev build mode on my laptop. Fixes #12817 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12819	2023-02-14 08:09:44 +02:00
Kefu Chai	cec2e2f993	build: cmake: do not generate crc_combine_table at build time mirrors the change in `70217b5109` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-14 11:42:08 +08:00
Kefu Chai	a8fca52398	build: cmake: sync with source files compiled in configure.py these source files are out of sync with the source files listed in `configure.py`. some of them were removed, some of them were added. let's try to keep them in sync. this paves the road to a working CMakeLists.txt Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-14 11:42:04 +08:00
Kefu Chai	50ff27514c	build: cmake: s/Abseil/absl/ find abseil library with the name of absl, instead of "Abseil". absl's cmake config file is provided with the name of `abslConfig.cmake`, not `AbseilConfig.cmake`. see also `cde2f0eaae/CMakeLists.txt (L198)` . Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-14 11:41:59 +08:00
Nadav Har'El	310638e84d	Merge 'wasm: deserialize counters as integers' from Wojciech Mitros Currently, because serialize_visitor::operator() is not implemented for counters, we cannot convert a counter returned by a WASM UDF to bytes when returning from wasm::run_script(). We could disallow using counters as WASM UDF return types, but an easier solution which we're already using in Lua UDFs is treating the returned counters as 64-bit integers when deserializing. This patch implements the latter approach and adds a test for it. Closes #12806 * github.com:scylladb/scylladb: wasm udf: deserialize counters as integers test_wasm.py: add utility function for reading WASM UDF saved in files	2023-02-13 19:24:11 +02:00
Nadav Har'El	6a45881d22	Merge 'functions: handle replacing UDFs used in UDAs' from Wojciech Mitros This patch is based on #12681, only last 3 commits are relevant. As described in #12709, currently, when a UDF used in a UDA is replaced, the UDA is not updated until the whole node is restarted. This patch fixes the issue by updating all affected UDAs when a UDF is replaced. Additionally, it includes a few convenience changes Closes #12710 * github.com:scylladb/scylladb: uda: change the UDF used in a UDA if it's replaced functions: add helper same_signature method uda: return aggregate functions as shared pointers	2023-02-13 16:30:24 +02:00
Benny Halevy	b2d3c1fcc2	abstract_replication_strategy: add for_each_natural_endpoint_until Currently, effective_replication_map::do_get_ranges accepts a functor that traverses the natural endpoints of each token to decide whether a token range should be returned or not. This is done by copying the natural endpoints vector for each token. However, other than special strategies like everywhere and local, the functor can be called on the precalculated inet_address_vector_replica_set in the replication_map and there's no need to copy it for each call. for_each_natural_endpoint_until passes a reference to the function down to the abstract replication strategy to let it work either on the precalculated inet_address_vector_replica_set or on an ad-hoc vector prepared by the replication strategy. The function returns stop_iteration::yes when a match or mismatch is found, or stop_iteration::no while it has no definite result. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12737	2023-02-13 16:30:24 +02:00
Nadav Har'El	efed973dd3	Merge 'cql3: convert LWT IF clause to expressions' from Avi Kivity LWT `IF` (column_condition) duplicates the expression prepare and evaluation code. Annoyingly, LWT IF semantics are a little different than the rest of CQL: a NULL equals NULL, whereas usually NULL = NULL evaluates to NULL. This series converts `IF` prepare and evaluate to use the standard expression code. We employ expression rewriting to adjust for the slightly different semantics. In a few places, we adjust LWT semantics to harmonize them with the rest of CQL. These are pointed out in their own separate patches so the changes don't get lost in the flood. Closes #12356 * github.com:scylladb/scylladb: cql3: lwt: move IF clause expression construction to grammar cql3: column_condition: evaluate column_condition as a single expression cql3: lwt: allow negative list indexes in IF clause cql3: lwt: do not short-circuit col[NULL] in IF clause cql3: column_condition: convert _column to an expression cql3: expr: generalize evaluation of subscript expressions cql3: expr: introduce adjust_for_collection_as_maps() cql3: update_parameters: use evaluation_inputs compatible row prefetch cql3: expr: protect extract_column_value() from partial clustering keys cql3: expr: extract extract_column_value() from evaluation machinery cql3: selection: introduce selection_from_partition_slice cql3: expr: move check for ordering on duration types from restrictions to prepare cql3: expr: remove restrictions oper_is_slice() in favor of expr::is_slice() cql3: column_condition: optimize LIKE with constant pattern after preparing cql3: expr: add optimizer for LIKE with constant pattern test: lib: add helper to evaluate an expression with bind variables but no table cql3: column_condition: make the left-hand-side part of column_condition::raw cql3: lwt: relax constraints on map subscripts and LIKE patterns cql3: expr: fix search_and_replace() for subscripts cql3: expr: fix function evaluation with NULL inputs cql3: expr: add LWT IF clause variants of binary operators cql3: expr: change evaluate_binop_sides to return more NULL information	2023-02-13 16:30:24 +02:00
Nadav Har'El	621c49b621	test/alternator: more tests for listing streams In issue #12601, a dtest involving paging of ListStreams showed incorrect results - the paged results had one duplicate stream and one missing stream. We believe that the cause of this bug was that the unsorted map of tables can change order between pages. In this patch we add a test test_list_streams_paged_with_new_table which can demonstrate this bug - by adding a lot of tables in mid-paging, we cause the unsorted map to be reshuffled and the paging to break. This is not the same situation as in #12601 (which did not involve new tables) but we believe it demonstrates the same bug - and checks its fix. Indeed this passes with the fix in pull request #12614 and fails without it. This patch also adds a second test, test_stream_arn_unchanging: That test eliminates a guess we had for the cause of #12601. We thought that maybe the stream ARN changes on a table if its schema version changes, but the new test confirms that it actually behaves as expected (the stream ARN doesn't change). Refs #12601 Refs #12614 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12616	2023-02-13 16:30:24 +02:00
Nadav Har'El	25610c81fb	test/cql-pytest: another reproducer for index+limit+filtering bug This patch adds yet another reproducer for issue #10649, where the combination of filtering and LIMIT returns fewer results when a secondary index is added to the table. Whereas the previous tests we had for this issue involved a regular (global) index, the new test uses a local index (a Scylla-only feature). It shows that the same bug exists also for local indexes, as noticed by a user in #12766. Refs #10649 Refs #12766 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12783	2023-02-13 16:30:24 +02:00
Botond Dénes	e29e836aca	docs/operating-scylla: add a document on diagnostic tools ScyllaDB has a wide variety of tools and sources of information useful for diagnosing problems. These are scattered all over the place and although most of these are documented, there is currently no document listing all the relevant tools and information sources when it comes to diagnosing a problem. This patch adds just that: a document listing the different tools and information sources, with a brief description of how they can help in diagnosing problems, and a link to the relevant dedicated documentation pages. Closes #12503	2023-02-13 16:30:24 +02:00
Botond Dénes	e55f475db1	Merge 'test/pylib: use larger timeout for decommission/removenode' from Kamil Braun Recently we enabled RBNO by default in all topology operations. This made the operations a bit slower (repair-based topology ops are a bit slower than classic streaming - they do more work), and in debug mode with a large number of concurrent tests running, they might time out. The timeout for bootstrap was already increased before, do the same for decommission/removenode. The previously used timeout was 300 seconds (this is the default used by aiohttp library when it makes HTTP requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which is 1000 seconds. Closes #12765 * github.com:scylladb/scylladb: test/pylib: use larger timeout for decommission/removenode test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT	2023-02-13 16:30:24 +02:00
Kefu Chai	08b7e8b807	configure.py: use seastar_dep and seastar_testing_dep now that these variables are set, let's reuse them when appropriate. less repetition this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12802	2023-02-13 16:30:24 +02:00
Nadav Har'El	ecfcb93ef5	test/cql-pytest: regression test for old bug of misused index Issue #7659, which we solved long ago, was about a query which included a non-EQ restriction and wrongly picked up one of the indexes. It had a short C++ regression test, but here we add a more elaborate Python test for the same bug. The advantages of the Python test are: 1. The Python test can be run against any version of Scylla (e.g., to check whether a certain version contains a backport of the fix). 2. The Python test reproduces not only a "benign" query error, but also an assertion-failed crash which happened when the non-EQ restriction was an "IN". 3. The Python test reproduces the same bug not just for a regular index, but also a local index. I checked that, as expected, these tests pass on master, but fail (and crash Scylla) in old branches before the fix for #7659. Refs #7659. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12797	2023-02-13 16:30:24 +02:00
Takuya ASADA	7e690bac62	install-dependencies.sh: update node_exporter to 1.5.0 Update node_exporter to 1.5.0. Closes scylladb/scylla-pkg#3190 Closes #12793 [avi: regenerate frozen toolchain] Closes #12813	2023-02-13 16:30:24 +02:00
Pavel Emelyanov	fa5f5a3299	sstable_test_env: Remove working_sst helper It's only used by the single test and apparently exists since the times seastar was missing the future::discard_result() sugar Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12803	2023-02-13 16:30:24 +02:00
Wojciech Mitros	b25ee62f75	wasm udf: deserialize counters as integers Currently, because serialize_visitor::operator() is not implemented for counters, we cannot convert a counter returned by a WASM UDF to bytes when returning from wasm::run_script(). We could disallow using counters as WASM UDF return types, but an easier solution which we're already using in Lua UDFs is treating the returned counters as 64-bit integers when deserializing. This patch implements the latter approach and adds a test for it.	2023-02-13 14:24:20 +01:00
Wojciech Mitros	3b8bf1ae3a	test_wasm.py: add utility function for reading WASM UDF saved in files Currently, we're repeating the same os.path, open, read, replace each time we read a WASM UDF from a file. To reduce code bloat, this patch adds a utility function "read_function_from_file" that finds the file and reads it given a function name and an optional new name, for cases when we want to use a different name in cql (mostly for unique_names).	2023-02-13 14:24:20 +01:00
Nadav Har'El	a24600a662	Merge 'test/pylib: split and refactor topology tests' from Alecco Move long running topology tests out of `test_topology.py` and into their own files, so they can be run in parallel. While there, merge simple schema tests. Closes #12804 * github.com:scylladb/scylladb: test/topology: rename topology test file test/topology: lint and type for topology tests test/topology: move topology ip tests to own file test/topology: move topology test remove garbaje... test/topology: move topology rejoin test to own file test/topology: merge topology schema tests and... test/topology: isolate topology smp params test test/topology: move topology helpers to common file	2023-02-12 17:53:48 +02:00
Avi Kivity	87c0d09d03	cql3: lwt: move IF clause expression construction to grammar Instead of the grammar passing expression bits to column_condition, have the grammar construct an unprepared expression and pass it as a whole. column_condition::raw then uses prepare_expression() to prepare it. The call to validate_operation_on_durations() is eliminated, since it's already done by prepare_expression(). Some tests adjusted for slightly different wording.	2023-02-12 17:28:36 +02:00
Avi Kivity	37c9c46101	cql3: column_condition: evaluate column_condition as a single expression Instead of laboriously hand-evaluating each expression component, construct one expression for the entire column_condition during prepare time, and evaluate it using the generic machinery. LWT IF evaluates equality against NULL considering two NULLs as equal. We handle that by rewriting such expressions to use null_handling_style::lwt_nulls. Note we use expr::evaluate() rather than is_satisfied_by(), since the latter doesn't like functions on the top-level, which we have due to LIKE with constant pattern optimization. evaluate() is more generic anyway.	2023-02-12 17:28:05 +02:00
Avi Kivity	8e972b52c5	cql3: lwt: allow negative list indexes in IF clause LWT IF clause errors out on negative list index. This deviates from non-LWT subscript evaluation, PostgreSQL, and too-large index, all of which evaluate the subscript operation to NULL. Make things more consistent by also evaluating list[-1] to NULL. A test is adjusted.	2023-02-12 17:28:05 +02:00
Avi Kivity	433b778a4d	cql3: lwt: do not short-circuit col[NULL] in IF clause Currently if an LWT IF clause contains a subscript with NULL as the key, then the entire IF clause is evaluated as FALSE. This is incorrect, because col[NULL] = NULL would simplify to NULL = NULL, which is interpreted as TRUE using the LWT comparisons. Even with SQL NULL handling, "col[NULL] IS NULL" should evaluate to true, but since we short-circuit as soon as we encounter the NULL key, we cannot complete the evaluation. Fix by setting cell_value to null instead of returning immediately. Tests that check for this were adjusted. Since the test changed behavior from not applying the statement to applying it, a new statement is added that undoes the previous one, so downstream statements are not affected.	2023-02-12 17:28:05 +02:00
Avi Kivity	b888e3d26a	cql3: column_condition: convert _column to an expression After this change, all components of column_condition are expressions. One LWT-specific hack was removed from the evaluation path: - lists being represented as maps is made transparent by converting during evaluation with adjust_for_collection_as_maps() column_condition::applies_to() previously handled a missing row by materializing a NULL for the column being evaluated; now it materializes a NULL row instead, since evaluation of the column is moved to common code. A few more cases in lwt_test became legal, though I'm not sure exactly why in this patch.	2023-02-12 17:28:01 +02:00
Avi Kivity	568c1a5a36	cql3: expr: generalize evaluation of subscript expressions Currently, evaluation of a subscript expression x[y] requires that x be a column_value, but that's completely artificial. Generalize it to allow any expression. This is needed after we transform a LWT IF condition from "a[x] = y" to "func(a)[x] = y", where func casts a from a map representation of a list back to a list; but it's also generally useful.	2023-02-12 17:25:46 +02:00
Avi Kivity	6de4032baf	cql3: expr: introduce adjust_for_collection_as_maps() LWT and some list operations represent lists using a form like their mutations, so that the mutation list keys can be recovered and used to update the list. But the evaluation machinery knows nothing about that, and will return the map-form even though the type system thinks it is a list. To handle that, add a utility to rewrite the expression so that the value is re-serialized into the expected list form. The rewrite is implemented as a scalar function taking the map form and returning the list form.	2023-02-12 17:25:46 +02:00
Avi Kivity	3a2d8175fb	cql3: update_parameters: use evaluation_inputs compatible row prefetch update_parameters::prefetch_data is used for some list updates (which need a read-before-write to determine the key to update) and for LWT compare-and-swap. Currently they use a custom structure for representing a read row. Switch to the same structure that is used in evaluation_inputs (and in SELECT statement evaluation) so the expression machinery can be reused. The expression representation is irregular (with different fields for the keys and regular/static columns), so we introduce an old_row structure to hold both the clustering key and the regular row values for cas_request. A nice bonus is that we can use get_non_pk_values() to read the data into the format expected by evaluation_inputs, but on the other hand we have to adjust get_prefetched_list() to fix up the type of the returned list (we return it as a map, not a list, so list updates can access the index).	2023-02-12 17:25:41 +02:00
Avi Kivity	47026b7ee0	cql3: expr: protect extract_column_value() from partial clustering keys Partial clustering keys can exist in COMPACT STORAGE tables (though they are exceedingly rare), and when LWT materializes a static row. Harden extract_column_value() so it is ready for them.	2023-02-12 17:17:01 +02:00
Avi Kivity	c8d77c204f	cql3: expr: extract extract_column_value() from evaluation machinery Expression evaluation works with the evaluation_input structure to compute values. As we move LWT column_condition towards expressions, we'll start using evaluation_input, so provide this helper to ease the transition.	2023-02-12 17:17:01 +02:00
Avi Kivity	721c05b7ec	cql3: selection: introduce selection_from_partition_slice Since expressions were introduced for SELECT statements, they work with `selection` object to represent which table columns they can work with. Probably a neutral representation would have been better, but that's what we have now. LWT works with partition_slice, so introduce a selection_from_partition_slice() helper to bridge the two worlds.	2023-02-12 17:17:01 +02:00
Avi Kivity	31ee13c0c9	cql3: expr: move check for ordering on duration types from restrictions to prepare Both LWT IF clause and SELECT WHERE clause check that a duration type isn't used in an ordered comparison, since duration types are unordered (is 1mo more or less than 30d?). As a first step towards centralizing this check, move the check from restrictions into prepare. When LWT starts using prepare, the duplication will be removed. The error message was changed: the word "slice" is an internal term, and a comparison does not necessarily have to be in a restriction (which is also an internal term). Tests were adjusted.	2023-02-12 17:17:01 +02:00
Avi Kivity	c0b1992fc4	cql3: expr: remove restrictions oper_is_slice() in favor of expr::is_slice() The two are functionally identical, so eliminate duplicate code.	2023-02-12 17:17:01 +02:00
Avi Kivity	036fa0891f	cql3: column_condition: optimize LIKE with constant pattern after preparing This just moves things around to put all the code we will kill in one place. Note the code was adjusted: before the move, it operated on an unprepared untyped_constant; after the move it operates on a prepared constant.	2023-02-12 17:17:01 +02:00
Avi Kivity	db2fa44a9a	cql3: expr: add optimizer for LIKE with constant pattern Compiling a pattern is expensive and so we should try to do it at prepare time, if the pattern is a constant. Add an optimizer that looks for such cases and replaces them with a unary function that embeds the compiled pattern. This isn't integrated yet with prepare_expr(), since the filtering code isn't ready for generic expressions. Its first user will be LWT, which contains the optimization already (filtering had it as well, but lost it sometime during the expression rewrite). A unit test is added.	2023-02-12 17:16:58 +02:00
Avi Kivity	1959f9937c	test: lib: add helper to evaluate an expression with bind variables but no table Sometimes we want to defeat the expression optimizer's ability to fold constant expressions. A bind variable is a convenient way to do this, without the complexity of faking a schema and row inputs. Add a helper to evaluate an expression with bind variable parameters, doing all the paperwork for us. A companion make_bind_variable() is added to likewise simplify creating bind variables for tests.	2023-02-12 17:05:22 +02:00
Avi Kivity	899c4a7f29	cql3: column_condition: make the left-hand-side part of column_condition::raw LWT IF conditions are collected with the left-hand-side outside the condition structure, then moved back to the prepared condition structure during preparation. Change that so that the raw description also contains the left-hand-side. This makes it more similar to expressions (which LWT conditions aspire to be). The change is mechanical; a bit of code that used to manage the std::pair is moved to column_condition::raw::prepare instead. The schema is now also passed since it's needed to prepare the left-hand-side.	2023-02-12 17:05:22 +02:00
Avi Kivity	f5257533fd	cql3: lwt: relax constraints on map subscripts and LIKE patterns Previously, we rejected map subscripts that are NULL, as well as LIKE patterns that are NULL. General SQL expression evaluation allows NULL everywhere, and doesn't raise errors - an expression involving NULL generally yields NULL. Change the behavior to follow that. Since the new behavior was previously disallowed, no one should have been relying on it and there is no compatibility problem. Update the tests and note it as a CQL extension.	2023-02-12 17:05:22 +02:00
Avi Kivity	b40dc49e05	cql3: expr: fix search_and_replace() for subscripts We forgot to preserve the subscript's type, so fix that. Also drop a leftover throw. It's dead code, immediately after a return.	2023-02-12 17:05:22 +02:00
Avi Kivity	8dda84bb0c	cql3: expr: fix function evaluation with NULL inputs Function call evaluation rejects NULL inputs, unnecessarily. Functions work well with NULL inputs. Fix by relaxing the check. This currently has no impact because functions are not evaluated via expressions, but via selectors.	2023-02-12 17:05:22 +02:00
Avi Kivity	ecdd49317a	cql3: expr: add LWT IF clause variants of binary operators LWT IF clause interprets equality differently from SQL (and the rest of CQL): it thinks NULL equals NULL. Currently, it implements binary operators all by itself so the fact that oper_t::EQ (and friends) means something else in the rest of the code doesn't bother it. However, we can't unify the code (in column_condition.cc) with the rest of expression evaluation if the meaning changes in different places. To prepare for this, introduce a null_handling_style field to binary_operator that defaults to `sql` but can be changed to `lwt_nulls` to indicate this special semantic. A few unit tests are added. LWT itself still isn't modified.	2023-02-12 17:03:03 +02:00
Alejo Sanchez	8bf2d515de	test/topology: rename topology test file Rename test_topology.py to reflect current tests. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-12 12:59:31 +01:00
Alejo Sanchez	11691ba7f5	test/topology: lint and type for topology tests Fix minor lint and type hints. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-12 12:59:31 +01:00
Alejo Sanchez	49baf6789c	test/topology: move topology ip tests to own file Move slow topology IP related tests to a separate file. Add docstrings. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-12 12:59:19 +01:00
Alejo Sanchez	3fcef63a0f	test/topology: move topology test remove garbage... group0 members to own file Move slow test for removenode with nodes not present in group0 to a separate file. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-12 12:48:39 +01:00
Nadav Har'El	10ca08e8ac	Merge 'Sequence CDC preimage select with Paxos learn write' from Kamil Braun `paxos_response_handler::learn_decision` was calling `cdc_service::augment_mutation_call` concurrently with `storage_proxy::mutate_internal`. `augment_mutation_call` was selecting rows from the base table in order to create the preimage, while `mutate_internal` was writing rows to the table. It was therefore possible for the preimage to observe the update that it accompanied, which doesn't make any sense, because the preimage is supposed to show the state before the update. Fix this by performing the operations sequentially. We can still perform the CDC mutation write concurrently with the base mutation write. `cdc_with_lwt_test` was sometimes failing in debug mode due to this bug and was marked flaky. Unmark it. Also fix a comment in `cdc_with_lwt_test`. Fixes #12098 Closes #12768 * github.com:scylladb/scylladb: test/cql-pytest: test_cdc: regression test for #12098 test/cql: cdc_with_lwt_test: fix comment service: storage_proxy: sequence CDC preimage select with Paxos learn	2023-02-12 13:28:34 +02:00
Alejo Sanchez	655e1587e3	test/topology: move topology rejoin test to own file Move slow test for rejoining a server after a sudden stop to a separate file. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-12 12:02:47 +01:00
Alejo Sanchez	7cc669f5a5	test/topology: merge topology schema tests and... ... move them to their own file. Schema verification tests for restart, add, and hard stop of server can be done with the same cluster. Merge them in the same test case. While there, move them to a separate file to be run independently as this is a slow test. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-12 12:02:40 +01:00
Alejo Sanchez	93de79d214	test/topology: isolate topology smp params test Move slow test for different smp parameters to its own file. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-12 12:02:32 +01:00
Alejo Sanchez	293550ca5c	test/topology: move topology helpers to common file Move helper functions to a common file ahead of splitting topology tests. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-12 12:02:16 +01:00
Nadav Har'El	2653865b34	Merge 'test.py: improve test failure handling' from Kamil Braun Improve logging by printing the cluster at the end of each test. Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure. Mark cluster as dirty when a test that uses it fails - after a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test. Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do. Closes #12652 * github.com:scylladb/scylladb: test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters test/topology: don't drop random_tables keyspace after a failed test test/pylib: mark cluster as dirty after a failed test test: pylib, topology: don't perform operations after test on a dirty cluster test/pylib: print cluster at the end of test	2023-02-12 12:13:25 +02:00
Kamil Braun	54f85c641d	test/pylib: use larger timeout for decommission/removenode Recently we enabled RBNO by default in all topology operations. This made the operations a bit slower (repair-based topology ops are a bit slower than classic streaming - they do more work), and in debug mode with a large number of concurrent tests running, they might time out. The timeout for bootstrap was already increased before; do the same for decommission/removenode. The previously used timeout was 300 seconds (this is the default used by aiohttp library when it makes HTTP requests), now use the TOPOLOGY_TIMEOUT constant from ScyllaServer which is 1000 seconds.	2023-02-10 15:56:31 +01:00
Kamil Braun	fde6ad5fc0	test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT Use a more generic name since the constant will also be used as timeout for decommission and removenode.	2023-02-10 15:56:31 +01:00
Kamil Braun	ca4db9bb72	Merge 'test/raft: test snapshot threshold' from Alecco Force snapshot with schema changes while server down. Then verify schema when bringing back up the server. Closes #12726 * github.com:scylladb/scylladb: pytest/topology: check snapshot transfer raft conf error injection for snapshot test/pylib: one-shot error injection helper	2023-02-10 15:24:46 +01:00
Kamil Braun	540f6d9b78	test/cql-pytest: test_cdc: regression test for #12098 Perform multiple LWT inserts to different keys ensuring none of them observes a preimage. On my machine this test reproduces the problem more than 50% of the time in debug mode.	2023-02-10 14:35:49 +01:00
Avi Kivity	9696ab7fae	cql3: expr: change evaluate_binop_sides to return more NULL information Currently, evaluate_binop_sides() returns std::nullopt if either side is NULL. Since we wish to add binary operators that do consider NULL on each side, make evaluate_binop_sides return the original NULLs instead (as managed_bytes_opt). Ultimately I think evaluate_binop_sides() should disappear, but before that we have to improve unset value checking.	2023-02-10 09:45:35 +02:00
Botond Dénes	423df263f5	Merge 'Sanitize with_sstable_directory() helper in tests' from Pavel Emelyanov The helping wrapper facilitates the usage of sharded<sstable_directory> for several test cases and the helper and its callers had deserved some cleanup over time. Closes #12791 * github.com:scylladb/scylladb: sstable_directory_test: Reindent and de-multiline sstable_directory_test: Enlighten and rename sstable_from_existing_file sstable_directory_test: Remove constant parallelism parameter	2023-02-10 07:11:38 +02:00
Tomasz Grabiec	402d5fd7e3	cache: Fix empty partition entries being left in cache in some cases Merging rows from different partition versions should preserve the LRU link of the entry from the newer version. We need this in case we're merging two last dummy entries where the older dummy is already unlinked from the LRU. The newer dummy could be the last entry which is still holding the partition entry linked in the LRU. The mutation_partition_v2 merging didn't take the LRU link from the newer entry, and we could end up with the partition entry not having any entries linked in the LRU. Introduced in `f73e2c992f`. Fixes #12778 Closes #12785	2023-02-09 23:03:23 +02:00
Kamil Braun	e2064f4762	Merge 'repair: finish repair immediately on local keyspaces' from Aleksandra Martyniuk System keyspace is a keyspace with local replication strategy and thus it does not need to be repaired. It is possible to invoke repair of this keyspace through the api, which leads to runtime error since peer_events and scylla_table_schema_history have different sharding logic. For keyspaces with local replication strategy repair_service::do_repair_start returns immediately. Closes #12459 * github.com:scylladb/scylladb: test: rest_api: check if repair of system keyspace returns before corresponding task is created repair: finish repair immediately on local keyspaces	2023-02-09 18:44:37 +01:00
Pavel Emelyanov	52e2ad051e	sstable_utils: Move the test_setup to perf/ The sstable perf test uses test_setup ability to create temporary directory and clean it and that's the only place that uses it. Move the remainders of test_setup into perf/ so that no unit tests attempt to re-use it (there's test_env for that). Remove unused _walker and _create_directory while at it. Mark protected stuff private while at it as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-09 17:18:04 +03:00
Pavel Emelyanov	868391a613	sstable_utils: Remove unused wrappers over test_env Now all callers are using the test_env directly Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-09 17:17:48 +03:00
Pavel Emelyanov	47022bf750	sstable_test: Open-code do_with_cloned_tmp_directory The statistics_rewrite case uses the helper that creates a copy of the provided static directory, but it's the only user of this helper. It's better to open-code it into the test case. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-09 17:17:48 +03:00
Pavel Emelyanov	19c1afb20a	sstable_test: Asynchronize statistics_rewrite case It is run inside async context and can be coded in a shorter form without using deeply nested then-s Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-09 17:17:23 +03:00
Pavel Emelyanov	85b8bae035	tests: Replace test_setup::do_with_tmp_directory with test_env::do_with(_async)? The former helper is just a wrapper over the _async version of the latter and also creates a tempdir and calls the fn with tempdir as an argument. The test_env already has its own temp dir on board, so callers can be switched to using it. Some test cases use the do_with_tmp_directory but generate chain of futures without in fact using the async context. This patch addresses that, so the change is not 100% mechanical unfortunately. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-09 17:11:31 +03:00
Anna Stuchlik	9f2724231c	doc: add the new KB to the list of topics	2023-02-09 14:42:09 +01:00
Anna Stuchlik	cfdb8a8760	doc: add a new KB article about tombstone garbage collection in ICS	2023-02-09 14:36:06 +01:00
Pavel Emelyanov	f0212c7b68	sstable_directory_test: Reindent and de-multiline Many tests using sstable directory wrapper have broken indentation with previous patching. Fix it. No functional changes. Also, while at it, convert multiline wrapper calls into one-line, after previous patch these are short enough for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-09 16:00:53 +03:00
Pavel Emelyanov	ec02b0f706	sstable_directory_test: Enlighten and rename sstable_from_existing_file It used to be the sstable maker for sstable::test_env / cql_test_env, now sstables for tests are made via sstables manager explicitly, so the guy can be renamed to something more relevant to its current status. Also, de-mark its constructors as explicit to make callers look shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-09 15:59:23 +03:00
Pavel Emelyanov	c843f7937b	sstable_directory_test: Remove constant parallelism parameter It's 1 (one) all the time, just hard-code it internally Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-09 15:59:01 +03:00
Avi Kivity	fd4ee4878a	Revert "storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops" This reverts commit `e7d5e508bc`. It ends up failing continuous integration tests randomly. We don't know if it's uncovering an existing bug, or if RBNO itself is broken, but for now we need to revert it to unblock progress.	2023-02-09 10:30:26 +02:00
Botond Dénes	b62d84fdba	Merge 'Keep reshape and reshard logic in distributed loader' from Pavel Emelyanov Now it's scattered between dist. loader and sstable directory code making the latter quite bloated. Keeping everything in distributed loader makes the sstable_directory code compact and easier to patch to support object storage backend. Closes #12771 * github.com:scylladb/scylladb: sstable_directory: Rename remove_input_sstables_from_reshaping() sstable_directory: Make use of remove_sstables() helper sstable_directory: Merge output sstables collecting methods distributed_loader: Remove max_compaction_threshold argument from reshard() distributed_loader: Remove compaction_manager& argument from reshard() sstable_directory: Move the .reshard() to distributed_loader sstable_directory: Add helper to load foreign sstable sstable_directory: Add io-prio argument to .reshard() sstable_directory: Move reshard() to distributed_loader.cc distributed_loader: Remove compaction_manager& argument from reshape() sstable_directory: Move the .reshape() to distributed loader sstable_directory: Add helper to retrieve local sstables sstable_directory: Add io-prio argument to .reshape() sstable_directory: Move reshape() to distributed_loader.cc	2023-02-09 10:01:44 +02:00
Botond Dénes	1c333e2102	Merge 'Transport server error handling fixes' from Gusev Petr CQL transport server error handling fixes and improvements: * log failed requests with `DEBUG` level for easier debugging; * in case of unhandled errors, deliver them to the client as `SERVER_ERROR`'s * fix for `protocol_error`'s in case of shed big requests; * explicit tests have been written for the error handling problems above. Closes #11949 * github.com:scylladb/scylladb: transport server: fix "request size too large" handling transport server: log failed requests with debug level transport server: fix unexpected server errors handling transport server: log client errors with debug level	2023-02-09 09:02:22 +02:00
Anna Stuchlik	c7778dd30b	doc: related https://github.com/scylladb/scylladb/issues/12754 , add the requirement to upgrade Monitoring to version 4.3 Closes #12784	2023-02-09 07:10:34 +02:00
Botond Dénes	746b009db0	Merge 'dist/debian: bump up debhelper compatibility level to 10 and cleanups' from Kefu Chai - dist/debian: bump up debhelper compatibility level to 10 - dist/debian: drop unused Makefile variable Closes #12723 * github.com:scylladb/scylladb: dist/debian: drop unused Makefile variable dist/debian: bump up debhelper compatibility level to 10	2023-02-09 07:04:20 +02:00
Pavel Emelyanov	40de737b36	sstable_directory: Rename remove_input_sstables_from_reshaping() It unlinks unshared sstables filtering some of them out. Name it according to what it does without mentioning reshape/reshard. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-08 15:00:44 +03:00
Pavel Emelyanov	a1dc251214	sstable_directory: Make use of remove_sstables() helper Currently it's called remove_input_sstables_from_resharding() but it just unlinks sstables in parallel from the given list. So rename it not to mention reshard and also make use of this "new" helper in the remove_input_sstables_from_reshaping(), it needs exactly the same functionality. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-08 15:00:44 +03:00
Pavel Emelyanov	cb36f5e581	sstable_directory: Merge output sstables collecting methods There are two of them collecting sstables from resharding and reshaping. Both do the same job, except that the latter doesn't expect the list to contain remote sstables. This patch merges them together with the help of an extra sanity boolean to check for the remote sstable not in the list. And renames the method not to mention reshape/reshard. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-08 15:00:41 +03:00
Avi Kivity	0f15ff740d	cql3: expr: simplify user/debug formatting We have a cql3::expr::expression::printer wrapper that annotates an expression with a debug_mode boolean prior to formatting. The fmt library, however, provides a much simpler alternative: a custom format specifier. With this, we can write format("{:user}", expr) for user-oriented prints, or format("{:debug}", expr) for debug-oriented prints (if nothing is specified, the default remains debug). This is done by implementing fmt::formatter::parse() for the expression type, and using expression::printer internally. Since sometimes we pass expression element types rather than the expression variant, we also provide a custom formatter for all ExpressionElement Types. Uses for expression::printer are updated to use the nicer syntax. In one place we eliminate a temporary that is no longer needed since ExpressionElement:s can be formatted directly. Closes #12702	2023-02-08 12:24:58 +02:00
Petr Gusev	3263523b54	transport server: fix "request size too large" handling Calling _read_buf.close() doesn't imply eof(), some data may have already been read into kernel or client buffers and will be returned next time read() is called. When the _server._max_request_size limit was exceeded and the _read_buf was closed, the process_request method finished and we started processing the next request in connection::process. The unread data from _read_buf was treated as the header of the next request frame, resulting in "Invalid or unsupported protocol version" error. The existing test_shed_too_large_request was adjusted. It was originally written with the assumption that the data of a large query would simply be dropped from the socket and the connection could be used to handle the next requests. This behaviour was changed in scylladb#8800, now the connection is closed on the Scylla side and can no longer be used. To check there are no errors in this case, we use Scylla metrics, getting them from the Scylla Prometheus API.	2023-02-08 00:07:08 +04:00
Petr Gusev	0904f98ebf	transport server: log failed requests with debug level These logs can be helpful for debugging, e.g. if an error was not handled correctly by the client driver, or another error occurred while handling it.	2023-02-08 00:07:08 +04:00
Petr Gusev	a4cf509c3d	transport server: fix unexpected server errors handling If request processing ended with an error, it is worth sending the error to the client through make_error/write_response. Previously in this case we just wrote a message to the log and didn't handle the client connection in any way. As a result, the only thing the client got in this case was timeout error. A new test_batch_with_error is added. It is quite difficult to reproduce error condition in a test, so we use error injection instead. Passing injection_key in the body of the request ensures that the exception will be thrown only for this test request and will not affect other requests that the driver may send in the background. Closes: scylladb#12104	2023-02-08 00:07:02 +04:00
Pavel Emelyanov	73d458cf89	distributed_loader: Remove max_compaction_threshold argument from reshard() Since the whole reshard() is local to dist. loader code now, the caller of the reshard helper may let this method get the threshold itself Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:31:43 +03:00
Pavel Emelyanov	25aaa45256	distributed_loader: Remove compaction_manager& argument from reshard() It can be obtained from the table& Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:31:43 +03:00
Pavel Emelyanov	15547f1b5b	sstable_directory: Move the .reshard() to distributed_loader Now all the resharding logic is accumulated in distributed loader and the sstable_directory is just the place where sstables are collected. The changes summary is: - add sstable_directory as argument (used to be "this") - replace all "this" captures with &dir ones - remove temporary namespace gap and declaration from sst-dir class Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:31:43 +03:00
Pavel Emelyanov	ab5f48d496	sstable_directory: Add helper to load foreign sstable This is to generalize the code duplication between .reshard() and existing .load_foreign_sstables() (plural form). Make it coroutinized right at once. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:31:43 +03:00
Pavel Emelyanov	e6e65c87d5	sstable_directory: Add io-prio argument to .reshard() Now it gets one from this-> but the method is becoming a static one in distributed_loader which only has it as an argument. That's not a big deal, as the current IO class is going to be derived from the current sched group, so this extra arg will go away altogether some day. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:31:41 +03:00
Pavel Emelyanov	a32d2b6d6a	sstable_directory: Move reshard() to distributed_loader.cc Just move the code and create temporary namespace gap for that Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:31:12 +03:00
Pavel Emelyanov	1de8c85acd	distributed_loader: Remove compaction_manager& argument from reshape() It can be obtained from the table& Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:31:12 +03:00
Pavel Emelyanov	d734b6b7c1	sstable_directory: Move the .reshape() to distributed loader Now all the reshaping logic is accumulated in distributed loader and the sstable_directory is just the place where sstables are collected. The changes summary is: - add sstable_directory as argument (used to be "this") - replace all "this" captures with &dir ones - remove temporary namespace gap and declaration from sst-dir class Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:30:55 +03:00
Pavel Emelyanov	b906d34807	sstable_directory: Add helper to retrieve local sstables There are methods to retrieve shared local sstables and foreign sstables, so here's one more to the family Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:23:40 +03:00
Pavel Emelyanov	420fc8d4df	sstable_directory: Add io-prio argument to .reshape() Now it gets one from this-> but the method is becoming a static one in distributed_loader which only has it as an argument. That's not a big deal, as the current IO class is going to be derived from the current sched group, so this extra arg will go away altogether some day. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:22:27 +03:00
Pavel Emelyanov	a70d6017f8	sstable_directory: Move reshape() to distributed_loader.cc Just move the code and create temporary namespace gap for that Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-07 19:21:54 +03:00
Kamil Braun	97b2971bf1	test/cql: cdc_with_lwt_test: fix comment The comment mentioned an entry that shouldn't be there (and it wasn't in the actual expected result).	2023-02-07 16:12:18 +01:00
Kamil Braun	1ef113691a	service: storage_proxy: sequence CDC preimage select with Paxos learn `paxos_response_handler::learn_decision` was calling `cdc_service::augment_mutation_call` concurrently with `storage_proxy::mutate_internal`. `augment_mutation_call` was selecting rows from the base table in order to create the preimage, while `mutate_internal` was writing rows to the table. It was therefore possible for the preimage to observe the update that it accompanied, which doesn't make any sense, because the preimage is supposed to show the state before the update. Fix this by performing the operations sequentially. We can still perform the CDC mutation write concurrently with the base mutation write. `cdc_with_lwt_test` was sometimes failing in debug mode due to this bug and was marked flaky. Unmark it. Fixes #12098	2023-02-07 16:12:18 +01:00
Alejo Sanchez	cf3b8d7edc	pytest/topology: check snapshot transfer Test snapshot transfer by reducing the snapshot threshold on initial servers (3 and 1 trailing). Then creates a table, and does 3 extra schema changes (add column), triggering at least 2 snapshots. Then brings a new server to the cluster, which will get the schema through a snapshot. Then the test stops the initial servers and verifies the table schema is up to date on the new server. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-07 16:09:07 +01:00
Petr Gusev	95bf8eebe0	query_ranges_to_vnodes_generator: fix for exclusive boundaries Let the initial range passed to query_partition_key_range be [1, 2) where 2 is the successor of 1 in terms of ring_position order and 1 is equal to vnode. Then query_ranges_to_vnodes_generator() -> [[1, 1], (1, 2)], so we get an empty range (1,2) and subsequently will make a data request with this empty range in storage_proxy::query_partition_key_range_concurrent, which will be redundant. The patch adds a check for this condition after making a split in the main loop in process_one_range. The patch does not attempt to handle cases where the original ranges were empty, since this check is the responsibility of the caller. We only take care not to add empty ranges to the result as an unintentional artifact of the algorithm in query_ranges_to_vnodes_generator. A test case is added in test_get_restricted_ranges. The helper lambda check is changed so as not to limit the number of ranges to the length of expected ranges, otherwise this check passes without the change in process_one_range. Fixes: #12566 Closes #12755	2023-02-07 16:02:31 +02:00
Kefu Chai	afd1221b53	commitlog: mark request_controller_timeout_exception_factory::timeout() noexcept request_controller_timeout_exception_factory::timeout() creates an instance of `request_controller_timed_out_error` whose ctor is default-created by compiler from that of timed_out_error, which is in turn default-created from the one of `std::exception`. and `std::exception::exception` does not throw. so it's safe to mark this factory method `noexcept`. with this specifier, we don't need to worry about the exception thrown by it, and don't need to handle them if any in `seastar::semaphore`, where `timeout()` is called for the customized exception. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12759	2023-02-07 14:38:54 +02:00
Botond Dénes	051da4e148	Merge 'Handle EDQUOT error just like ENOSPC' from Kefu Chai - main: consider EDQUOT as environmental failure also - main: use defer_verbose_shutdown() to shutdown compaction manager - replica/table: extract should_retry() int with_retry - replica/table: retry on EDQUOT when flushing memtable Fixes #12626 Closes #12653 * github.com:scylladb/scylladb: replica/table: retry on EDQUOT when flushing memtable replica/table: extract should_retry() int with_retry main: use defer_verbose_shutdown() to shutdown compaction manager main: consider EDQUOT as environmental failure also	2023-02-07 14:38:36 +02:00
David Garcia	734f09aba7	docs: add flags support in multiversion Closes #12740	2023-02-07 14:23:53 +02:00
Wojciech Mitros	02bfac0c66	uda: change the UDF used in a UDA if it's replaced Currently, if a UDA uses a UDF that's being replaced, the UDA will still keep using the old UDF until the node is restarted. This patch fixes this behavior by checking all UDAs when replacing a UDF and updating them if necessary. Fixes #12709	2023-02-07 12:17:52 +01:00
Nadav Har'El	3ba011c2be	cql: fix empty aggregation, and add more tests This patch fixes #12475, where an aggregation (e.g., COUNT(*), MIN(v)) of absolutely no partitions (e.g., "WHERE p = null" or "WHERE p in ()") resulted in an internal error instead of the "zero" result that each aggregator expects (e.g., 0 for COUNT, null for MIN). The problem is that normally our aggregator forwarder picks the nodes which hold the relevant partition(s), forwards the request to each of them, and then combines these results. When there are no partitions, the query is sent to no node, and we end up with an empty result set instead of the "zero" results. So in this patch we recognize this case and build those "zero" results (as mentioned above, these aren't always 0 and depend on the aggregation function!). The patch also adds two tests reproducing this issue in a fairly general way (e.g., several aggregators, different aggregation functions) and confirming the patch fixes the bug. The test also includes two additional tests for COUNT aggregation, which uncovered an incompatibility with Cassandra which is still not fixed - so these tests are marked "xfail": Refs #12477: Combining COUNT with GROUP by results with empty results in Cassandra, and one result with empty count in Scylla. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12715	2023-02-07 12:28:42 +02:00
Botond Dénes	bf7113f6dc	Merge 'locator: token_metadata: improve get_address_ranges()' from Michał Chojnowski This two-patch series aims to improve `get_address_ranges()` by eliminating cases of quadratic behavior which were noticed to cause huge allocations, and by deduplicating the code of `get_address_ranges()` with the almost-identical `get_ranges()`. Refs https://github.com/scylladb/scylladb/issues/10337 Refs https://github.com/scylladb/scylladb/issues/10817 Refs https://github.com/scylladb/scylladb/issues/10836 Refs https://github.com/scylladb/scylladb/issues/10837 Fixes https://github.com/scylladb/scylladb/issues/12724 Closes #12733 * github.com:scylladb/scylladb: locator: token_metadata: unify get_address_ranges() and get_ranges() locator: token_metadata: get rid of a quadratic behaviour in get_address_ranges()	2023-02-07 12:28:41 +02:00
Botond Dénes	a01662b287	Merge 'doc: improve the general upgrade policy' from Anna Stuchlik Related: https://github.com/scylladb/scylladb/pull/12586 This PR improves the upgrade policy added with https://github.com/scylladb/scylladb/pull/12586, according to the feedback from: @tzach > Upgrading from 4.6 to 5.0 is not clear; better to use 4.x to 4.y versions as an example. and @bhalevy > It is not completely clear that when upgrading through several versions, the whole cluster needs to be upgraded to each consecutive version, not just the rolling node. In addition, the content is organized into sections for the sake of readability. Closes #12647 * github.com:scylladb/scylladb: doc: add the information about patch releases doc: add the info about the minor versions doc: reorganize the content on the Upgrade ScyllaDB page doc: improve the overview of the upgrade procedure (apply feedback)	2023-02-07 12:28:41 +02:00
Nadav Har'El	c00fcc80e5	test/cql-pytest: three tests for empty clustering keys This patch adds three additional tests for empty (e.g., empty string) clustering keys. The first test disproves a worry that was raised in #12561 that perhaps empty clustering keys only seem to work, but they don't get written to sstables. The new test verifies that there is no bug - they are written and can be read correctly. The second and third tests reproduce issue #12749 - an empty clustering key should be allowed in a COMPACT STORAGE table only if there is a compound (multi-column) clustering key. But as the tests demonstrate, 1. if there is just one clustering column, Scylla gives the wrong error message, and 2. if there is a compound clustering key, Scylla doesn't allow an empty key as it should. As usual, all tests pass on Cassandra. The last two tests fail on Scylla, so are marked xfail. Refs #12561 Refs #12749 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12750	2023-02-07 12:28:41 +02:00
Petr Gusev	bd80a449d5	transport server: log client errors with debug level Ideally, these errors should be transparently delivered to the client, but in practice, due to various flaws/bugs in scylla and/or the driver, they can be lost, which enormously complicates troubleshooting. const socket_address& get_remote_address() is needed for its convenient conversion to string, which includes ip and port.	2023-02-07 13:53:38 +04:00
Wojciech Mitros	58987215dc	functions: add helper same_signature method When deciding whether two functions have the same signature, we have to check if they have the same name and parameter types. Additionally, if they're represented by pointers, we need to check if any of them is a nullptr. This logic is used multiple times, so it's extracted to a separate function. To use this function, the `used_by_user_aggregate` method takes now a function instead of name and types list - we can do it because we always use it with an existing user function (that we're trying to drop). The method will also be useful when we'll be not dropping, but replacing a user function.	2023-02-07 10:15:12 +01:00
Wojciech Mitros	20069372e7	uda: return aggregate functions as shared pointers We will want to reuse the functions that we get from an aggregate without making a deep copy, and it's only possible if we get pointers from the aggregate instead of actual values.	2023-02-07 10:15:09 +01:00
Kefu Chai	bba03c1a55	replica/table: retry on EDQUOT when flushing memtable retry when memtable flush fails due to EDQUOT. there are chances that user exceeds the disk quota when scylla flushes memtable and user manages to free up the necessary resource before the next retry. before this change, we simply `abort()` in this case. after this change, we just keep on retrying until the service is shutdown. Fixes #12626 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-07 16:00:40 +08:00
Kefu Chai	6d017e75e0	replica/table: extract should_retry() int with_retry * extract a lambda encapsulating the condition if we should retry at seeing an exception when calling functions with `with_retry()`. we apply the same check to the exception raised when performing table related i/o operations. in this change, the two checks are consolidated and extracted into a single lambda, so we can add yet more error code (s) which should be considered retry-able failures. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-07 16:00:40 +08:00
Kefu Chai	d4315245a1	main: use defer_verbose_shutdown() to shutdown compaction manager * use `defer_verbose_shutdown()` to shutdown compaction manager `EDQUOT` is quite similar as `ENOSPC`, in the sense that both of them are caused by environmental issues. before this change, `compaction_manager` filters the ENOSPC exceptions thrown by `compaction_manager::really_do_stop()`, so they are not propagated to caller when calling `compaction_manager::stop()` -- only a warning message is printed in the log. but `EDQUOT` is not handled. after this change, the exception raised by compaction manager's stop process is not filtered anymore and is handled by `defer_verbose_shutdown()` instead, which is able to check the type of exception, and print out error message in the log. so the `ENOSPC` and `EDQUOT` errors are taken care of, and more visible from user's perspective as they are printed as errors instead of warning. but they are not printed using the `compaction_manager` logger anymore. so if our testing or user's workflow depends on this behavior, the related setting should be updated accordingly. Fixes #12626 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-07 16:00:40 +08:00
Kefu Chai	c3ef353e3d	main: consider EDQUOT as environmental failure also EDQUOT can be returned as the errno when the underlying filesystem is trying to reserve necessary resources from disk for performing i/o on behalf of the effective user, and the filesystem fails to acquire the necessary resources. it could be inode, volume space, or whatever resources for completing the i/o operation. but none of them is the consequence of scylla's fault. so we should not `abort()` at seeing this errno. instead, it should be reported to the administrator. in this change, EDQUOT is also considered as an environmental failure just like EIO, EACCES and ENOSPC. they could be thrown when stopping a server. Fixes #12626 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-07 16:00:40 +08:00
Tomasz Grabiec	ccc8e47db1	Merge 'test/lib: introduce key_utils.hh' from Botond Dénes We currently have two method families to generate partition keys: * make_keys() in test/lib/simple_schema.hh * token_generation_for_shard() in test/lib/sstable_utils.hh Both work only for schemas with a single partition key column of `text` type and both generate keys of fixed size. This is very restrictive and simplistic. Tests which wanted anything more complicated than that had to rely on open-coded key generation. Also, many tests started to rely on the simplistic nature of these keys, in particular two tests started failing because the new key generation method generated keys of varying size: * sstable_compaction_test.sstable_run_based_compaction_test * sstable_mutation_test.test_key_count_estimation These two tests seem to depend on generated keys all being of the same size. This makes some sense in the case of the key count estimation test, but makes no sense at all to me in the case of the sstable run test. Closes #12657 * github.com:scylladb/scylladb: test/lib/sstable_utils: remove now unused token_generation_for_shard() and friends test/lib/simple_schema: remove now unused make_keys() and friends test: migrate to tests::generate_partition_key[s]() test/lib/test_services: add table_for_tests::make_default_schema() test/lib: add key_utils.hh test/lib/random_schema.hh: value_generator: add min_size_in_bytes	2023-02-06 18:11:32 +01:00
Nadav Har'El	cc207a9f44	Merge 'uda: improve checking whether UDFs are used in UDAs in DROP statements' from Wojciech Mitros This patch fixes 2 issues with checking whether UDFs are used in UDAs: 1) UDFs types are not considered during the check, which prevents us from dropping a UDF that isn't used in any UDAs, but shares its name with one of them. 2) the REDUCEFUNC is not considered during the check, which allows dropping a UDF even though it's used in a UDA as the REDUCEFUNC. Additionally, tests for these issues are added Closes #12681 * github.com:scylladb/scylladb: udf: also check reducefunc to confirm that a UDF is not used in a UDA udf: fix dropping UDFs that share names with other UDFs used in UDAs pytest: add optional argument for new_function argument types	2023-02-06 19:07:26 +02:00
Kamil Braun	56c4d246ef	Merge 'Introduce recent_entries_map datatype to track least recent visited entries.' from Andrii Patsula Fixes: https://github.com/scylladb/scylladb/issues/12309 Closes #12720 * github.com:scylladb/scylladb: service/raft: raft_group_registry: use recent_entries_map to store rate_limits in pinger. Fixes #12309 utils: introduce recent_entries_map datatype to track least recent visited entries.	2023-02-06 18:01:26 +01:00
Botond Dénes	a3b280ba8c	Merge 'doc: document the workaround to install a non-latest ScyllaDB version' from Anna Stuchlik This PR is related to https://github.com/scylladb/scylla-enterprise/issues/2176. It adds a FAQ about a workaround to install a ScyllaDB version that is not the most recent patch version. In addition, the link to that FAQ is added to the patch upgrade guides 2021 and 2022. Closes #12660 * github.com:scylladb/scylladb: doc: add the missing sudo command doc: replace the redundant link with an alternative way to install a non-latest version doc: add the link to the FAQ about pinning to the patch upgrade guides 2022 and 2022 doc: add a FAQ with a workaround to install a non-latest ScyllaDB version on Debian and Ubuntu	2023-02-06 17:00:39 +02:00
Kefu Chai	d0a2440023	docs: bump sphinx-sitemap to 2.5.0 `poetry install` consistently times out when resolving the dependencies. like: ``` Command ['/home/kefu/.cache/pypoetry/virtualenvs/scylla-1fWQLpOv-py3.9/bin/python', '-m', 'pip', 'install', '--use-pep517', '--disable-pip-version-check', '--isolated', '--no-input', '--prefix', '/home/kefu/.cache/pypoetry/virtualenvs /scylla-1fWQLpOv-py3.9', '--upgrade', '--no-deps', '/home/kefu/.cache/pypoetry/artifacts/e6/ad/ab/eca9f61c5b15fd05df7192c0e5914a9e5ac927744b1fb5f6c07a92d7a4/sphinx-sitemap-2.2.0.tar.gz'] errored with the following return code 1, and out put: Processing /home/kefu/.cache/pypoetry/artifacts/e6/ad/ab/eca9f61c5b15fd05df7192c0e5914a9e5ac927744b1fb5f6c07a92d7a4/sphinx-sitemap-2.2.0.tar.gz Installing build dependencies: started Installing build dependencies: finished with status 'error' ERROR: Command errored out with exit status 2: command: /home/kefu/.cache/pypoetry/virtualenvs/scylla-1fWQLpOv-py3.9/bin/python /tmp/pip-standalone-pip-z97s216j/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-37k3lwqd/overlay --no-warn-scrip t-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel cwd: None Complete output (80 lines): Collecting setuptools>=40.8.0 Downloading setuptools-67.1.0-py3-none-any.whl (1.1 MB) ERROR: Exception: Traceback (most recent call last): File "/tmp/pip-standalone-pip-z97s216j/__env_pip__.zip/pip/_vendor/urllib3/response.py", line 438, in _error_catcher yield File "/tmp/pip-standalone-pip-z97s216j/__env_pip__.zip/pip/_vendor/urllib3/response.py", line 519, in read data = self._fp.read(amt) if not fp_closed else b"" File "/tmp/pip-standalone-pip-z97s216j/__env_pip__.zip/pip/_vendor/cachecontrol/filewrapper.py", line 62, in read data = self.__fp.read(amt) File "/usr/lib64/python3.9/http/client.py", line 463, in read n = self.readinto(b) File "/usr/lib64/python3.9/http/client.py", line 507, in readinto n = 
self.fp.readinto(b) File "/usr/lib64/python3.9/socket.py", line 704, in readinto return self._sock.recv_into(b) File "/usr/lib64/python3.9/ssl.py", line 1242, in recv_into return self.read(nbytes, buffer) File "/usr/lib64/python3.9/ssl.py", line 1100, in read return self._sslobj.read(len, buffer) socket.timeout: The read operation timed out ``` while sphinx-sitemap 2.5.0 installs without problems. sphinx-sitemap 2.5.0 is the latest version published to pypi. according to sphinx-sitemap's changelog at https://github.com/jdillard/sphinx-sitemap/blob/master/CHANGELOG.rst , no breaking changes were introduced between 2.2.0 and 2.5.0. after bumping sphinx-sitemap to 2.5.0, the following commands can complete without errors: ``` poetry lock make preview ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12705	2023-02-06 15:50:48 +02:00
Anna Stuchlik	c772563cb8	doc: add the information about patch releases	2023-02-06 14:47:39 +01:00
Botond Dénes	cb2a129371	Merge 'Fix inefficiency when rebuilding table statistics with compaction groups' from Raphael "Raph" Carvalho [table: Fix disk-space related metrics](`529a1239a9`) fixes the table's disk space related metrics. whereas second patch fixes an inefficiency when computing statistics which can be triggered with multiple compaction groups. Closes #12718 * github.com:scylladb/scylladb: table: Fix inefficiency when rebuilding statistics with compaction groups table: Fix disk-space related metrics	2023-02-06 15:11:48 +02:00
Avi Kivity	6bc5536bd8	Revert "Update seastar submodule" This reverts commit `b4559a6992`. It breaks some raft tests. Fixes #12741.	2023-02-06 14:56:44 +02:00
Botond Dénes	5a9f75aac6	Update tools/java submodule * tools/java 1c4e1e7a7d...f0bab7af66 (1): > Fix port option in SSTableLoader	2023-02-06 14:18:52 +02:00
Wojciech Mitros	ef1dac813b	udf: also check reducefunc to confirm that a UDF is not used in a UDA When dropping a UDF we're checking if it's not being used in any UDAs and fail otherwise. However, we're only checking its state function and final function, and it may also be used as its reduce function. This patch adds the missing checks and a test for them.	2023-02-06 13:02:54 +01:00
Wojciech Mitros	49077dd144	udf: fix dropping UDFs that share names with other UDFs used in UDAs Currently, when dropping a function, we only check if there exists an aggregate that uses a function with the same name as its state function or final function. This may cause the drop to fail even when it's just another UDF with the same name that's used in the aggregate, even when the actual dropped function is not used there. This patch fixes this by checking not only the names of the UDA's sfunc and finalfunc, but also their argument types.	2023-02-06 13:02:53 +01:00
Wojciech Mitros	8791b0faf5	pytest: add optional argument for new_function argument types When multiple functions with the same name but different argument types are created, the default drop statement for these functions will fail because it does not include the argument types. With this change, this problem can be worked around by specifying argument types when creating the function, as this will cause the drop statement to include them.	2023-02-06 13:02:19 +01:00
Botond Dénes	8efa9b0904	Merge 'Avoid qctx from view-builder methods of system_keyspace' from Pavel Emelyanov The system_keyspace defines several auxiliary methods to help view_builder update system.scylla_views_builds_in_progress and system.built_views tables. All use global qctx thing. It only takes adding view_builder -> system_keyspace dependency in order to de-static all those helpers and let them use query-processor from it, not the qctx. Closes #12728 * github.com:scylladb/scylladb: system_keyspace: De-static calls that update view-building tables storage_service: Coroutinize mark_existing_views_as_built() api: Unset column_family endpoints api: Carry sharded<db::system_keyspace> reference over view_builder: Add system_keyspace dependency	2023-02-06 12:44:40 +02:00
Botond Dénes	e247e15ec1	Merge 'Method to create and start task manager task' from Aleksandra Martyniuk In most cases, tasks manager's tasks are started just after they are created. Thus, to reduce boilerplate required for creating and starting tasks, tasks::task_manager::module::make_and_start_task method is added. Repair tasks are modified to use the method where possible. Closes #12729 * github.com:scylladb/scylladb: repair: use tasks::task_manager::module::make_and_start_task for repair tasks tasks: add task_manager::module::make_and_start_task method	2023-02-06 12:38:35 +02:00
Yaniv Kaul	9039b94790	docs: dev - how to test your tests documentation Short paragraph on how to develop tests and ensure they are solid. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes #12746	2023-02-06 12:07:43 +02:00
Avi Kivity	1e6cc9ca61	Merge 'storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops' from Asias He Since `97bb2e47ff` (storage_service: Enable Repair Based Node Operations (RBNO) by default for replace), RBNO was enabled by default for replace ops. After more testing, we decided to enable repair based node operations by default for all node operations. Closes #12173 * github.com:scylladb/scylladb: storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops test: Increase START_TIMEOUT test: Increase max-networking-io-control-blocks storage_service: Check node has left in node_ops_cmd::decommission_done repair: Use remote dc neighbors for everywhere strategy	2023-02-06 10:42:52 +02:00
Botond Dénes	511c0123a2	Merge 'Add compaction module to task manager' from Aleksandra Martyniuk Introduces task manager's compaction module. That's an initial part of integration of compaction with task manager. When fully integrated, task manager will allow user to track compaction operations, check status and progress of each individual one. It will help with creating an asynchronous version of rest api that forces any compaction. Currently, users can see with /task_manager/list_modules api call that compaction is one of the modules accessible through task manager. They won't get any additional information though, since compaction tasks are not created yet. A shared_ptr to compaction module is kept in compaction manager. Closes #12635 * github.com:scylladb/scylladb: compaction: test: pass task_manager to compaction_manager in test environment compaction: create and register task manager's module for compaction tasks: add task_manager constructor without arguments	2023-02-06 09:25:05 +02:00
Botond Dénes	cdd8b0fa35	Merge 'SSTable set improvements' from Raphael "Raph" Carvalho Makes sstable_set::all() interface robust, and introduces sstable_set::size() to avoid copies when retrieving set size. Closes #12716 * github.com:scylladb/scylladb: treewide: Use new sstable_set::size() wherever possible sstables: Introduce sstable_set::size() sstables: Fix fragility of sstable_set::all() interface	2023-02-06 08:24:00 +02:00
Avi Kivity	f73e2c992f	Merge 'Keep range tombstones with rows in memtables and cache' from Tomasz Grabiec This series switches memtable and cache to use a new representation for mutation data, called `mutation_partition_v2`. In this representation, range tombstone information is stored in the same tree as rows, attached to row entries. Each entry has a new tombstone field, which represents range tombstone part which applies to the interval between this entry and the previous one. See docs/dev/mvcc.md for more details about the format. The transient mutation object still uses the old model in order to avoid work needed to adapt old code to the new model. It may also be a good idea to live with two models, since the transient mutation has different requirements and thus different trade-offs can be made. Transient mutation doesn't need to support eviction and strong exception guarantees, so its algorithms and in-memory representation can be simpler. This allows us to incrementally evict range tombstone information. Before this series, range tombstones were accumulated and evicted only when the whole partition entry was evicted. This could lead to inefficient use of cache memory. Another advantage of the new representation is that reads don't have to lookup range tombstone information in a different tree while reading. This leads to simpler and more efficient readers. There are several disadvantages too. Firstly, rows_entry is now larger by 16 bytes. Secondly, update algorithms are more complex because they need to deoverlap range tombstone information. Also, to handle preemption and provide strong exception guarantees, update algorithms may need to allocate sentinel entries, which adds complexity and reduces performance. The memtable reader was changed to use the same cursor implementation which cache uses, for improved code reuse and reducing risk of bugs due to discrepancy of algorithms which deal with MVCC. 
Remaining work: - performance optimizations to apply_monotonically() to avoid regressions - performance testing - preemption support in apply_to_incomplete (cache update from memtable) Fixes #2578 Fixes #3288 Fixes #10587 Closes #12048 * github.com:scylladb/scylladb: test: mvcc: Extend some scenarios with exhaustive consistency checks on eviction test: mvcc: Extract mvcc_container::allocate_in_region() row_cache, lru: Introduce evict_shallow() test: mvcc: Avoid copies of mutation under failure injection test: mvcc: Add missing logalloc::reclaim_lock to test_apply_is_atomic mutation_partition_v2: Avoid full scan when applying mutation to non-evictable Pass is_evictable to apply() tests: mutation_partition_v2: Introduce test_external_memory_usage_v2 mirroring the test for v1 tests: mutation: Fix test_external_memory_usage() to not measure mutation object footprint tests: mutation_partition_v2: Add test for exception safety of mutation merging tests: Add tests for the mutation_partition_v2 model mutation_partition_v2: Implement compact() cache_tracker: Extract insert(mutation_partition_v2&) mvcc, mutation_partition: Document guarantees in case merging succeeds mutation_partition_v2: Accept arbitrary preemption source in apply_monotonically() mutation_partition_v2: Simplify get_continuity() row_cache: Distinguish dummy insertion site in trace log db: Use mutation_partition_v2 in mvcc range_tombstone_change_merger: Introduce peek() readers: Extract range_tombstone_change_merger mvcc: partition_snapshot_row_cursor: Handle non-evictable snapshots mvcc: partition_snapshot_row_cursor: Support digest calculation mutation_partition_v2: Store range tombstones together with rows db: Introduce mutation_partition_v2 doc: Introduce docs/dev/mvcc.md db: cache_tracker: Introduce insert() variant which positions before existing entry in the LRU db: Print range_tombstone bounds as position_in_partition test: memtable_test: Relax test_segment_migration_during_flush test: 
cache_flat_mutation_reader: Avoid timestamp clash test: cache_flat_mutation_reader_test: Use monotonic timestamps when inserting rows test: mvcc: Fix sporadic failures due to compact_for_compaction() test: lib: random_mutation_generator: Produce partition tombstone less often test: lib: random_utils: Introduce with_probability() test: lib: Improve error message in has_same_continuity() test: mvcc: mvcc_container: Avoid UB in tracker() getter when there is no tracker test: mvcc: Insert entries in the tracker test: mvcc_test: Do not set dummy::no on non-clustering rows mutation_partition: Print full position in error report in append_clustered_row() db: mutation_cleaner: Extract make_region_space_guard() position_in_partition: Optimize equality check mvcc: Fix version merging state resetting mutation_partition: apply_resume: Mark operator bool() as explicit	2023-02-05 22:33:10 +02:00
Michał Chojnowski	5edf965526	locator: token_metadata: unify get_address_ranges() and get_ranges() get_address_ranges() and get_ranges() perform almost the same computation. They return the same ranges -- the only difference is that get_address_ranges() returns them in unspecified order, while get_ranges() returns them in sorted order. Therefore the result of get_ranges() is also a valid result for get_address_ranges(), and the two functions can be unified to avoid code duplication. This patch does just that.	2023-02-04 22:55:08 +01:00
Michał Chojnowski	9e57b21e0c	locator: token_metadata: get rid of a quadratic behaviour in get_address_ranges() Some callees of update_pending_ranges use the variant of get_address_ranges() which builds a hashmap of all <endpoint, owned range> pairs. For everywhere_topology, the size of this map is quadratic in the number of endpoints, making it big enough to cause contiguous allocations of tens of MiB for clusters of realistic size, potentially causing trouble for the allocator (as seen e.g. in #12724). This deserves a correction. This patch removes the quadratic variant of get_address_ranges() and replaces its uses with its linear counterpart. Refs #10337 Refs #10817 Refs #10836 Refs #10837 Fixes #12724	2023-02-04 22:38:04 +01:00
Aleksandra Martyniuk	f3fa0d21ef	repair: use tasks::task_manager::module::make_and_start_task for repair tasks Use tasks::task_manager::module::make_and_start_task to create and start repair tasks. Delete start_repair_task static function which did this before.	2023-02-04 14:33:17 +01:00
Aleksandra Martyniuk	cb3b6cdc1a	tasks: add task_manager::module::make_and_start_task method In most cases, tasks manager's tasks are started just after they are created. Thus, to reduce boilerplate required for creating and starting tasks, make_and_start_task method is added.	2023-02-04 14:23:51 +01:00
Jan Ciolek	2a5ed115ca	cql/query_options: add a check for missing bind marker name There was a missing check in validation of named bind markers. Let's say that a user prepares a query like: ```cql INSERT INTO ks.tab (pk, ck, v) VALUES (:pk, :ck, :v) ``` Then they execute the query, but specify only values for `:pk` and `:ck`. We should detect that a value for :v is missing and throw an invalid_request_exception. Until now there was no such check, in case of a missing variable invalid `query_options` were created and Scylla crashed. Sadly it's impossible to create a regression test using `cql-pytest` or `boost`. `cql-pytest` uses the python driver, which silently ignores missing named bind variables, deciding that the user meant to send an UNSET_VALUE for them. When given values like `{'pk': 1, 'ck': 2}`, it will automatically extend them to `{'pk': 1, 'ck': 2, 'v': UNSET_VALUE}`. In `boost` I tried to use `cql_test_env`, but it only has methods which take valid `query_options` as a parameter. I could create a separate unit test for the creation and validation of `query_options` but it won't be a true end-to-end test like `cql-pytest`. The bug was found using the rust driver, the reproducer is available in the issue description. Fixes: #12727 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #12730	2023-02-04 02:13:34 +02:00
Alejo Sanchez	346d02b477	raft conf error injection for snapshot To trigger snapshot limit behavior provide an error injection to set with one-shot. Note this effectively changes it and there is no revert. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-03 22:33:33 +01:00
Pavel Emelyanov	d021aaf34d	system_keyspace: De-static calls that update view-building tables There's a bunch of them used mainly by view_builder and also by the API and storage_service. All use the global qctx to do their job; now that the callers have main-local sharded<system_keyspace> references, they can be made non-static. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-03 21:56:54 +03:00
Pavel Emelyanov	e2f51ce43e	storage_service: Coroutinize mark_existing_views_as_built() It's a start-only method. Making it coroutine helps further patching. Also restrict the call to be shard-0 only, it's such anyway but lets the code have less nested coroutinized lambdas. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-03 21:55:10 +03:00
Andrii Patsula	e420dbf10b	service/raft: raft_group_registry: use recent_entries_map to store rate_limits in pinger. Fixes #12309	2023-02-03 19:04:51 +01:00
Andrii Patsula	c95066a410	utils: introduce recent_entries_map datatype to track least recent visited entries.	2023-02-03 19:04:32 +01:00
Pavel Emelyanov	b347a0cf0b	api: Unset column_family endpoints The API calls in question will use system keyspace, that starts before (and thus stops after) and nowadays indirectly uses database instance that also starts earlier (and also stops later), so this avoids potential dangling references. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-03 18:59:28 +03:00
Pavel Emelyanov	eac2e453f2	api: Carry sharded<db::system_keyspace> reference over There's the column_family/get_built_indexes call that calls a system keyspace method to fetch data from scylla_views_builds_in_progress table, so the system keyspace reference will be needed in the API handler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-03 18:57:43 +03:00
Pavel Emelyanov	bbbeba103b	view_builder: Add system_keyspace dependency The view builder updates system.scylla_views_builds_in_progress and .built_views tables and thus needs the system keyspace instance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-03 18:55:58 +03:00
Aleksandra Martyniuk	12789adb95	compaction: test: pass task_manager to compaction_manager in test environment Each instance of compaction manager should have its compaction module pointer initialized. All constructors get a task_manager reference with which the module is created.	2023-02-03 15:15:11 +01:00
Raphael S. Carvalho	5a784c3c6d	treewide: Use new sstable_set::size() wherever possible That's the preferred alternative because it's zero copy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-03 10:38:04 -03:00
Raphael S. Carvalho	909d1975af	sstables: Introduce sstable_set::size() Preferred alternative to sstable_set->all()->size(), which may involve copying elements from a single set, or from multiple ones if compound_sstable_set is used. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-03 10:38:00 -03:00
Asias He	e7d5e508bc	storage_service: Enable Repair Based Node Operations (RBNO) by default for all node ops Since `97bb2e47ff` (storage_service: Enable Repair Based Node Operations (RBNO) by default for replace), RBNO was enabled by default for replace ops. After more testing, we decided to enable repair based node operations by default for all node operations.	2023-02-03 21:15:08 +08:00
Asias He	fc60484422	test: Increase START_TIMEOUT It is observed that CI machine is slow to run the test. Increase the timeout of adding servers.	2023-02-03 21:15:08 +08:00
Aleksandra Martyniuk	47ef689077	compaction: create and register task manager's module for compaction As an initial part of integration of compaction with task manager, compaction module is added. Compaction module inherits from tasks::task_manager::module and shared_ptr to it is kept in compaction manager. No compaction tasks are created yet.	2023-02-03 13:52:30 +01:00
Aleksandra Martyniuk	6233823cc7	tasks: add task_manager constructor without arguments Sometimes, e.g. for tests, we may need to create task_manager without main-specific arguments.	2023-02-03 13:52:30 +01:00
Aleksandra Martyniuk	8cb319030a	test: rest_api: check if repair of system keyspace returns before corresponding task is created	2023-02-03 13:35:13 +01:00
Aleksandra Martyniuk	aab704d255	repair: finish repair immediately on local keyspaces System keyspace is a keyspace with local replication strategy and thus it does not need to be repaired. It is possible to invoke repair of this keyspace through the api, which leads to runtime error since peer_events and scylla_table_schema_history have different sharding logic. For keyspaces with local replication strategy repair_service::do_repair_start returns immediately.	2023-02-03 13:35:13 +01:00
Kamil Braun	61dfc9c10f	Merge 'docs: extend the warning on using "nodetool removenode"' from Anna Stuchlik This PR extends the description of using `nodetool removenode `to remove an unavailable node, as requested in https://github.com/scylladb/scylla-enterprise/issues/2338. Closes #12410 * github.com:scylladb/scylladb: docs: improve the warning and add a comment to update/remove the information in the future doc: extend the information on removing an unavailable node docs: extend the warning on the Remove a Node page	2023-02-03 12:00:17 +01:00
Kamil Braun	d991f71910	test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters `TopologyTest`s (used by `topology/` suite and friends) already relied on the `is_dirty` flag stored in `ScyllaCluster` thanks to `ScyllaClusterManager` (which passes the flag when returning a cluster to the pool). But `PythonTest`s (cql-pytest/ suite) and `CQLApprovalTest`s (cql/ suite) had different ways to decide whether a cluster should be recycled. For example, `PythonTest` would recycle a cluster if `after_test` raised an exception. This depended on a post-condition check made by `after_test`: it would query the number of keyspaces and throw an exception if it was different than when the test started. If the cluster (which for `PythonTest` is always single-node) was dead, this query would fail. However, we modified the behavior of `after_test` in earlier commits - it no longer performs the post-condition check on dirty clusters. So it's also no longer reliable to use the exception raised by `after_test` to decide that we should recycle the cluster. Unify the behavior of `PythonTest` and `CQLApprovalTest` with what `TopologyTest` does - using the `is_dirty` flag to decide that we should recycle a cluster. Thanks to earlier commits, this flag is set to `True` whenever a test fails, so it should cover most cases where we want to recycle a cluster. (The only case not currently covered is if a non-dirty cluster crashes after we perform the keyspace post-condition check, which seems quite improbable.) Note that this causes us to recycle clusters more often in these tests: previously, when a `PythonTest` or `CQLApprovalTest` failed, but the cluster was still alive and the post-condition check passed, we would use the cluster for the next test. Now we recycle a cluster whenever a test that used it fails.	2023-02-03 11:49:35 +01:00
Kamil Braun	8442cccd37	test/topology: don't drop random_tables keyspace after a failed test After a failed test, the cluster might be down so dropping the random_tables keyspace might be impossible. The cluster will be marked dirty so it doesn't matter that we leave any garbage there. Note: we already drop only if the cluster is not marked as dirty, and we mark the cluster as dirty after a failed test. However, marking the cluster as dirty after a failed test happens at the end of the `manager` fixture and the `random_tables` fixture depends on the `manager` fixture, so at the end of the `random_tables` fixture the cluster still wasn't marked as dirty. Hence the fixture must access the pytest-provided `request` fixture where we store a flag whether the test has failed.	2023-02-03 11:49:35 +01:00
Anna Stuchlik	84e2178fe9	docs: improve the warning and add a comment to update/remove the information in the future	2023-02-03 09:33:07 +01:00
Botond Dénes	c270c305c0	Merge 'Allow entire test suite to run with multiple compaction groups' from Raphael "Raph" Carvalho New test/lib/scylla_test_case.hh, introduced in "tests: Add command line options for Scylla unit tests", allows extension of the command line options provided by Seastar testing framework. It allows all boost tests to process additional options without changing a single line of code. Patch "test: Add x-log2-compaction-groups to Scylla test command line options" builds on that, allowing all test cases to run with N compaction groups. Again, without changing a line of code in the tests. Now all you have to do is: ./build/dev/test/boost/sstable_compaction_test -- --smp 1 --x-log2-compaction-groups 1 ./test.py --mode=dev --x-log2-compaction-groups 1 --verbose And it will run the test cases with as many groups as you wish. ./test.py passes successfully with parameter --x-log2-compaction-groups 1. Closes #12369 * github.com:scylladb/scylladb: test.py: Add option to run scylla tests with multiple compaction groups test: Add x-log2-compaction-groups to Scylla test command line options test: Enable Scylla test command line options for boost tests tests: Add command line options for Scylla unit tests replica: table: Add debug log for number of compaction groups test: sstable_compaction_test: Fix indentation test: sstable_compaction_test: Make it work with compaction groups test: test_bloom_filter: Fix it with multiple compaction groups test: memtable_test: Fix it with multiple compaction groups	2023-02-03 06:35:15 +02:00
Kefu Chai	d2e3a60428	dist/debian: drop unused Makefile variable `job` was introduced back in `782ebcece4`, so we could consume the option specified in DEB_BUILD_OPTIONS environmental variable. but now that we always repackage the artifacts prebuilt in the relocatable package. we don't build them anymore when packaging debian packages. see `9388f3d626` . and `job` is not passed to `ninja` anymore. so, in this change, `job` is removed from debian/rules as well, as it is not used. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-03 11:18:51 +08:00
Kefu Chai	75eaee040b	dist/debian: bump up debhelper compatibility level to 10 to silence the warnings from dh tools, like ``` dh: warning: Compatibility levels before 10 are deprecated (level 9 in use) dh_clean dh_clean: warning: Compatibility levels before 10 are deprecated (level 9 in use) ``` see https://manpages.debian.org/testing/debhelper/debhelper-compat-upgrade-checklist.7.en.html for the changes in between v9 and v10, none of them applies to our use case. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-03 11:04:43 +08:00
Raphael S. Carvalho	55a8421e3d	table: Fix inefficiency when rebuilding statistics with compaction groups Whenever any compaction group has its SSTable set updated, table's rebuild_statistics() is called and it inefficiently iterates through SSTable set of all compaction groups. Now each compaction group keeps track of its statistics, such that table's rebuild_statistics() only need to sum them up. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-02 17:10:11 -03:00
Raphael S. Carvalho	529a1239a9	table: Fix disk-space related metrics The total disk space used metric incorrectly reports the amount of disk space ever used. It should report the size of all sstables in use plus the ones waiting to be deleted. Live disk space used, by this definition, shouldn't account for the ones waiting to be deleted, and live sstable count shouldn't account for sstables waiting to be deleted. Fix all that. Fixes #12717. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-02 16:38:45 -03:00
Raphael S. Carvalho	55cd163392	sstables: Fix fragility of sstable_set::all() interface all() was returning lw_shared_ptr<sstable_list>, which allowed the caller to modify the sstable set content, which will mess up everything. sstable_set is supposed to be modified only through the insert and erase functions. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-02 15:46:08 -03:00
Alejo Sanchez	9ceb6aba81	test/pylib: one-shot error injection helper Existing helper with async context manager only worked for non one-shot error injections. Fix it and add another helper for one-shot without a context manager. Fix tests using the previous helper. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-02-02 16:37:21 +01:00
Kamil Braun	a9dbd89478	test/pylib: mark cluster as dirty after a failed test We don't expect the cluster to be functioning at all after a failed test. The whole cluster might have crashed, for example. In these situations the framework would report multiple errors (one for the actual failure, another for a failed post-condition check because the cluster was down) which would only obscure the report and make debugging harder. It's also not safe in general to reuse the cluster in another test - if the previous test failed, we should not assume that it's in a valid state. Therefore, mark the cluster as dirty after a failed test. This will let us recycle the cluster based on the dirty flag and it will disable the post-condition check after a failed test (which is only done on non-dirty clusters). To implement this in topology tests, we use the `pytest_runtest_makereport` hook which executes after a test finishes but before fixtures finish. There we store a test-failed flag in a stash provided by pytest, then access the flag in the `manager` fixture.	2023-02-02 16:35:55 +01:00
Kamil Braun	977375d13f	test: pylib, topology: don't perform operations after test on a dirty cluster `after_test` would count keyspaces and check that the number is the same as before the test started. The `random_tables` fixture after a test would drop the keyspace that it created before the test. These steps are done to ensure that the cluster is ready to be reused for the next steps. If the cluster is dirty, it cannot be reused anyway, so the steps are unnecessary. They might also be impossible in general - a dirty cluster might be completely dead. For example, the attempts to drop a keyspace from `random_tables` would cause confusing errors if a test failed when it tried to restart a node while all nodes were down, making it harder to find the 'real' failure. Therefore don't perform these operations if the cluster is dirty.	2023-02-02 15:59:02 +01:00
Kamil Braun	f4b56cddde	test/pylib: print cluster at the end of test - print the cluster used by the test in `after_test` - if cluster setup fails in `before_test`, print the cluster together with the exception (`after_test` is not executed if `before_test` fails)	2023-02-02 15:59:02 +01:00
Anna Stuchlik	f4c5cdf21b	doc: add the info about the minor versions	2023-02-02 14:16:40 +01:00
Avi Kivity	f5fd0769b2	Merge 'cql3: expr: don't pass empty evaluation_inputs in is_one_of' from Jan Ciołek `evaluation_inputs` is a struct which contains data needed to evaluate expressions - values of columns, bind variables and other data. `is_one_of()` is a function used to evaluate `IN` restrictions. It checks whether the LHS is one of the elements on the RHS list. Generally when evaluating expressions we get the `evaluation_inputs` as an argument and we should pass them along to any functions that evaluate subexpressions. `is_one_of()` got the inputs as an argument, but didn't pass them along to `equal()`, instead it creates new empty `evaluation_inputs{}` and gives that to `equal()`. At first [I thought this was a bug](https://github.com/scylladb/scylladb/pull/12356#discussion_r1084300969) - with missing information there could be a crash if `equal()` tried to evaluate an expression with a `bind_variable`. It turns out that in this particular case `equal()` won't use the `evaluation_inputs` at all. The LHS and RHS passed to it are just constant values, which were already evaluated to serialized bytes before calling `evaluate()`, so there is no bug. It's still better to pass the inputs argument along if possible. If in the future `equal()` required these inputs for some reason, missing inputs could lead to an unexpected crash. I couldn't find any tests that would detect this case, so such a bug could stay undetected until an unhappy user finds it because their cluster crashed. I added some tests to make sure that it's covered from now on. Closes #12701 * github.com:scylladb/scylladb: cql-pytest: test filtering using list with bind variable test/expr_test: test <int_value> IN (123, ?, 456) cql3: expr: don't pass empty evaluation_inputs in is_one_of	2023-02-02 11:40:20 +02:00
Botond Dénes	9efbcfa190	Merge 'test/alternator: tests for Limit parameter of ListStreams operation' from Nadav Har'El The first patch in this series enables a previously-skipped test for what happens with Limit=0. The test passes. The second patch adds an xfailing test for very large Limit. Closes #12625 * github.com:scylladb/scylladb: test/alternator: xfailing test for huge Limit in ListStreams alternator/test: un-skip test of zero Limit in ListStreams	2023-02-02 07:02:28 +02:00
Asias He	6d7b4a896e	test: Increase max-networking-io-control-blocks The number is too low in the test and we saw `rpc: Connection is closed` errors. Increase the number to the default 1000.	2023-02-02 11:11:22 +08:00
Asias He	693d71984f	storage_service: Check node has left in node_ops_cmd::decommission_done In tests with ring delay zero, it is possible that when node_ops_cmd::decommission_done is received, the nodes remaining in the cluster haven't yet learned the LEFT status for the leaving node. To guarantee that when the decommission RESTful API returns, all the nodes that participated in the decommission operation have learned the LEFT status, a check in node_ops_cmd::decommission_done is added in this patch. After this patch, the decommission tests in test/topology/test_topology.py, which start multiple decommissions in a loop with ring delay zero, pass.	2023-02-02 11:11:22 +08:00
Asias He	e2e5017c54	repair: Use remote dc neighbors for everywhere strategy Consider: - Bootstrap n1 in dc 1 - Create ks with EverywhereStrategy - Bootstrap n2 in dc 2 Since n2 is the first node in dc2, there will be no local dc nodes to sync data from. In this case, n2 should sync data with node in dc 1 even if it is in the remote dc.	2023-02-02 11:10:50 +08:00
Raphael S. Carvalho	e3923a9caf	test.py: Add option to run scylla tests with multiple compaction groups The tests can now optionally run with multiple groups via option --x-log2-compaction-groups. This includes boost tests and the ones which run against either one (e.g. cql) or many instances (e.g. topology). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:17:16 -03:00
Raphael S. Carvalho	f510cab5f0	test: Add x-log2-compaction-groups to Scylla test command line options Now any boost test can run with multiple compaction groups by default, without any change in the boost test cases whatsoever. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:14:51 -03:00
Raphael S. Carvalho	3c5afb2d5c	test: Enable Scylla test command line options for boost tests We have enabled the command line options without changing a single line of code, we only had to replace old include with scylla_test_case.hh. Next step is to add the x-log2-compaction-groups option, which will determine the number of compaction groups to be used by all instantiations of replica::table. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:14:51 -03:00
Raphael S. Carvalho	a2c60b6cf5	tests: Add command line options for Scylla unit tests Scylla unit tests are limited to command line options defined by Seastar testing framework. For extending the set of options, Scylla unit tests can now include test/lib/scylla_test_case.hh instead of seastar/testing/test_case.hh, which will "hijack" the entry point and will process the command line options, then feed the remaining options into seastar testing entry point. This is how it looks like when asking for help: Scylla tests additional options: --help Produces help message --x-log2-compaction_groups arg (=0) Controls static number of compaction groups per table per shard. For X groups, set the option to log (base 2) of X. Example: Value of 3 implies 8 groups. Running 1 test case... App options: -h [ --help ] show help message --help-seastar show help message about seastar options --help-loggers print a list of logger names and exit --random-seed arg Random number generator seed --fail-on-abandoned-failed-futures arg (=1) Fail the test if there are any abandoned failed futures Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:14:51 -03:00
Raphael S. Carvalho	8988795b08	replica: table: Add debug log for number of compaction groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:14:51 -03:00
Raphael S. Carvalho	a7ddedb998	test: sstable_compaction_test: Fix indentation Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:14:51 -03:00
Raphael S. Carvalho	c455e43f49	test: sstable_compaction_test: Make it work with compaction groups Tests using replica::table::add_sstable_and_update_cache() cannot rely on the sstable being added to a single compaction group, if the test was forced to run with multiple groups. Additionally let's remove try_flush_memtable_to_sstable() which is restricted to a single group, allowing the entire test to now pass with multiple groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:14:51 -03:00
Raphael S. Carvalho	c25d8614a9	test: test_bloom_filter: Fix it with multiple compaction groups With many compaction groups, the data:filter size ratio becomes small with a small number of keys. Test is adjusted to run another check with more keys if efficiency is higher than expected, but not lower. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:14:51 -03:00
Raphael S. Carvalho	2d2460046b	test: memtable_test: Fix it with multiple compaction groups With compaction groups, automatic flushing may not pick the user table. Fix it by using explicit flush. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:14:51 -03:00
Botond Dénes	34cdcaffae	reader_concurrency_semaphore: un-bless permits when they become inactive When the memory consumption of the semaphore reaches the configured serialize threshold, all but the blessed permit is blocked from consuming any more memory. This ensures that past this limit, only one permit at a time can consume memory. Such a blessed permit can be registered inactive. Before this patch, it would still retain its blessed status when doing so. This could result in this permit being re-queued for admission if it was evicted in the meanwhile, potentially resulting in a complete deadlock of the semaphore: * admission queue permits cannot be admitted because there is no memory * admitter permits are all queued on memory, as none of them are blessed This patch strips the blessed status from the permit when it is registered as inactive. It also adds a unit test to verify this happens. Fixes: #12603 Closes #12694	2023-02-01 21:02:17 +02:00
Botond Dénes	693c22595a	sstables/sstable: validate_checksums(): force-check EOF EOF is only guaranteed to be set if one tried to read past the end of the file. So when checking for EOF, also try to read some more. This should force the EOF flag into a correct value. We can then check that the read yielded 0 bytes. This should ensure that `validate_checksums()` will not falsely declare the validation to have failed. Fixes: #11190 Closes #12696	2023-02-01 20:52:46 +02:00
Nadav Har'El	69517040f7	Merge 'alterator::streams: Sort tables in list_streams to ensure no duplicates' from Calle Wilund Fixes #12601 (maybe?) Sort the set of tables on ID. This should ensure we never generate duplicates in a paged listing here. Can obviously miss things if they are added between paged calls and end up with a "smaller" UUID/ARN, but that is to be expected. Closes #12614 * github.com:scylladb/scylladb: alternator::streams: Special case single table in list_streams alternator::streams: Only sort tables iff limit < # tables or ExclusiveStartStreamArn set alternator::streams: Set default list_streams limit to 100 as per spec alterator::streams: Sort tables in list_streams to ensure no duplicates	2023-02-01 19:47:16 +02:00
Wojciech Mitros	86c61828e6	udt: disallow dropping a user type used in a user function Currently, nothing prevents us from dropping a user type used in a user function, even though doing so may make us unable to use the function correctly. This patch prevents this behavior by checking all function argument and return types when executing a drop type statement and preventing it from completing if the type is referenced by any of them. Closes #12680	2023-02-01 18:53:29 +02:00
Kefu Chai	53366db6c6	build: disable Seastar's io_uring backend again this partially reverts `49157370bc` according to the reports in #12173, at least two developers ran into test failures which are correlated with the latest Seastar change, which enables the io_uring backend by default. they are using linux kernel 6.0.12 and 6.1.7. it's also reported that reverting commit eedca15f16c3b6eae3d3d8af9510624a93f5d186 in seastar helps. that very commit enables io_uring by default. although we are not able to identify the exact root cause of the failures in #12173 at this moment, ruling out io_uring as a potential problem should help with further investigation. in this change, io_uring backend is disabled when building Seastar. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12689	2023-02-01 17:36:07 +02:00
Jan Ciolek	ed568f3f70	cql-pytest: test filtering using list with bind variable Add tests which test filtering using IN restriction with a list which contains a bind variable. There are other cql-pytest tests which test IN lists with a bind variable, but it looks like they don't do filtering. IN restrictions on primary key columns are handled in a special way to generate the right ranges. These tests hit a different code path as filtering uses `expr::evaluate()`. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-02-01 16:30:09 +01:00
Jan Ciolek	9eb6746a67	test/expr_test: test <int_value> IN (123, ?, 456) Add tests which test evaluating the IN restriction with a list which contains a bind variable. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-02-01 16:29:32 +01:00
Jan Ciolek	286599fe8b	cql3: expr: don't pass empty evaluation_inputs in is_one_of evaluation_inputs is a struct which contains data needed to evaluate expressions - values of columns, bind variables and other data. is_one_of() is a function used to evaluate IN restrictions. It checks whether the LHS is one of the elements on the RHS list. Generally when evaluating expressions we get the evaluation_inputs{} as an argument and we should pass them along to any functions that evaluate subexpressions. is_one_of() got the inputs as an argument, but didn't pass them along to equal(), instead it creates new empty evaluation_inputs{} and gives that to equal(). At first I thought this was a bug - with missing information there could be a crash if equal() tried to evaluate an expression with a bind_variable. It turns out that in this particular case equal() won't use the evaluation_inputs{} at all. The LHS and RHS passed to it are just constant values, which were already evaluated to serialized bytes before calling evaluate(). It's still better to pass the inputs argument along if possible. If in the future equal() required these inputs for some reason, missing inputs could lead to an unexpected crash. I couldn't find any tests that would detect this case, so such a bug could stay undetected until an unhappy user finds it because their cluster crashed. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-02-01 16:20:24 +01:00
Avi Kivity	b4559a6992	Update seastar submodule * seastar 943c09f869...ef24279f03 (6): > Merge 'util/print_safe, reactor: use concept for type constraints and refactory ' from Kefu Chai > Right align the memory diagnostics > Merge 'Add an API for the metrics layer to manipulate metrics dynamically.' from Amnon Heiman > semaphore: assert no outstanding units when moved > build: do not populate package registry by default > build: stop detecting concepts support Closes #12695	2023-02-01 17:19:49 +02:00
Kamil Braun	40142a51d0	test: topology: wait for token ring/group 0 consistency after decommission There was a check for immediate consistency after a decommission operation has finished in one of the tests, but it turns out that also after decommission it might take some time for token ring to be updated on other nodes. Replace the check with a wait. Also do the wait in another test that performs a sequence of decommissions. We won't attempt to start another decommission until every node learns that the previously decommissioned node has left. Closes #12686	2023-02-01 16:49:22 +02:00
Raphael S. Carvalho	1b2140e416	compaction: Fix inefficiency when updating LCS backlog tracker LCS backlog tracker uses STCS tracker for L0. Turns out LCS tracker is calling STCS tracker's replace_sstables() with empty arguments even when only higher levels (> 0) had sstables replaced. This unnecessary call to STCS tracker will cause it to recompute the L0 backlog, yielding the same value as before. As LCS has a fragment size of 0.16G on higher levels, we may be updating the tracker multiple times during incremental compaction, which operates on SSTables on higher levels. Inefficiency is fixed by only updating the STCS tracker if any L0 sstable is being added or removed from the table. This may be fixing a quadratic behavior during boot or refresh, as new sstables are loaded one by one. Higher levels have a substantially higher number of sstables, therefore updating STCS tracker only when level 0 changes significantly reduces the number of times L0 backlog is recomputed. Refs #12499. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12676	2023-02-01 15:19:07 +02:00
Michael Hollander	5d1e40bc18	Added missing full stop to SimpleSnitch paragraph Closes #12692	2023-02-01 13:21:49 +02:00
Nadav Har'El	132af20057	Merge 'test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests' from Kamil Braun `ScyllaClusterManager` is used to run a sequence of test cases from a single test file. Between two consecutive tests, if the previous test left the cluster 'dirty', meaning the cluster cannot be reused, it would free up space in the pool (using `steal`), stop the cluster, then get a new cluster from the pool. Between the `steal` and the `get`, a concurrent test run (with its own instance of `ScyllaClusterManager`) would start, because there was free space in the pool. This resulted in undesirable behavior when we ran tests with `--repeat X` for a large `X`: we would start with e.g. 4 concurrent runs of a test file, because the pool size was 4. As soon as one of the runs freed up space in the pool, we would start another concurrent run. Soon we'd end up with 8 concurrent runs. Then 16 concurrent runs. And so on. We would have a large number of concurrent runs, even though the original 4 runs didn't finish yet. All of these concurrent runs would compete waiting on the pool, and waiting for space in the pool would take longer and longer (the duration is linear w.r.t. the number of concurrent competing runs). Tests would then time out because they would have to wait too long. Fix that by using the new `replace_dirty` function introduced to the pool. This function frees up space by returning a dirty cluster and then immediately takes it away to be used for a new cluster. Thanks to this, we will only have at most as many concurrent runs as the pool size. For example with --repeat 8 and pool size 4, we would run 4 concurrent runs and start the 5th run only when one of the original 4 runs finishes, then the 6th run when a second run finishes and so on. 
The fix is preceded by a refactor that replaces `steal` with `put(is_dirty=True)` and a `destroy` function passed to the pool (now the pool is responsible for stopping the cluster and releasing its IPs). Fixes #11757 Closes #12549 * github.com:scylladb/scylladb: test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests test/pylib: pool: introduce `replace_dirty` test/pylib: pool: replace `steal` with `put(is_dirty=True)`	2023-02-01 12:37:39 +02:00
Anna Stuchlik	b346778ae8	doc: add the missing sudo command	2023-02-01 10:43:39 +01:00
Nadav Har'El	681a066923	test/pylib: put UNIX-domain socket in /tmp The "cluster manager" used by the topology test suite uses a UNIX-domain socket to communicate between the cluster manager and the individual tests. The socket is currently located in the test directory but there is a problem: In Linux the length of the path used as a UNIX-domain socket address is limited to just a little over 100 bytes. In Jenkins run, the test directory names are very long, and we sometimes go over this length limit and the result is that test.py fails creating this socket. In this patch we simply put the socket in /tmp instead of the test directory. We only need to do this change in one place - the cluster manager, as it already passes the socket path to the individual tests (using the "--manager-api" option). Tested by cloning Scylla in a very long directory name. A test like ./test.py --mode=dev test_concurrent_schema fails before this patch, and passes with it. Fixes #12622 Closes #12678	2023-02-01 12:37:35 +03:00
Botond Dénes	325246ab2a	Merge 'doc: fix the service name from "scylla-enterprise-server" "to "scylla-server"' from Anna Stuchlik Related https://github.com/scylladb/scylladb/issues/12658. This issue fixes the bug in the upgrade guides for the released versions. Closes #12679 * github.com:scylladb/scylladb: doc: fix the service name in the upgrade guide for patch releases versions 2022 doc: fix the service name in the upgrade guide from 2021.1 to 2022.1	2023-02-01 12:37:35 +03:00
Anna Stuchlik	2be131da83	doc: fixes https://github.com/scylladb/scylladb/issues/12672 , fix the redirects to the Cloud docs Closes #12673	2023-02-01 12:37:35 +03:00
Botond Dénes	d8073edbb7	Merge 'cql3, locator: call fmt::format_to() explicitly and include used headers' from Kefu Chai these fixes address the FTBFS of scylla with GCC-13. Closes #12669 * github.com:scylladb/scylladb: cql3/stats: include the used header. cql3, locator: call fmt::format_to() explicitly	2023-02-01 12:37:35 +03:00
Pavel Emelyanov	d065f9f82e	sstables: The generation_type is not formattable If TOC writing hits TOC file conflict it tries to throw an exception with sstable generation in it. However, generation_type is not formattable at all, let alone the {:d} option. This bug generates an obscure 'fmt::v9::format_error (invalid type specifier)' error in an unknown location, making debugging hard. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12671	2023-02-01 12:37:35 +03:00
Kefu Chai	186ceea009	cql3/selection: construct string_view using char* not size before this change, we construct a sstring from a comma statement, which evaluates to the return value of `name.size()`, but what we expect is `sstring(const char*, size_t)`. in this change * instead of passing the size of the string_view, both its address and size are used * `std::string_view` is constructed instead of sstring, for better performance, as we don't need to perform a deep copy the issue is reported by GCC-13: ``` In file included from cql3/selection/selectable.cc:11: cql3/selection/field_selector.hh:83:60: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result] auto sname = sstring(reinterpret_cast<const char*>(name.begin(), name.size())); ^~~~~~~~~~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12666	2023-02-01 12:37:35 +03:00
David Garcia	616bf26422	docs: add opensource flag Closes #12656	2023-02-01 12:37:35 +03:00
Anna Stuchlik	e81b586d6a	Merge branch 'scylladb:master' into anna-pinning-workaround	2023-02-01 10:36:44 +01:00
Anna Stuchlik	11a59bcc76	doc: fix the service name in the upgrade guide for patch releases versions 2022	2023-01-31 11:04:21 +01:00
Anna Stuchlik	71ae644d40	doc: fix the service name in the upgrade guide from 2021.1 to 2022.1	2023-01-31 10:46:44 +01:00
Kefu Chai	58b4dc5b9a	cql3/stats: include the used header. otherwise `uint64_t` won't be found when compiling with GCC-13. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-30 21:50:23 +08:00
Kefu Chai	ccc03dd1ec	cql3, locator: call fmt::format_to() explicitly since format_to() is defined by both the fmt and std namespaces, without specifying which one to use, we'd fail to build with a standard library which implements std::format_to(). yes, we are `using namespace std` somewhere. this change should address the FTBFS with GCC-13. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-30 21:50:11 +08:00
Warren Krewenki	8655a8be19	docs: Update suggested AWS instance types in benchmark tips The list of suggested instances had a misspelling of c5d, and didn't include the i4i instances recommended by https://www.scylladb.com/2022/05/09/scylladb-on-the-new-aws-ec2-i4i-instances-twice-the-throughput-lower-latency/ Closes #12664	2023-01-30 14:10:18 +02:00
Botond Dénes	c927eea1d5	Merge 'table: trim ranges for compaction group cleanup' from Benny Halevy This series contains the following changes for trimming the ranges passed to cleanup a compaction group to the compaction group owned token_range. table: compaction_group_for_token: use signed arithmetic Fixes #12595 table: make_compaction_groups: calculate compaction_group token ranges table: perform_cleanup_compaction: trim owned ranges on compaction_group boundaries Fixes #12594 Closes #12598 * github.com:scylladb/scylladb: table: perform_cleanup_compaction: trim owned ranges on compaction_group boundaries table: make_compaction_groups: calculate compaction_group token ranges dht: range_streamer: define logger as static	2023-01-30 13:11:28 +02:00
Anna Stuchlik	64cc4c8515	docs: fixes https://github.com/scylladb/scylladb/issues/12654 , update the links to the Download Center Closes #12655	2023-01-30 12:45:20 +02:00
Michał Chojnowski	fa7e904cd6	commitlog: fix total_size_on_disk accounting after segment file removal Currently, segment file removal first calls `f.remove_file()` and does `total_size_on_disk -= f.known_size()` later. However, `remove_file()` resets `known_size` to 0, so in effect the freed space is not accounted for. `total_size_on_disk` is not just a metric. It is also responsible for deciding whether a segment should be recycled -- it is recycled only if `total_size_on_disk - known_size < max_disk_size`. Therefore this bug has dire performance consequences: if `total_size_on_disk - known_size` ever exceeds `max_disk_size`, the recycling of commitlog segments will stop permanently, because `total_size_on_disk - known_size` will never go back below `max_disk_size` due to the accounting bug. All new segments from this point will be allocated from scratch. The bug was uncovered by a QA performance test. It isn't easy to trigger -- it took the test 7 hours of constant high load to step into it. However, the fact that the effect is permanent, and degrades the performance of the cluster silently, makes the bug potentially quite severe. The bug can be easily spotted with Prometheus as infinitely rising `commitlog_total_size_on_disk` on the affected shards. Fixes #12645 Closes #12646	2023-01-30 12:20:04 +02:00
Botond Dénes	71ad0dff2b	test/lib/sstable_utils: remove now unused token_generation_for_shard() and friends	2023-01-30 05:03:42 -05:00
Botond Dénes	a03c11234d	test/lib/simple_schema: remove now unused make_keys() and friends	2023-01-30 05:03:42 -05:00
Botond Dénes	4ad3ba52b0	test: migrate to tests::generate_partition_key[s]() Use the newly introduced key generation facilities, instead of the old inflexible alternatives and hand-rolled code. Most of the migrations are mechanical, but there are two tests that were tricky to migrate: * sstable_compaction_test.sstable_run_based_compaction_test * sstable_mutation_test.test_key_count_estimation These two tests seem to depend on generated keys all being of the same size. This makes some sense in the case of the key count estimation test, but makes no sense at all to me in the case of the sstable run test.	2023-01-30 05:03:42 -05:00
Botond Dénes	84c94881b3	test/lib/test_services: add table_for_tests::make_default_schema() Creating the default schema, used in the default constructor of table_for_tests. Allows for getting the default schema without creating an instance first.	2023-01-30 05:03:42 -05:00
Botond Dénes	61f28d3ab2	test/lib: add key_utils.hh Contains methods to generate partition and clustering keys. In the case of the former, one can specify the shard to generate keys for. We currently have some methods to generate these but they are not generic. Therefore the tests are littered by open-coded variants. The methods introduced here are completely generic: they can generate keys for any schema.	2023-01-30 05:03:42 -05:00
Anna Stuchlik	0294b426b9	doc: replace the redundant link with an alternative way to install a non-latest version	2023-01-30 10:01:17 +01:00
Botond Dénes	04ca710a95	test/lib/random_schema.hh: value_generator: add min_size_in_bytes Allow caller to specify the minimum size in bytes of the generated value. Only really works with string-like types (and collections of these). Also fixed max size enforcement for strings: before this patch, the provided max size was divided by wide string size, instead of the char width of the actual string type the value is generated for.	2023-01-30 01:11:31 -05:00
Avi Kivity	5d914adcef	Merge 'view: row_lock: lock_ck: find or construct row_lock under partition lock' from Benny Halevy Since we're potentially searching the row_lock in parallel to acquiring the read_lock on the partition, we're racing with row_locker::unlock that may erase the _row_locks entry for the same clustering key, since there is no lock to protect it up until the partition lock has been acquired and the lock_partition future is resolved. This change moves the code to search for or allocate the row lock _after_ the partition lock has been acquired to make sure we're synchronously starting the read/write lock function on it, without yielding, to prevent this use-after-free. This adds an allocation for copying the clustering key in advance that wasn't needed before if the lock for it was already found, but the view building is not on the hot path so we can tolerate that. This is required on top of `5007ded2c1` as seen in https://github.com/scylladb/scylladb/issues/12632 which is closely related to #12168 but demonstrates a different race causing use-after-free. Fixes #12632 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12639 * github.com:scylladb/scylladb: view: row_lock: lock_ck: try_emplace row_lock entry view: row_lock: lock_ck: find or construct row_lock under partition lock	2023-01-29 18:38:14 +02:00
Warren Krewenki	2b7a7e52f4	docs: Missing closing quote in example query Closes #12663	2023-01-29 11:50:11 +02:00
Tomasz Grabiec	c9c476afd7	test: mvcc: Extend some scenarios with exhaustive consistency checks on eviction	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	80de99cb1b	test: mvcc: Extract mvcc_container::allocate_in_region()	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	7bb975eb22	row_cache, lru: Introduce evict_shallow() Will be used by MVCC tests which don't want (can't) deal with the row_cache as the container but work with the partition_entry directly. Currently, rows_entry::on_evicted() assumes that it's embedded in row_cache and would segfault when trying to evict the containing partition entry which is not embedded in row_cache. The solution is to call evict_shallow() from mvcc_tests, which does not attempt to evict the containing partition_entry.	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	f2832046e9	test: mvcc: Avoid copies of mutation under failure injection Speeds up the test a bit because we avoid the copy when converting to mutation_partition_v2 in apply().	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	b8980f68f0	test: mvcc: Add missing logalloc::reclaim_lock to test_apply_is_atomic	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	d02d668777	mutation_partition_v2: Avoid full scan when applying mutation to non-evictable For non-evictable snapshots all ranges are continuous so there is no need to apply the continuity flag to the previous interval if the source mutation has the interval marked as continuous. Without this, applying a single row mutation to a memtable would involve scanning the existing version for the range before the row's key. This makes population quadratic. This is made worse by the fact that this scan will happen in the background if preempted, which exposes a scheduling problem. The mutation cleaner worker which merges versions in the background will not keep up with the incoming writes. This will lead to an explosion of partition versions, which makes reads (e.g. memtable flush) very slow. The read will have to refresh the iterator heap, which has an iterator for each version, across every preemption point, because cleaning invalidates iterators. The same could happen before the v2 representation, but for much less typical workloads, e.g. applying lots of mutations with a single range tombstone covering existing rows. The problem was hit in index_with_paging_test in debug mode. It's less likely to happen in release mode where preemption is not triggered as often.	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	bc35fa7696	Pass is_evictable to apply()	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	2b5e7a684b	tests: mutation_partition_v2: Introduce test_external_memory_usage_v2 mirroring the test for v1	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	81b1b2ee55	tests: mutation: Fix test_external_memory_usage() to not measure mutation object footprint The test measured copying of the mutation object, but verified the measurement against mutation_partition::external_memory_usage(). So anything allocated on the mutation object level would cause the test to (incorrectly) fail. Fix that by copying only the mutation_partition part. Currently not a problem, because the partition_key is stored in the in-line storage. Would become a problem once inline storage is reduced.	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	f172336b32	tests: mutation_partition_v2: Add test for exception safety of mutation merging	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	919ff433d1	tests: Add tests for the mutation_partition_v2 model	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	cec9b2d114	mutation_partition_v2: Implement compact() For convenience, will be used in unit tests.	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	4317999ca4	cache_tracker: Extract insert(mutation_partition_v2&)	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	c7f7377ea3	mvcc, mutation_partition: Document guarantees in case merging succeeds It's not obvious that invariants for partial merge do not hold for a completed merge. This is due to the fact that an empty source partition, which is always empty after merge, is always fully continuous.	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	8ae78ffebd	mutation_partition_v2: Accept arbitrary preemption source in apply_monotonically() Will be useful in testing to exhaustively test preemption scenarios.	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	8883ac30cf	mutation_partition_v2: Simplify get_continuity()	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	d9e27abe87	row_cache: Distinguish dummy insertion site in trace log	2023-01-27 21:56:31 +01:00
Tomasz Grabiec	026f8cc1e7	db: Use mutation_partition_v2 in mvcc This patch switches memtable and cache to use mutation_partition_v2, and all affected algorithms accordingly. The memtable reader was changed to use the same cursor implementation which cache uses, for improved code reuse and reducing risk of bugs due to discrepancy of algorithms which deal with MVCC. Range tombstone eviction in cache has now fine granularity, like with rows. Fixes #2578 Fixes #3288 Fixes #10587	2023-01-27 21:56:28 +01:00
Tomasz Grabiec	ccf3a13648	range_tombstone_change_merger: Introduce peek() Returns the current tombstone without affecting state.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	42f5a7189d	readers: Extract range_tombstone_change_merger	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	6b7473be53	mvcc: partition_snapshot_row_cursor: Handle non-evictable snapshots This is a prerequisite for using the cursor in memtable readers. Non-evictable snapshots are those which live in memtables. Unlike evictable snapshots, they don't have a dummy entry at position after all clustering rows. In evictable snapshots, lookup always finds an entry, not so with non-evictable snapshots. The cursor was not prepared for this case, this patch handles it.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	091ad8f6ee	mvcc: partition_snapshot_row_cursor: Support digest calculation Prerequisite for using in memtable reader.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	195b40315a	mutation_partition_v2: Store range tombstones together with rows This patch changes mutation_partition_v2 to store range tombstone information together with rows. This mainly affects the version merging algorithm, mutation_partition_v2::apply_monotonically(). Continuity setting no longer can drop dummy entry unconditionally since it may be a boundary of a range tombstone. Memtable/cache is not switched yet. Refs #10587 Refs #3288	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	7e6056b3cc	db: Introduce mutation_partition_v2 Intended to be used in memtable/cache, as opposed to the old mutation_partition which will be intended to be used as temporary object. The two will have different trade-offs regarding memory efficiency and algorithms. In this commit there is no change in logic, the class is mostly copied. Some methods which are not needed on the v2 model were removed from the interface. Logic changes will be introduced in later commits.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	806f698272	doc: Introduce docs/dev/mvcc.md This extracts information which was there in row_cache.md, but is relevant to MVCC in general. It also makes adaptations and reflects the upcoming changes in this series related to switching to the new mutation_partition_v2 model: - continuity in evictable snapshots can now overlap. This is needed to represent range tombstone information, which is linked to continuity information. - description of range tombstone representation was added	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	27882ff19e	db: cache_tracker: Introduce insert() variant which positions before existing entry in the LRU	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	a574a1cc4e	db: Print range_tombstone bounds as position_in_partition It's the standard now which replaced bound_view. Will be consistent with how range tombstone bounds are represented in mutation_partition_v2 (as rows_entry::position()).	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	40719c600c	test: memtable_test: Relax test_segment_migration_during_flush Partition version merging can now insert sentinels, which may temporarily increase unspooled memory. It is no longer true that unspooled monotonically decreases, which the test verified. Relax it, and only verify that unspooled is smaller than real dirty.	2023-01-27 19:15:39 +01:00
Tomasz Grabiec	31bcc3b861	test: cache_flat_mutation_reader: Avoid timestamp clash api::new_timestamp() is not monotonic. In test_single_row_and_tombstone_not_cached_single_row_range1, we generate a deletion and an insertion in the deleted range. If they get the same timestamp, the inserted row will be covered. This will surface after cache starts to compact rows with range tombstones.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	25683449e4	test: cache_flat_mutation_reader_test: Use monotonic timestamps when inserting rows When inserting range tombstones, the test uses api::new_timestamp(), but when inserting rows, it uses a fixed timestamp of 1. This will be problematic when rows get compacted with range tombstone, all rows would get compacted away, which is not expected by the test. To fix this, let's use the same timestamp source as range tombstones. This way rows will get a later timestamp.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	71057412ed	test: mvcc: Fix sporadic failures due to compact_for_compaction() compact_for_compaction() will perform cell expiration based on gc_clock::now(), which introduces sporadic mismatches due to expiry status of a row marker. Drop this, we can rely on compaction done by is_equal_to_compacted()	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	f908713290	test: lib: random_mutation_generator: Produce partition tombstone less often This tombstone has a high chance of obliterating all data, which will make tests which involve partition version merging not very interesting. The result will be an empty partition with a tombstone. Reduce its frequency, so that in MVCC there is a significant chance of having live data in the combined entry where individual versions come from the generator.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	3bf8052be4	test: lib: random_utils: Introduce with_probability()	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	c386874e18	test: lib: Improve error message in has_same_continuity()	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	08f68c5f20	test: mvcc: mvcc_container: Avoid UB in tracker() getter when there is no tracker	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	5aa8cb56a8	test: mvcc: Insert entries in the tracker evictable snapshots must have all entries added to the tracker. Partition version merging assumes this. Before this was benign, but will start to trigger asserts in mutation_partition_v2.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	9d38997971	test: mvcc_test: Do not set dummy::no on non-clustering rows This will trigger an assert in apply_monotonically() later in the series, where this row would be merged with a dummy at the same position. This row must not be marked as non-dummy, there is an assumption that non-clustering positions are all dummies. There can't be two entries with the same position and a different dummy status.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	f79072638d	mutation_partition: Print full position in error report in append_clustered_row() std::prev(i) can be dummy.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	6a305666a4	db: mutation_cleaner: Extract make_region_space_guard() Will be used in more places.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	833e2a8d30	position_in_partition: Optimize equality check We can avoid key comparison if bound weights don't match.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	95b509afcd	mvcc: Fix version merging state resetting Upon entry to merge_partition_versions() we skip over versions which are not referenced in order to start merging from the oldest unreferenced version, which is good for performance. Later, we reallocate version merging state if we detected such a move, so that we don't reuse state allocated for a different version pair than before. This check was using version_no, the counter of skipped versions to detect this. But this only makes sense if each merge_partition_versions() uses the same version pointer as a base. In fact it doesn't, if we skip, we advance _version, so the skip is persisted in the snapshot. It's enough to discard the version merging state when we do that. This shouldn't have any effect on the existing code base, since there is currently no way to trigger the version skipping logic.	2023-01-27 19:15:38 +01:00
Tomasz Grabiec	1c4b5b0b6b	mutation_partition: apply_resume: Mark operator bool() as explicit	2023-01-27 19:15:38 +01:00
Anna Stuchlik	70480184ab	doc: add the link to the FAQ about pinning to the patch upgrade guides 2022 and 2022	2023-01-27 18:06:54 +01:00
Anna Stuchlik	31515f7604	doc: add a FAQ with a workaround to install a non-latest ScyllaDB version on Debian and Ubuntu	2023-01-27 17:49:00 +01:00
Botond Dénes	84a69b6adb	db/view/view_update_check: check_needs_view_update_path(): filter out non-member hosts We currently don't clean up the system_distributed.view_build_status table after removed nodes. This can cause a false-positive check for whether view update generation is needed for streaming. The proper fix is to clean up this table, but that will be more involved, and even when done, it might not be immediate. So until then and to be on the safe side, filter out entries belonging to unknown hosts from said table. Fixes: #11905 Refs: #11836 Closes #11860	2023-01-27 17:12:45 +03:00
Botond Dénes	e2c9cdb576	mutation_compactor: only pass consumed range-tombstone-change to validator Currently all consumed range tombstone changes are unconditionally forwarded to the validator, even if they are shadowed by a higher level tombstone and/or purgeable. This can result in a situation where a range tombstone change was seen by the validator but not passed to the consumer. The validator expects the range tombstone change to be closed by end-of-partition but the end fragment won't come as the tombstone was dropped, resulting in a false-positive validation failure. Fix by only passing tombstones to the validator that are actually passed to the consumer too. Fixes: #12575 Closes #12578	2023-01-27 14:03:45 +01:00
Nadav Har'El	b99b83acdd	docs/alternator: fix links to open issues The docs/alternator/compatibility.md file links to various open issues on unimplemented features. One of the links was to an already-closed issue. Replace it by a link to an open issue that was missing. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12649	2023-01-27 14:29:57 +02:00
Pavel Emelyanov	1f9f819c8c	table: Remove unused column_family_directory() overload There's another one that accepts explicit basedir first argument and that's used by the rest of the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12643	2023-01-27 14:17:41 +02:00
Nadav Har'El	f873884b50	test/alternator: unskip test which works on modern Scylla We had one test test_gsi.py::test_gsi_identical that didn't work on KA/LA sstables due to #6157, so it was skipped. Today, Scylla no longer supports writing these old sstable formats, so the test can never find itself running on these versions, so should pass. And indeed it does, and the "skip" marker can be removed. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12651	2023-01-27 14:10:07 +02:00
Botond Dénes	d358d4d9e9	Merge 'Configure sstable_test_env with tempdir' from Pavel Emelyanov Today's sstable_test_env starts with a default-configured db::config and, thus, sstables_manager. Test cases that run in this env always create a tempdir to store sstable files in on their own. Upcoming patches make sstable-manager and friends fully control the data-dir path in order to support object storage for sstables in a nice way, and this behavior of tests upsets this ongoing work. That said, this PR configures sstable_test_env with a tempdir and pins down the cases using it to stick to that directory, rather than to the custom one. Closes #12641 * github.com:scylladb/scylladb: test: Use tempdir from sstable_test_env test: Add tmpdir to sstable test env test: Keep db::config as unique pointer	2023-01-27 13:59:12 +02:00
Avi Kivity	df09bf2670	tools: toolchain: dbuild: pass NOFILE limit from host to container The leak sanitizer has a bug [1] where, if it detects a leak, it forks something, and before that, it closes all files (instead of using close_range like a good citizen). Docker tends to create containers with the NOFILE limit (number of open files) set to 1 billion. The resulting 1 billion close() system calls are incredibly slow. Work around that problem by passing the host NOFILE limit. [1] https://github.com/llvm/llvm-project/issues/59112 Closes #12638	2023-01-27 13:56:35 +02:00
Benny Halevy	d2893f93cb	view: row_lock: lock_ck: try_emplace row_lock entry Use same method as the two-level lock at the partition level. try_emplace will either use an existing entry, if found, or create a new entry otherwise. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-27 13:51:48 +02:00
Benny Halevy	4b5e324ecb	view: row_lock: lock_ck: find or construct row_lock under partition lock Since we're potentially searching the row_lock in parallel to acquiring the read_lock on the partition, we're racing with row_locker::unlock that may erase the _row_locks entry for the same clustering key, since there is no lock to protect it up until the partition lock has been acquired and the lock_partition future is resolved. This change moves the code to search for or allocate the row lock _after_ the partition lock has been acquired to make sure we're synchronously starting the read/write lock function on it, without yielding, to prevent this use-after-free. This adds an allocation for copying the clustering key in advance even if a row_lock entry already exists, that wasn't needed before. It only slows us down (a bit) when there is contention and the lock already existed when we want to go locking. In the fast path there is no contention and then the code already had to create the lock and copy the key. In any case, the penalty of copying the key once is tiny compared to the rest of the work that view updates are doing. This is required on top of `5007ded2c1` as seen in https://github.com/scylladb/scylladb/issues/12632 which is closely related to #12168 but demonstrates a different race causing use-after-free. Fixes #12632 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-27 13:51:46 +02:00
Kamil Braun	fa9cf81af2	test: topology: verify that group 0 and token ring are consistent After topology changes like removing a node, verify that the set of group 0 members and token ring members is the same. Modify `get_token_ring_host_ids` to only return NORMAL members. The previous version which used the `/storage_service/host_id` endpoint might have returned non-NORMAL members as well. Fixes: #12153 Closes #12619	2023-01-27 14:21:14 +03:00
Avi Kivity	f719de3357	Update seastar submodule * seastar d41af8b59...943c09f86 (20): > reactor: disable io_uring on older kernels if not enough lockable memory is available > demos/tcp_sctp_client_demo: use user-defined literal for sizes > core/units: add user-defined literal for IEC prefixes > core/units: include what we use > coroutine/exception: do not include core/coroutine.hh > seastar/coroutine: drop std-coroutine.hh > core/bitops.hh: add type constraits to templates > apps/iotune: s/condition == false/!condition/ > core/metrics_api: s/promehteus/prometheus/ > reactor: make io_uring the default backend if available > tests: connect_test: use 127.0.0.1 for connect refused test > reactor: use aio to implement reactor_backend_uring::read() > future: schedule: get_available_state_ref under SEASTAR_DEBUG > rpc: client_info: add retrieve_auxiliary_opt > Merge 'Make http requests with content-length header and generated body' from Pavel Emelyanov > Merge 'Ensure logger doesn't allocate' from Travis Downs > http, httpd: optimize header field assignment > sstring: operator<< std::unordered_map: delete stray space char > Dump memory diagnostics at error level on abort > Fix CLI help for memory diagnostics dump Closes #12650	2023-01-26 22:19:24 +02:00
Anna Stuchlik	6ef33f8aae	doc: reorganize the content on the Upgrade ScyllaDB page	2023-01-26 13:37:27 +01:00
Botond Dénes	d7ed92bb42	Merge 'Reduce the number of table::make_sstable() overloads' from Pavel Emelyanov There are several helpers to make an sstable for the table and two with most of the arguments are only used by tests. This PR leaves table with just one arg-less call thus making it easier to patch further. Closes #12636 * github.com:scylladb/scylladb: table: Shrink sstables making API tests: Use sstables manager to make sstables distributed_loader: Add helpers to make sstables for reshape/reshard	2023-01-26 14:25:21 +02:00
Anna Stuchlik	29536cb064	doc: improve the overview of the upgrade procedure (apply feedback)	2023-01-26 13:09:08 +01:00
Kamil Braun	5eadea301e	Merge 'pytest: start after ungraceful stop' from Alecco If a server is stopped suddenly (i.e. not graceful), schema tables might be in inconsistent state. Add a test case and enable Scylla configuration option (force_schema_commit_log) to handle this. Fixes #12218 Closes #12630 * github.com:scylladb/scylladb: pytest: test start after ungraceful stop test.py: enable force_schema_commit_log	2023-01-26 12:08:33 +01:00
Kamil Braun	3eabe04f5d	test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests `ScyllaClusterManager` is used to run a sequence of test cases from a single test file. Between two consecutive tests, if the previous test left the cluster 'dirty', meaning the cluster cannot be reused, it would put the old cluster to the pool with `is_dirty=True`, then get a new cluster from the pool. Between the `put` and the `get`, a concurrent test run (with its own instance of `ScyllaClusterManager`) would start, because there was free space in the pool. This resulted in undesirable behavior when we ran tests with `--repeat X` for a large `X`: we would start with e.g. 4 concurrent runs of a test file, because the pool size was 4. As soon as one of the runs freed up space in the pool, we would start another concurrent run. Soon we'd end up with 8 concurrent runs. Then 16 concurrent runs. And so on. We would have a large number of concurrent runs, even though the original 4 runs didn't finish yet. All of these concurrent runs would compete waiting on the pool, and waiting for space in the pool would take longer and longer (the duration is linear w.r.t number of concurrent competing runs). Tests would then time out because they would have to wait too long. Fix that by using the new `replace_dirty` function introduced to the pool. This function frees up space by returning a dirty cluster and then immediately takes it away to be used for a new cluster. Thanks to this, we will only have at most as many concurrent runs as the pool size. For example with --repeat 8 and pool size 4, we would run 4 concurrent runs and start the 5th run only when one of the original 4 runs finishes, then the 6th run when a second run finishes and so on. Fixes #11757	2023-01-26 11:58:00 +01:00
Kamil Braun	b5ef57ecc2	test/pylib: pool: introduce `replace_dirty` Used to atomically return a dirty object to the pool and then use the space freed by this object to get another object. Unlike `put(is_dirty=True)` followed by `get`, a concurrent waiter cannot take away our space from us. A piece of `get` was refactored to a private function `_build_and_get`, this piece is also used in `replace_dirty`.	2023-01-26 11:58:00 +01:00
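The difference between `put(is_dirty=True)` followed by `get` and the atomic `replace_dirty` can be illustrated with a minimal asyncio sketch. This is not the code in test/pylib/pool.py: the names `Pool`, `build`, `destroy`, `put`, `replace_dirty` and `_build_and_get` come from the commit messages, while the fields `max_size`, `total`, `idle` and the `demo` helper are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Generic, List, TypeVar

T = TypeVar("T")


class Pool(Generic[T]):
    """Illustrative pool; field names are assumptions, not the real implementation."""

    def __init__(self, max_size: int,
                 build: Callable[[], Awaitable[T]],
                 destroy: Callable[[T], Awaitable[None]]) -> None:
        self.max_size = max_size
        self.build = build
        self.destroy = destroy
        self.cond = asyncio.Condition()
        self.idle: List[T] = []   # clean objects waiting to be reused
        self.total = 0            # slots in use: borrowed + idle

    async def get(self) -> T:
        async with self.cond:
            # Wait until an idle object exists or there is room to build one.
            await self.cond.wait_for(
                lambda: bool(self.idle) or self.total < self.max_size)
            if self.idle:
                return self.idle.pop()
            self.total += 1       # reserve the slot before building
        return await self._build_and_get()

    async def _build_and_get(self) -> T:
        try:
            return await self.build()
        except Exception:
            # Building failed: give the reserved slot back.
            async with self.cond:
                self.total -= 1
                self.cond.notify()
            raise

    async def put(self, obj: T, is_dirty: bool = False) -> None:
        if is_dirty:
            await self.destroy(obj)
        async with self.cond:
            if is_dirty:
                self.total -= 1   # the slot becomes visible to any waiter
            else:
                self.idle.append(obj)
            self.cond.notify()

    async def replace_dirty(self, obj: T) -> T:
        # The slot count never drops, so a concurrent waiter cannot take it.
        await self.destroy(obj)
        return await self._build_and_get()


async def demo():
    built = 0
    destroyed = []

    async def build() -> str:
        nonlocal built
        built += 1
        return f"cluster-{built}"

    async def destroy(obj: str) -> None:
        destroyed.append(obj)

    pool = Pool(1, build, destroy)
    first = await pool.get()
    waiter = asyncio.create_task(pool.get())   # a concurrent run wanting a slot
    await asyncio.sleep(0)                      # let the waiter start waiting
    second = await pool.replace_dirty(first)
    await asyncio.sleep(0)
    waiter_blocked = not waiter.done()          # the waiter did not steal the slot
    await pool.put(second)
    third = await waiter
    return first, second, third, destroyed, waiter_blocked
```

In the sketch, the concurrent waiter only proceeds once the replacement object is returned to the pool with a plain `put`, so the number of live objects never exceeds `max_size`.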
Kamil Braun	858803cc2c	test/pylib: pool: replace `steal` with `put(is_dirty=True)` The pool usage was kind of awkward previously: if the user of a pool decided that a previously borrowed object should no longer be used, it was their responsibility to destroy the object (releasing associated resources and so on) and then call `steal()` on the pool to free space for a new object. Change the interface. Now the `Pool` constructor obtains a `destroy` function in addition to the `build` function. The user calls the function `put` to return both objects that are still usable and those that aren't. For the latter, they set `is_dirty=True`. The pool will 'destroy' the object with the provided function, which could mean e.g. releasing associated resources. For example, instead of: ``` if self.cluster.is_dirty: self.clusters.stop() self.clusters.release_ips() self.clusters.steal() else: self.clusters.put(self.cluster) ``` we can now use: ``` self.clusters.put(self.cluster, is_dirty=self.cluster.is_dirty) ``` (assuming that `self.clusters` is a pool constructed with a `destroy` function that stops the cluster and releases its IPs.) Also extend the interface of the context manager obtained by `instance()` - the user must now pass a flag `dirty_on_exception`. If the context manager exits due to an exception and that flag was `True`, the object will be considered dirty. The dirty flag can also be set manually on the context manager. For example: ``` async with (cm := pool.instance(dirty_on_exception=True)) as server: cm.dirty = await run_test(test, server) # It will also be considered dirty if run_test throws an exception ```	2023-01-26 11:58:00 +01:00
Pavel Emelyanov	dd307d8a42	test: Use tempdir from sstable_test_env The test cases in sstable_directory_test use a temporary directory that differs from the one sstables manager starts over. Fix that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-26 11:47:06 +03:00
Pavel Emelyanov	0c3799db71	test: Add tmpdir to sstable test env This adds the test/lib's tmpdir instance _and_ configures the data_file_directories with this path. This makes sure sstables manager and the rest of the test use the same directory for sstables. For now it doesn't change anything, but helps next patching. (A neat side effect of this change is that sstable_test_env is now configured the same way as cql_test_env does) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-26 11:47:06 +03:00
Pavel Emelyanov	9f4efd6b6f	table: Shrink sstables making API Currently there are four helpers, this patch makes it just two and one of them becomes private to the table, thus making the API small and neat (and easy to patch further). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-26 10:47:39 +03:00
Pavel Emelyanov	fd559f3b81	tests: Use sstables manager to make sstables This test uses two many-args helpers from table class to create sstables with desired parameters. The table API in question is not used by any other code but these few places, so it's better to open-code it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-26 10:47:39 +03:00
Pavel Emelyanov	bfddfb8927	distributed_loader: Add helpers to make sstables for reshape/reshard This kills two birds with one stone. First, it factors out (quite a lot of) common arguments that are passed to table.make_sstable(). Second, it makes the helpers call sstable manager with extended args making it possible to remove those wrappers from table class later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-26 10:47:39 +03:00
Botond Dénes	ba26770376	tools/schema_loader: data_dictionary_impl:try_find_table(): also check ks name Although the number of keyspaces should mostly be 1 here, and thus the chance of two tables from different keyspaces colliding is minuscule, it is not zero. Better be safe than sorry, so match the keyspace name too when looking up a table. Closes #12627	2023-01-25 22:04:07 +02:00
Raphael S. Carvalho	87ee547120	table: Fix quadratic behavior when inserting sstables into tracker on schema change Each time backlog tracker is informed about a new or old sstable, it will recompute the static part of backlog whose complexity is proportional to the total number of sstables. On schema change, we're calling backlog_tracker::replace_sstables() for each existing sstable, therefore it produces O(N ^ 2) complexity. Fixes #12499. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12593	2023-01-25 21:57:33 +02:00
Botond Dénes	bdd4b25c61	scylla-gdb.py: scylla memory: remove 'sstable reads' from semaphore names This phrase is inaccurate and unnecessary. We know all lines in the printout are for reads and they are semaphores: no need to repeat this information on each line. Example: Read Concurrency Semaphores: read: 0/100, 0/ 41901096, queued: 0 streaming: 0/ 10, 0/ 41901096, queued: 0 system: 0/ 10, 0/ 41901096, queued: 0 Closes #12633	2023-01-25 21:55:27 +02:00
Nadav Har'El	f4f2d608d7	dbuild: fix path in example in README The dbuild README has an example how to enable ccache, and required modifying the PATH. Since recently, our docker image includes required commands (cxxbridge) in /usr/local/bin, so the build will fail if that directory isn't also in the path - so add it in the example. Also use the opportunity to fix the "/home/nyh" in one example to "$HOME". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12631	2023-01-25 21:54:44 +02:00
Pavel Emelyanov	9ccae1be18	test: Keep db::config as unique pointer The goal is to make it possible to make config with custom-initialized options in test_env::impl's constructor initializer list (next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-25 19:38:47 +03:00
Kamil Braun	a0ff33e777	test/pylib: scylla_cluster: don't leak server if stopping it fails `ScyllaCluster.server_stop` had this piece of code: ``` server = self.running.pop(server_id) if gracefully: await server.stop_gracefully() else: await server.stop() self.stopped[server_id] = server ``` We observed `stop_gracefully()` failing due to a server hanging during shutdown. We then ended up in a state where neither `self.running` nor `self.stopped` had this server. Later, when releasing the cluster and its IPs, we would release that server's IP - but the server might have still been running (all servers in `self.running` are killed before releasing IPs, but this one wasn't in `self.running`). Fix this by popping the server from `self.running` only after `stop_gracefully`/`stop` finishes. Make an analogous fix in `server_start`: put `server` into `self.running` before we actually start it. If the start fails, the server will be considered "running" even though it isn't necessarily, but that is OK - if it isn't running, then trying to stop it later will simply do nothing; if it is actually running, we will kill it (which we should do) when clearing after the cluster; and we don't leak it. Closes #12613	2023-01-25 16:58:02 +02:00
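The ordering fix described above can be sketched as follows. The attribute and method names (`running`, `stopped`, `server_stop`, `stop_gracefully`, `stop`) follow the commit message; `ClusterSketch` and `FakeServer` are hypothetical stand-ins for the real `ScyllaCluster` and server classes.

```python
import asyncio


class FakeServer:
    """Hypothetical server whose graceful stop can be made to fail."""

    def __init__(self, fail_on_stop: bool) -> None:
        self.fail_on_stop = fail_on_stop

    async def stop_gracefully(self) -> None:
        if self.fail_on_stop:
            raise RuntimeError("server hung during shutdown")

    async def stop(self) -> None:
        pass


class ClusterSketch:
    def __init__(self) -> None:
        self.running = {}
        self.stopped = {}

    async def server_stop(self, server_id: int, gracefully: bool) -> None:
        # Look the server up without removing it, so a failed stop
        # leaves it tracked in self.running.
        server = self.running[server_id]
        if gracefully:
            await server.stop_gracefully()
        else:
            await server.stop()
        # Only after the stop finished do we move it between the maps.
        self.running.pop(server_id)
        self.stopped[server_id] = server
```

With this ordering, a server whose stop raised is still listed in `running`, so the later cleanup that kills every running server before releasing IPs still covers it.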
Alejo Sanchez	878cb45c24	pytest: test start after ungraceful stop Test case for a start of a server after it was stopped suddenly (instead of gracefully). This could cause commitlog flush issues. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-25 14:49:27 +01:00
Alejo Sanchez	ccbd89f0cd	test.py: enable force_schema_commit_log To handle start after ungraceful stop, enable separate schema commit log from server start. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-25 14:49:27 +01:00
Kamil Braun	5c886e59de	Merge 'Enable Raft by default in new clusters' from Kamil Braun New clusters that use a fresh conf/scylla.yaml will have `consistent_cluster_management: true`, which will enable Raft, unless the user explicitly turns it off before booting the cluster. People using existing yaml files will continue without Raft, unless consistent_cluster_management is explicitly requested during/after upgrade. Also update the docs: cluster creation and node addition procedures. Fixes #12572. Closes #12585 * github.com:scylladb/scylladb: docs: mention `consistent_cluster_management` for creating cluster and adding node procedures conf: enable `consistent_cluster_management` by default	2023-01-25 14:09:38 +01:00
Benny Halevy	82011fc489	dht: incremental_owned_ranges_checker: belongs_to_current_node: mark as const Its _it member keeps state about the current range. Although it's modified by the method, this is an implementation detail that is irrelevant to the caller, hence mark the belongs_to_current_node method as const (and noexcept while at it). This allows the caller, cleanup_compaction, to use it from inside a const method, without having to mark its respective member as mutable too. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12634	2023-01-25 14:52:21 +02:00
Alexey Novikov	ce96b472d3	prevent populating cache with expired rows from sstables change row purge condition for compacting_reader to remove all expired rows to avoid read performance problems when there are many expired tombstones in row cache Refs #2252 Closes #12565	2023-01-25 12:59:40 +01:00
Kamil Braun	5bc7f0732e	Merge 'test.py: manual cluster pool handling for Python suite' from Alecco From reviews of https://github.com/scylladb/scylladb/pull/12569, avoid using `async with` and access the `Pool` of clusters with `get()`/`put()`. Closes #12612 * github.com:scylladb/scylladb: test.py: manual cluster handling for PythonSuite test.py: stop cluster if PythonSuite fails to start test.py: minor fix for failed PythonSuite test	2023-01-24 17:37:55 +01:00
Nadav Har'El	b28818db06	Merge 'Make regexes in types.cc static and remove unnecessary tolower transform' from Marcin Maliszkiewicz - makes all regexes static If making regex compilation static for uuid_type_impl and timeuuid_type_impl helps then it should also help for timestamp_type and simple_date_type. - remove unnecessary tolower transform in simple_date_type_impl::from_sstring Following function uses only decimal and '-' characters (see date_re). They are not affected by tolower call in any way. Additionally std::strtoll supports "0x" prefixes but also accepts upper case version "0X" so it's also not affected by tolower call. get_simple_date_time only casts strings to integer types using boost::lexical_cast so also not affected by tolower. Finally, serialize only uses str to include it in an exception text so tolower doesn't affect it in a positive way. It's even better that input is displayed to the user as it was, not converted to lower case. Closes #12621 * github.com:scylladb/scylladb: types: remove unnecessary tolower transform in simple_date_type_impl::from_sstring types: make all regexes static	2023-01-24 16:13:59 +02:00
Pavel Emelyanov	f6e8b64334	snitch: Use set_my_dc_and_rack() on all shards Most of snitch drivers set _my_dc and _my_rack with direct assignment thus skipping the sanity checks for dc/rack being empty. On other shards they call set_my_dc_and_rack() helper which warns the empty value and replaces it with some defaults. It's better to use the helper on all shards in order to have the same dc/rack values everywhere. refs: #12185 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12524	2023-01-24 14:17:06 +02:00
Nadav Har'El	55558e1bd7	test/alternator: check operation on invalid TableName Issue #12538 suggested that maybe Alternator shouldn't bother reporting an invalid table name in item operations like PutItem, and that it's enough to report that the table doesn't exist. But the test added in this patch shows that DynamoDB, like Alternator, reports the invalid table name in this case - not just that the table doesn't exist. That should make us think twice before acting on issue #12538. If we do what this issue recommended, this test will need to be fixed (e.g., to accept as correct both types of errors). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12608	2023-01-24 14:14:39 +02:00
Kefu Chai	4a0134a097	db: system_keyspace: take the reserved_memory into account before this change, we returned the total memory managed by Seastar in the "total" field in system.memory. but this value only reflects the total memory managed by Seastar's allocator. if `reserve_additional_memory` is set when starting app_template, Seastar's memory subsystem just reserves a chunk of memory of this specified size for system, and takes the remaining memory. since `f05d612da8`, we set this value to 50MB for wasmtime runtime. hence the test of `TestRuntimeInfoTable.test_default_content` in dtest fails. the test expects the size passed via the option of `--memory` to be identical to the value reported by system.memory's "total" field. after this change, the "total" field takes the reserved memory for wasm udf into account. the "total" field should reflect the total size of memory used by Scylla, no matter how we use a certain portion of the allocated memory. Fixes #12522 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12573	2023-01-24 14:07:44 +02:00
Anna Stuchlik	3cbe657b24	doc: fixes https://github.com/scylladb/scylla-docs/issues/3706 , v2 of https://github.com/scylladb/scylladb/pull/11638 , add a note about performance penalty in non-frozen connections vs frozen connections and UDT, add a link to the blog post about performance Closes #12583	2023-01-24 13:16:58 +02:00
Nadav Har'El	158be3604d	test/alternator: xfailing test for huge Limit in ListStreams DynamoDB Streams limits the "Limit" parameter of ListStreams to 100 - anything larger will result in an error. Scylla doesn't necessarily need to uphold the same limit, but we should uphold some limit, as not having any limit can result (in the theoretical case of a huge number of tables with streams enabled) in an unbounded response size. So here we add a test to check that a Limit of 100,000 is not allowed. It passes on DynamoDB (in fact, any number higher than 100 will be enough there) but fails on Alternator, so is marked "xfail". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-01-24 12:38:18 +02:00
Nadav Har'El	3beafd8441	alternator/test: un-skip test of zero Limit in ListStreams We had a skipped test on how Alternator handles Limit=0 for ListStreams which should be reported as an error. We had to skip it because boto3 did us a "favor" of discovering this parameter error before ever sending it to the server. We discovered long ago how to avoid this client-side checking in boto3, but only used it for the "dynamodb" fixture and forgot to copy the same trick to the "dynamodbstreams" fixture - and in this patch we do, and can run this test successfully. While at it, also copy the extended timeout configuration we had in the dynamodb fixture also to the dynamodbstreams fixture. There is no reason why it should be different. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-01-24 12:38:18 +02:00
Alejo Sanchez	f236d518c6	test.py: manual cluster handling for PythonSuite Instead of complex async with logic, use manual cluster pool handling. Revert the discard() logic in Pool from a recent commit. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-24 11:38:17 +01:00
Alejo Sanchez	a6059e4bb7	test.py: stop cluster if PythonSuite fails to start If cluster fails to start, stop it. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-24 11:36:49 +01:00
Alejo Sanchez	dec0c1d9f6	test.py: minor fix for failed PythonSuite test Even though test can't fail both before and after, make the logic explicit in case code changes in the future. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-24 11:36:49 +01:00
Kefu Chai	232c73a077	doc: add PREVIEW_HOST Make variable add Make variable named `PREVIEW_HOST` so it can be overridden like ``` make preview PREVIEW_HOST=$(hostname -I | cut -d' ' -f 1) ``` it allows developer to preview the document if the host building the document is not localhost. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12589	2023-01-24 12:27:33 +02:00
Botond Dénes	cfaec4428b	Merge 'Remove qctx from system_keyspace::increment_and_get_generation()' from Pavel Emelyanov It's a simple helper used during boot-time that can enjoy query-processor from sharded<system_keyspace> Closes #12587 * github.com:scylladb/scylladb: system_keyspace: De-static system_keyspace::increment_and_get_generation system_keyspace: Fix indentation after previous patch system_keyspace: Coroutinize system_keyspace::increment_and_get_generation	2023-01-24 12:17:12 +02:00
Marcin Maliszkiewicz	f4de64957b	types: remove unnecessary tolower transform in simple_date_type_impl::from_sstring Following function uses only decimal and '-' characters (see date_re). They are not affected by tolower call in any way. Additionally std::strtoll supports "0x" prefixes but also accepts upper case version "0X" so it's also not affected by tolower call. get_simple_date_time only casts strings to integer types using boost::lexical_cast so also not affected by tolower. Finally, serialize only uses str to include it in an exception text so tolower doesn't affect it in a positive way. It's even better that input is displayed to the user as it was, not converted to lower case.	2023-01-24 10:50:13 +01:00
Calle Wilund	a079c3dbbe	alternator::streams: Special case single table in list_streams Avoid iterating all tables (at least multiple times).	2023-01-24 09:14:33 +00:00
Calle Wilund	9412d8f259	alternator::streams: Only sort tables iff limit < # tables or ExclusiveStartStreamArn set Avoid sorts for request that will be answered immediately.	2023-01-24 08:48:20 +00:00
Avi Kivity	49157370bc	build: don't force-disable io_uring in Seastar The reasons for force-disabling are doubly wrong: we now use liburing from Fedora 37, which is sufficiently recent, and the auto-detection code will disable io_uring if a sufficiently recent version isn't present. Closes #12620	2023-01-24 10:32:00 +02:00
Calle Wilund	9886788a46	alternator::streams: Set default list_streams limit to 100 as per spec AWS docs says so.	2023-01-24 08:24:42 +00:00
Kamil Braun	54170749b8	service/raft: raft_group0: prevent double abort There was a small chance that we called `timeout_src.request_abort()` twice in the `with_timeout` function, first by timeout and then by shutdown. `abort_source` fails on an assertion in this case. Fix this. Fixes: #12512 Closes #12514	2023-01-23 21:32:21 +01:00
Marcin Maliszkiewicz	76c1d0e5d3	types: make all regexes static If making regex compilation static for uuid_type_impl and timeuuid_type_impl helps then it should also help for timestamp_type and simple_date_type.	2023-01-23 20:37:32 +01:00
Nadav Har'El	634c3d81f5	Merge 'doc: add the general upgrade policy' from Anna Stuchlik Fix https://github.com/scylladb/scylla-docs/issues/3968 This PR adds the information that an upgrade to each successive major version is required to upgrade from an old ScyllaDB version. Closes #12586 * github.com:scylladb/scylladb: docs: remove repetition doc: add the general upgrade policy to the upgrade page	2023-01-23 18:34:59 +02:00
Benny Halevy	008ca37d28	sstable_directory: reindent reshard Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-23 17:30:05 +02:00
Benny Halevy	792bc58fce	sstable_directory: coroutinize reshard Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-23 17:29:49 +02:00
Nadav Har'El	ccc2c6b5dd	Merge 'test/pylib: scylla_cluster: improve server startup check' from Kamil Braun Don't use a range scan, which is very inefficient, to perform a query for checking CQL availability. Improve logging when waiting for server startup times out. Provide details about the failure: whether we managed to obtain the Host ID of the server and whether we managed to establish a CQL connection. Closes #12588 * github.com:scylladb/scylladb: test/pylib: scylla_cluster: better logging for timeout on server startup test/pylib: scylla_cluster: use less expensive query to check for CQL availability	2023-01-23 17:00:52 +02:00
Kamil Braun	8a1ea6c49f	test/pylib: scylla_cluster: better logging for timeout on server startup Waiting for server startup is a multi-step procedure: after we start the actual process, we will: - try to obtain the Host ID (by querying a REST API endpoint) - then try to connect a CQL session - then try to perform a CQL query The steps are repeated every .1 second until we reach a timeout (the Host ID step is skipped if we previously managed to obtain it). On timeout we'd only get a generic "failed to start server" message, it wouldn't say what we managed to do and what not. For example, on one of the failed jobs on Jenkins I observed this timeout error. Looking at the logs of the server, it turned out that the server printed the "initialization completed" message more than 2 minutes before the actual timeout happened. So for 2 minutes, the test framework either couldn't obtain the Host ID, or couldn't establish a CQL connection, or couldn't perform a CQL query, but I wasn't able to determine fully which one of these was the case. Improve the code by printing whether we managed to get the Host ID of the server and if so - whether we managed to connect to CQL.	2023-01-23 15:59:42 +01:00
Kamil Braun	0e591606a5	test/pylib: scylla_cluster: use less expensive query to check for CQL availability The previous CQL query used a range scan which is very inefficient, even for local tables. Also add a comment explaining why we need this query.	2023-01-23 15:59:05 +01:00
Avi Kivity	3f887fa24b	Merge 'doc: remove duplication of the ScyllaDB ports (table)' from Anna Stuchlik Fix https://github.com/scylladb/scylladb/issues/12605#event-8328930604 This PR removes the duplicated content (the file with the table was included twice) and reorganizes the content in the Networking section. Closes #12615 * github.com:scylladb/scylladb: doc: fix the broken link doc: replace Scylla with ScyllaDB doc: remove duplication in the Networking section (the table of ports used by ScyllaDB	2023-01-23 16:27:06 +02:00
Anna Stuchlik	30f3ee6138	doc: fix the broken link	2023-01-23 14:43:07 +01:00
Anna Stuchlik	1dd0fb8c2d	doc: replace Scylla with ScyllaDB	2023-01-23 14:40:36 +01:00
Anna Stuchlik	d881b3c498	doc: remove duplication in the Networking section (the table of ports used by ScyllaDB	2023-01-23 14:39:01 +01:00
Calle Wilund	da8adb4d26	alterator::streams: Sort tables in list_streams to ensure no duplicates Fixes #12601 (maybe?) Sort the set of tables on ID. This should ensure we never generate duplicates in a paged listing here. Can obviously miss things if they are added between paged calls and end up with a "smaller" UUID/ARN, but that is to be expected.	2023-01-23 11:41:40 +00:00
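Why sorting on ID keeps a paged listing free of duplicates can be shown with a small sketch. It is not the Alternator implementation: `list_page` and its parameters are hypothetical, and plain strings stand in for the table UUIDs/ARNs.

```python
import bisect
from typing import Iterable, List, Optional, Tuple


def list_page(ids: Iterable[str], limit: int,
              exclusive_start: Optional[str] = None
              ) -> Tuple[List[str], Optional[str]]:
    """Return one page of IDs and the marker to pass for the next page."""
    ordered = sorted(ids)
    if exclusive_start is not None:
        # Resume strictly after the last ID returned by the previous page.
        ordered = ordered[bisect.bisect_right(ordered, exclusive_start):]
    page = ordered[:limit]
    # A marker is only returned when more entries remain.
    last = page[-1] if len(ordered) > limit else None
    return page, last
```

Because every page resumes strictly after the previous marker in a stable order, an ID can never be returned twice. An ID added between calls that sorts before the marker is skipped, which matches the caveat in the commit message.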
Benny Halevy	1123565eb0	table: perform_cleanup_compaction: trim owned ranges on compaction_group boundaries To cleanup tokens in sstables that are not owned by the compaction group. This may happen in the future after a compaction group split if copying / linking the sstables in the original compaction_group to the split compaction_groups. Fixes #12594 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-22 22:54:26 +02:00
Benny Halevy	95a8e0b21d	table: make_compaction_groups: calculate compaction_group token ranges Add dht::split_token_range_msb that returns a token_range_vector with ranges split using a given number of most-significant bits. When creating the table's compaction groups, use dht::split_token_range_msb to calculate the token_range owned by each compaction_group. Refs #12594 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-22 22:54:26 +02:00
Benny Halevy	912b56ebcf	dht: range_streamer: define logger as static dht::logger can't be global in this case, as it's too generic, but should be static to range_streamer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-22 22:54:26 +02:00
Nadav Har'El	54f174a1f4	Merge 'test.py: handle broken clusters for Python suite' from Alecco If the after test check fails (is_after_test_ok is False), discard the cluster and raise exception so context manager (pool) does not recycle it. Ignore exception re-raised by the context manager. Fixes #12360 Closes #12569 * github.com:scylladb/scylladb: test.py: handle broken clusters for Python suite test.py: Pool discard method	2023-01-22 19:58:12 +02:00
Benny Halevy	8009585e7d	table: compaction_group_for_token: use signed arithmetic Add and use dht::compaction_group_of that computes the compaction_group index by unbiasing the token, similar to dht::shard_of. This way, all tokens in `_compaction_groups[i]` are ordered before `_compaction_groups[j]` iff i < j. Fixes #12595 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12599	2023-01-22 11:27:07 +02:00
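The ordering property stated above can be checked with a short sketch. It assumes that "unbiasing" means shifting the signed 64-bit token into the unsigned range by adding 2^63 and that the group index is taken from the most-significant bits; the actual `dht::compaction_group_of` may differ in detail, and the Python function below is only an illustration.

```python
MASK64 = (1 << 64) - 1


def compaction_group_of(msb_bits: int, token: int) -> int:
    """Map a signed 64-bit token to a group index using its top msb_bits bits."""
    if msb_bits == 0:
        return 0  # a single group owns the whole token range
    # Shift the signed range [-2^63, 2^63) onto the unsigned range [0, 2^64).
    unbiased = (token + (1 << 63)) & MASK64
    return unbiased >> (64 - msb_bits)


# With 2 bits (4 groups), tokens in increasing order map to non-decreasing groups.
tokens = [-(1 << 63), -1, 0, (1 << 63) - 1]
groups = [compaction_group_of(2, t) for t in tokens]
```

Without the unbias step, taking the top bits of the raw two's-complement value would place negative tokens in the highest-numbered groups, breaking the "i < j implies tokens ordered" property.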
Pavel Emelyanov	be2ad2fe99	system_keyspace: De-static system_keyspace::increment_and_get_generation It's only called on cluster-join from storage_service which has the local system_keyspace reference and it's already started by that time. This allows removing few more occurrences of global qctx. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-20 17:24:22 +03:00
Pavel Emelyanov	4c4f8aa3e1	system_keyspace: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-20 17:24:22 +03:00
Pavel Emelyanov	b0edc07339	system_keyspace: Coroutinize system_keyspace::increment_and_get_generation Just unroll the fn().then({ fn2().then().then(); }); chain. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-20 17:24:10 +03:00
Botond Dénes	ebc100f74f	types: is_tuple(): handle reverse types Currently reverse types match the default case (false), even though they might be wrapping a tuple type. One user-visible effect of this is that a schema, which has a reversed<frozen<UDT>> clustering key component, will have this component incorrectly represented in the schema cql dump: the UDT will lose the frozen attribute. When attempting to recreate this schema based on the dump, it will fail as only frozen UDTs are allowed in primary key components. Fixes: #12576 Closes #12579	2023-01-20 15:50:58 +02:00
Anna Stuchlik	0a91578875	docs: remove repetition	2023-01-20 14:45:59 +01:00
Anna Stuchlik	2c357a7007	doc: add the general upgrade policy to the upgrade page	2023-01-20 14:43:26 +01:00
Botond Dénes	7f9b39009c	reader_concurrency_semaphore_test: leak test: relax iteration limit This test creates random dummy reads and simulates a query with them. The test works in terms of iteration (tick), advancing each simulating read in each iteration. To prevent infinite runtime an iteration limit of 100 was added to detect a non-converging test and kill it. This limit proved too strict however and in this patch we bump it to 1000 to prevent some unlucky seed making this test fail, as seen recently in CI. Closes #12580	2023-01-20 15:39:13 +02:00
Kamil Braun	050614f34d	docs: mention `consistent_cluster_management` for creating cluster and adding node procedures	2023-01-20 13:29:25 +01:00
Kamil Braun	b0313e670b	conf: enable `consistent_cluster_management` by default Raft will be turned on by default in new clusters. Fixes #12572	2023-01-20 13:29:06 +01:00
Botond Dénes	0d64f327e1	Merge 'gdb: Introduce 'scylla range-tombstones' command' from Tomasz Grabiec Prints and validates range tombstones in a given container. Currently supported containers: - mutation_partition Example: ``` (gdb) scylla range-tombstones $mp { start: ['a', 'b'], kind: bound_kind::excl_start, end: ['a', 'b'], kind: bound_kind::incl_end, t: {timestamp = 1672546889091665, deletion_time = {__d = {__r = 1672546889}}} } { start: ['a', 'b'], kind: bound_kind::excl_start, end: ['a', 'c'] kind: bound_kind::incl_end, t: {timestamp = 1673731764010123, deletion_time = {__d = {__r = 1673731764}}} } ``` Closes #12571 * github.com:scylladb/scylladb: gdb: Introduce 'scylla range-tombstones' gdb: Introduce 'scylla set-schema' gdb: Extract purse_bytes() in managed_bytes_printer	2023-01-20 11:21:34 +02:00
Nadav Har'El	3d78dbd9f2	test/cql-pytest: regression tests for null lookup in local SI We noticed that old branches of Scylla had problems with looking up a null value in a local secondary index - hanging or crashing. This patch includes tests to reproduce these bugs. The tests pass on current master - apparently this bug has already been fixed, but we didn't have a regression test for it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12570	2023-01-19 23:58:33 +02:00
Alejo Sanchez	51e84508ee	test.py: handle broken clusters for Python suite If the after test check fails (!is_after_test_ok), discard the cluster and raise exception so context manager (pool) does not recycle it. Ignore Pool exception re-raised by the context manager. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-19 21:43:50 +01:00
Alejo Sanchez	c886a05b37	test.py: Pool discard method Add a context manager discard() method to tell it to discard the object. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-19 21:43:45 +01:00
Avi Kivity	b4d91d87db	Merge 'build: fix build problems in Nix development environment' from Piotr Grabowski This PR fixes three problems that prevented/could prevent a successful build in ScyllaDB's Nix development environment. The first commit adds a missing `abseil-cpp` dependency to Nix devenv, as this dependency is now required after `8635d2442`. The second commit bumps the version of Lua from 5.3 to 5.4, as after `9dd5107919` a 4-argument version of `lua_resume` (only available in Lua 5.4) is used in the ScyllaDB codebase. The third commit explicitly adds `rustc` to Nix devenv dependencies. This places `rustc` from nixpkgs on the `PATH`, preventing `cargo` from executing `rustc` installed globally on the system (see the commit message for additional reasoning). After those changes, ScyllaDB can be successfully built in both `nix-shell .` and `nix develop .` environments. Closes #12568 * github.com:scylladb/scylladb: build: explicitly add rustc to Nix devenv build: bump Lua version (5.3 -> 5.4) in Nix devenv build: add abseil-cpp dependency to Nix devenv	2023-01-19 21:52:37 +02:00
Tomasz Grabiec	95547162c0	gdb: Introduce 'scylla range-tombstones' Prints and validates range tombstones in a given container. Currently supported containers: - mutation_partition Example: (gdb) scylla range-tombstones $mp { start: ['a', 'b'], kind: bound_kind::excl_start, end: ['a', 'b'], kind: bound_kind::incl_end, t: {timestamp = 1672546889091665, deletion_time = {__d = {__r = 1672546889}}} } { start: ['a', 'b'], kind: bound_kind::excl_start, end: ['a', 'c'] kind: bound_kind::incl_end, t: {timestamp = 1673731764010123, deletion_time = {__d = {__r = 1673731764}}} }	2023-01-19 19:58:13 +01:00
Tomasz Grabiec	f759b35596	gdb: Introduce 'scylla set-schema' Sets the current schema to be used by schema-aware commands. Setting the schema allows some commands and printers to interpret schema-dependent objects and present them in a more friendly form. Some commands require schema to work, for example to sort keys, and will fail otherwise.	2023-01-19 19:58:13 +01:00
Tomasz Grabiec	797bc7915d	gdb: Extract purse_bytes() in managed_bytes_printer	2023-01-19 19:58:13 +01:00
Kamil Braun	2f84e820fd	test/pylib: scylla_cluster: return error details from test framework endpoints If an endpoint handler throws an exception, the details of the exception are not returned to the client. Normally this is desirable so that information is not leaked, but in this test framework we do want to return the details to the client so it can log a useful error message. Do it by wrapping every handler into a catch clause that returns the exception message. Also modify a bit how HTTPErrors are rendered so it's easier to discern the actual body of the error from other details (such as the params used to make the request etc.) Before: ``` E test.pylib.rest_client.HTTPError: HTTP error 500: 500 Internal Server Error E E Server got itself in trouble, params None, json None, uri http+unix://api/cluster/before-test/test_stuff ``` After: ``` E test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/before-test/test_stuff, params: None, json: None, body: E Failed to start server at host 127.155.129.1. E Check the log files: E /home/kbraun/dev/scylladb/testlog/test.py.dev.log E /home/kbraun/dev/scylladb/testlog/dev/scylla-1.log ``` Closes #12563	2023-01-19 17:47:13 +02:00
Kamil Braun	3ed3966f13	test/pylib: scylla_cluster: release cluster IPs when stopping ScyllaClusterManager When we obtained a new cluster for a test case after the previous test case left a dirty cluster, we would release the old cluster's used IP addresses (`_before_test` function). However, we would not release the last cluster's IP after the last test case. We would run out of IPs with sufficiently many test files or `--repeat` runs. Fix this. Also reorder the operations a bit: stop the cluster (and release its IPs) before freeing up space in the cluster pool (i.e. call `self.cluster.stop()` before `self.clusters.steal()`). This reduces concurrency a bit - fewer Scyllas running at the same time, which is good (the pool size gives a limit on the desired max number of concurrently running clusters). Killing a cluster is quick so it won't make a significant difference for the next guy waiting on the pool. Closes #12564	2023-01-19 17:46:46 +02:00
Piotr Grabowski	4068efa173	build: explicitly add rustc to Nix devenv Before this patch, "cargo" was the only Rust toolchain dependency in Nix development environment. Due to the way "cargo" tool is packaged in Nix, "cargo" would first try to use "rustc" from PATH (for example some version already installed globally on OS). If it didn't find any, it would fall back to "rustc" from nixpkgs. There are issues with such an approach: - "rustc" installed globally on the system could be old. - the goal of having a Nix development environment is that such environment is separate from the programs installed globally on the system and the versions of all tools are pinned (via flake.lock). Fix this problem by adding rustc to nativeBuildInputs in default.nix. After this patch, "rustc" from nixpkgs is present on the PATH (potentially overriding "rustc" already installed on the system), so "cargo" can correctly use it. You can validate this behavior experimentally by adding a fake failing rustc before entering the Nix development environment: mkdir fakerustc echo '#!/bin/bash' >> fakerustc/rustc echo 'exit 1' >> fakerustc/rustc chmod +x fakerustc/rustc export PATH=$(pwd)/fakerustc:$PATH nix-shell .	2023-01-19 15:53:49 +01:00
Piotr Grabowski	1b8a6b160e	build: bump Lua version (5.3 -> 5.4) in Nix devenv A recent commit (`9dd5107919`) started using a 4-argument version of lua_resume, which is only available in Lua 5.4. This caused build problems when trying to build Scylla in Nix development environment: tools/lua_sstable_consumer.cc:1292:19: error: no matching function for call to 'lua_resume' ret = lua_resume(l, nullptr, nargs, &nresults); ^~~~~~~~~~ /nix/store/wiz3xb19x2pv7j3hf29rbafm4s5zp2kx-lua-5.3.6/include/lua.h:290:15: note: candidate function not viable: requires 3 arguments, but 4 were provided LUA_API int (lua_resume) (lua_State *L, lua_State *from, int narg); ^ 1 error generated. Fix the problem by bumping the version of Lua from 5.3 to 5.4 in default.nix. Since "lua54Packages.lua" was added to nixpkgs fairly recently (NixOS/nixpkgs#207862), flake.lock is updated to get the newest version of nixpkgs (updated using "nix flake update" command).	2023-01-19 15:53:49 +01:00
Marcin Maliszkiewicz	7230841431	alternator: unify json streaming heuristic Main assumption here is that if is_big is good enough for GetBatchItems operation it should work well also for Scan, Query and GetRecords. And it's easier to maintain more unified code. Additionally 'future<> print' documentation used for streaming suggests that there is quite big overhead so since it seems the only motivation for streaming was to reduce contiguous allocation size below some threshold we should not stream when this threshold is not exceeded. Closes #12164	2023-01-19 16:40:43 +02:00
Anna Stuchlik	20f7848661	docs: add a missing redirection for the Cqlsh page This PR is not related to any reported issue in the repo. I've just discovered a broken link in the university caused by a missing redirection. Closes #12567	2023-01-19 16:37:58 +02:00
Piotr Grabowski	fbc042ff02	build: add abseil-cpp dependency to Nix devenv After `8635d2442` commit, the abseil submodule was removed in favor of using pre-built abseil distribution. Installation of abseil-cpp was added to install-dependencies.sh and dbuild image, but no change was made to the Nix development environment, which resulted in error while executing ./configure.py (while in Nix devenv): Package absl_raw_hash_set was not found in the pkg-config search path. Perhaps you should add the directory containing `absl_raw_hash_set.pc' to the PKG_CONFIG_PATH environment variable No package 'absl_raw_hash_set' found Fix the issue by adding "abseil-cpp" to buildInputs in default.nix.	2023-01-19 15:03:55 +01:00
Nadav Har'El	18be50582d	test/cql-pytest: add tests for behavior of unset values Recently, commit `0b418fa` made the checking for "unset" values more centralized and more robust, but as the tests added in this patch show, the situation is good (and in particular, that #10358 is solved). The tests in this patch check that the behavior of "unset" values in the CQL v4 protocol matches Cassandra's behavior and its documentation, and how it compares to our wishes of how we want unset values to behave. One of these tests fail on Cassandra (we consider this a Cassandra bug). One test fails on Scylla because it doesn't yet support arithmetic expressions (Refs #2693). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12534	2023-01-19 15:48:07 +02:00
Nadav Har'El	9433108158	Merge 'Allow transient list values to contain NULLs' from Avi Kivity The CQL protocol and specification call for lists with NULLs in some places. For example, the statement: ```cql UPDATE tab SET x = 3 IF y IN (1, 2, NULL) WHERE pk = 4 ``` has a list `(1, 2, NULL)` that contains NULL. Although the syntax is tuple-like, the value is a list; consider the same statement as a prepared statement: ```cql UPDATE tab SET x = :x IF y IN :y_values WHERE pk = :pk ``` `:y_values` must have a list type, since the number of elements is unknown. Currently, this is done with special paths inside LWT that bypass normal evaluation, but if we want to unify those paths, we must allow NULLs in lists (except in storage). This series does that. Closes #12411 * github.com:scylladb/scylladb: test: materialized view: add test exercising synthetic empty-type columns cql3: expr: relax evaluate_list() to allow NULL elements types: allow lists with NULL test: relax NULL check test predicate cql3, types: validate listlike collections (sets, lists) for storage types: make empty type deserialize to non-null value	2023-01-19 15:15:16 +02:00
Botond Dénes	d661d03057	Merge 'main, test: integrate perf tools into scylla' from Kefu Chai following tests are integrated into scylla executable - perf_fast_forward - perf_row_cache_update - perf_simple_query - perf_row_cache_update - perf_sstable before this change ```console $ size build/release/scylla text data bss dec hex filename 82284664 288960 335897 82909521 4f11951 build/release/scylla $ ls -l build/release/scylla -rwxrwxr-x 1 kefu kefu 1719672112 Jan 19 17:51 build/release/scylla ``` after this change ```console $ size build/release/scylla text data bss dec hex filename 84349449 289424 345257 84984130 510c142 build/release/scylla $ ls -l build/release/scylla -rwxrwxr-x 1 kefu kefu 1774204800 Jan 19 17:52 build/release/scylla ``` Fixes #12484 Closes #12558 * github.com:scylladb/scylladb: main: move perf_sstable into scylla main: move perf_row_cache_update into scylla test: perf_row_cache_update: add static specifier to local functions main: move perf_fast_forward into scylla main: move perf_simple_query into scylla test: extract debug::the_database out main: shift the args when checking exec_name main: extract lookup_main_func() out	2023-01-19 15:01:30 +02:00
Kamil Braun	147dd73996	test/pylib: scylla_cluster: mark cluster as dirty if it fails to boot If a cluster fails to boot, it saves the exception in `self.start_exception` variable; the exception will be rethrown when a test tries to start using this cluster. As explained in `before_test`: ``` def before_test(self, name) -> None: """Check that the cluster is ready for a test. If there was a start error, throw it here - the server is running when it's added to the pool, which can't be attributed to any specific test, throwing it here would stop a specific test.""" ``` It's arguable whether we should blame some random test for a failure that it didn't cause, but nevertheless, there's a problem here: the `start_exception` will be rethrown and the test will fail, but then the cluster will be simply returned to the pool and the next test will attempt to use it... and so on. Prevent this by marking the cluster as dirty the first time we rethrow the exception. Closes #12560	2023-01-19 14:26:57 +02:00
Marcin Maliszkiewicz	4c33791f96	alternator: eliminate regexes from the hot path This decreases the whole alternator::get_table cpu time by 78% (from 2.8 us to 0.6 us on my cpu). In perf_simple_query it decreases allocs/op by 1.6% (by removing 4 allocations) and increases median tps by 3.4%. Raw results from running: ./build/release/test/perf/perf_simple_query_g --smp 1 \ --alternator forbid --default-log-level error \ --random-seed=1235000092 --duration=180 --write Before the patch: median 46903.65 tps (197.2 allocs/op, 12.1 tasks/op, 170886 insns/op, 0 errors) median absolute deviation: 210.15 maximum: 47354.59 minimum: 42535.63 After the patch: median 48484.76 tps (194.1 allocs/op, 12.1 tasks/op, 168512 insns/op, 0 errors) median absolute deviation: 317.32 maximum: 49247.69 minimum: 44656.38 Closes #12445	2023-01-19 13:23:24 +02:00
Avi Kivity	9029b8dead	test: disable commitlog O_DSYNC, preallocation Commitlog O_DSYNC is intended to make Raft and schema writes durable in the face of power loss. To make O_DSYNC performant, we preallocate the commitlog segments, so that the commitlog writes only change file data and not file metadata (which would require the filesystem to commit its own log). However, in tests, this causes each ScyllaDB instance to write 384MB of commitlog segments. This overloads the disks and slows everything down. Fix this by disabling O_DSYNC (and therefore preallocation) during the tests. They can't survive power loss, and run with --unsafe-bypass-fsync anyway. Closes #12542	2023-01-19 11:14:05 +01:00
Kefu Chai	7f5bb19d1f	main: move perf_sstable into scylla * configure.py: - include `test/perf/perf_sstable` and its dependencies in scylla_perfs * test/perf/perf_sstable.cc: change `main()` to `perf::scylla_sstable_main()` * test/perf/entry_point.hh: add `perf::scylla_sstable_main()` * main.cc: - dispatch "perf-sstable" subcommand to `perf::scylla_sstable_main` before this change, we have a tool at `test/perf/perf_sstable` for running performance tests by exercising sstable related operations. after this change, the `test/perf/perf_sstable` is integrated into `scylla` as a subcommand. so we can run `scylla perf-sstable [options, ...]` to perform the same tests previously driven by the tool. Fixes #12484 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-19 17:42:52 +08:00
Kefu Chai	240f2c6f00	main: move perf_row_cache_update into scylla * configure.py: - include `test/perf/perf_row_cache_update.cc` in scylla_perfs * main.cc: - dispatch "perf-row-cache-update" subcommand to `perf::scylla_row_cache_update_main` * test/perf/perf_fast_forward.cc: change `main()` to `perf::scylla_row_cache_update_main()` * test/perf/entry_point.hh: add `perf::scylla_row_cache_update_main()` before this change, we have a tool at `test/perf/perf_row_cache_update` for running performance tests by updating row cache. after this change, the `test/perf/perf_row_cache_update` is integrated into `scylla` as a subcommand. so we can run `scylla perf-row-cache-update [options, ...]` to perform the same tests previously driven by the tool. Fixes #12484 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-19 17:42:46 +08:00
Kefu Chai	4e390b9a05	test: perf_row_cache_update: add static specifier to local functions now that these functions are only used by the same compiling unit, they don't need external linkage. so let's hide them using `static`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-19 17:42:46 +08:00
Kefu Chai	228ccdc1c7	main: move perf_fast_forward into scylla * configure.py: - include `test/perf/perf_simple_query.cc` in scylla_perfs * main.cc: - dispatch "perf-fast-forward" subcommand to `perf::scylla_fast_forward_main` * test/perf/perf_fast_forward.cc: change `main()` to `perf::scylla_simple_query_main()` * test/perf/entry_point.hh: add `perf::scylla_simple_query_main()` before this change, we have a tool at `test/perf/perf_fast_forward` for running performance tests by fast forwarding the reader. after this change, the `test/perf/perf_fast_forward` is integrated into `scylla` as a subcommand. so we can run `scylla perf-fast-forward [options, ...]` to perform the same tests previously driven by the tool. Fixes #12484 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-19 17:42:40 +08:00
Kefu Chai	09de031cab	main: move perf_simple_query into scylla * configure.py: - include scylla_perfs in scylla - move 'test/lib/debug.cc' down scylla_perfs, as the latter uses `debug::the_database` - link `scylla` against seastar_testing_libs also. because we use the helpers in `test/lib/random_utils.hh` for generating random numbers / sequences in `perf_simple_query.cc`, and `random_utils.hh` references `seastar::testing::local_random_engine` as a local RNG. but `seastar::testing::local_random_engine` is included in `libseastar_testing.a` or `libseastar_perf_testing.a`. since we already have the rules for linking against `libseastar_testing.a`, let's just reuse them, and link `scylla` against this new dependency. * main.cc: - dispatch "perf-simple-query" subcommand to `perf::scylla_simple_query_main` * test/perf/perf_simple_query.cc: change `main()` to `perf::scylla_simple_query_main()` * test/perf/entry_point.hh: define the main function entries so `main.cc` can find them. it's quite like how we collect the entries in `tools/entry_point.hh` before this change, we have a tool at `test/perf/perf_simple_query` for running performance tests by sending simple queries to a single-node cluster. after this change, the `test/perf/perf_simple_query` is integrated into `scylla` as a subcommand. so we can run `scylla perf-simple-query [options, ...]` to perform the same tests previously driven by the tool. Fixes #12484 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-19 17:42:30 +08:00
Kefu Chai	c65692a13a	test: extract debug::the_database out we want to integrate some perf test into scylla executable, so we can run them on a regular basis. but `test/lib/cql_test_env.cc` shares `debug::the_database` with `main.cc`, so we cannot just compile them into a single binary without changing them. before this change, both `test/lib/cql_test_env.cc` and `main.cc` define `debug::the_database`. after this change, `debug::the_database` is extracted into `debug.cc`, so it compiles into a separate compilation unit. and scylla and tests using seastar testing framework are linked against `debug.cc` via `scylla_core` respectively. this paves the road to integrating scylla with the tests linking against `test/lib/cql_test_env.cc`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-19 17:42:23 +08:00
Nadav Har'El	0ff0c80496	test/cql-pytest: un-xfail tests for UNSET values Commit `0b418fa` improved the error detection of unset values in inappropriate CQL statements, and some of the unit tests translated from Cassandra started to pass, so this patch removes their "xfail" mark. In a couple of places Scylla's error message is worded differently from Cassandra, so the test was modified to look for a shorter string common to both implementations. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12553	2023-01-19 07:47:08 +02:00
Kefu Chai	6a3b19b53d	test/perf: replace "std::cout <<" with fmt::print() for better readability Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12559	2023-01-19 07:45:13 +02:00
Avi Kivity	aab5954cfb	Merge 'reader_concurrency_semaphore: add more layers of defense against OOM' from Botond Dénes The reader concurrency semaphore has no mechanism to limit the memory consumption of already admitted reads. Once the collective memory consumption of all the admitted reads is above the limit, all it can do is to not admit any more. Sometimes this is not enough and the memory consumption of the already admitted reads balloons to the point of OOMing the node. This pull-request offers a solution to this: it introduces two more layers of defense above this: a soft and a hard limit. Both are multipliers applied on the semaphore's normal memory limit. When the soft limit threshold is surpassed, all readers but one are blocked via a new blocking `request_memory()` call which is used by the `tracking_file_impl`. The reader to be allowed to proceed is chosen at random: it is the first reader which happens to request memory after the limit is surpassed. This is both very simple and should avoid situations where the algorithm choosing the reader to be allowed to proceed chooses a reader which will then always time out. When the hard limit threshold is surpassed, `reader_concurrency_semaphore::consume()` starts throwing `std::bad_alloc`. This again will result in eliminating whichever reader was unlucky enough to request memory at the right moment. With this, the semaphore is now effectively enforcing an upper bound for memory consumption, defined by the hard limit. 
Refs: https://github.com/scylladb/scylladb/issues/11927 Closes #11955 * github.com:scylladb/scylladb: test: reader_concurrency_semaphore_test: add tests for semaphore memory limits reader_permit: expose operator<<(reader_permit::state) reader_permit: add id() accessor reader_concurrency_semaphore: add foreach_permit() reader_concurrency_semaphore: document the new memory limits reader_concurrency_semaphore: add OOM killer reader_concurrency_semaphore: make consume() and signal() private test: stop using reader_concurrency_semaphore::{consume,signal}() directly reader_concurrency_semaphore: move consume() out-of-line reader_permit: consume(): make it exception-safe reader_permit: resource_units::reset(): only call consume() if needed reader_concurrency_semaphore: tracked_file_impl: use request_memory() reader_concurrency_semaphore: add request_memory() reader_concurrency_semaphore: wrap wait list reader_concurrency_semaphore: add {serialize,kill}_limit_multiplier parameters test/boost/reader_concurrency_semaphore_test: dummy_file_impl: don't use hardoced buffer size reader_permit: add make_new_tracked_temporary_buffer() reader_permit: add get_state() accessor reader_permit: resource_units: add constructor for already consumed res reader_permit: resource_units: remove noexcept qualifier from constructor db/config: introduce reader_concurrency_semaphore_{serialize,kill}_limit_multiplier scylla-gdb.py: scylla-memory: extract semaphore stats formatting code scylla-gdb.py: fix spelling of "graphviz"	2023-01-18 17:02:55 +02:00
Avi Kivity	9a54cb5deb	Merge 'cql3/expr: make it possible to prepare binary_operator' from Jan Ciołek `prepare_expression` takes an unprepared CQL expression straight from the parser output and prepares it. Preparation consists of various type checks that are needed to ensure that the expression is correct and to reason about it. While `prepare_expression` supports a number of different types of expressions, until now it was impossible to prepare a `binary_operator`. Eventually we would like to be able to prepare all kinds of expressions, so this PR adds the missing support for `binary_operator`. Closes #12550 * github.com:scylladb/scylladb: expr_test: test preparing binary_operator with NULL RHS expr_test: test preparing IS NOT NULL binary_operator expr_test: test preparing binary_operator with LIKE expr_test: test preparing binary_operator with CONTAINS KEY expr_test: test preparing binary_operator with CONTAINS expr_test: test preparing binary_operator with IN expr_test: test preparing binary_operator with =, !=, <, <=, >, >= expr_test: use make_*_untyped function in existing tests expr_test_utils: add utilities to create untyped_constant expr_test_utils: add make_float_* and make_double_* cql3: expr: make it possible to prepare binary_operator using prepare_expression cql3/expr: check that RHS of IS NOT NULL is a null value when preparing binary operators cql3: expr: pass non-empty keyspace name in prepare_binary_operator cql3: expr: take reference to schema in prepare_binary_operator	2023-01-18 16:55:18 +02:00
Jenkins Promoter	75a3dd2fc8	release: prepare for 5.3.0-dev	2023-01-18 16:22:41 +02:00
Kefu Chai	965443d6be	main: shift the args when checking exec_name instead of introducing yet another variable for tracking the status, update the args right away. for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-18 22:22:10 +08:00
Kefu Chai	835cd9bfc9	main: extract lookup_main_func() out refactor main() to extract lookup_main_func() out, so we find the main_func in a table instead of using a lengthy if-then-else clause. when the length of the list of candidates of dispatch grows, the code would be less structured. so in this change, the code looking up for the main_func is extracted into a dedicated function for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-01-18 22:22:10 +08:00
Avi Kivity	71bbd7475c	Update seastar submodule * seastar 8889cbc198...d41af8b592 (14): > Merge 'Perf stall detector related improvements' from Travis Downs Ref #8828, #7882, #11582 (may help make progress) > build: pass HEAPPROF definition to src/core/reactor.cc too > Limit memory address space per core to 64GB when hwloc is not available > build: revert use pkg_search_module(.. IMPORTED_TARGET ..) changes > Fix missing newlines in seastar-addr2line > Use an integral type for uniform_int_distribution > Merge 'tls_test: use a dedicated https server for testing' from Kefu Chai > build: use ${CMAKE_BINARY_DIR} when running 'cmake --build ..' > build: do not set c-ares_FOUND with PARENT_SCOPE > reactor: drop unused member function declaration > sstring: refactor to_sstring() using fmt::format_to() > http: delay input stream close until responses sent > build: enable non-library targets using default option value > Merge 'sstring: specialize uninitialize_string() and use resize_and_overwrite if available' from Kefu Chai Closes #12509	2023-01-18 15:50:57 +02:00
Jan Ciolek	ae0e955b90	expr_test: test preparing binary_operator with NULL RHS Make sure that preparing binary_operator works properly when the RHS is NULL. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:46 +01:00
Jan Ciolek	65b8a09409	expr_test: test preparing IS NOT NULL binary_operator Add a unit test which checks that preparing binary_operators which represent IS NOT NULL works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:46 +01:00
Jan Ciolek	5b3e6769f1	expr_test: test preparing binary_operator with LIKE Add a unit test which checks that preparing binary_operators with the LIKE operation works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:45 +01:00
Jan Ciolek	e876496f7f	expr_test: test preparing binary_operator with CONTAINS KEY Add a unit test which checks that preparing binary_operators with the CONTAINS KEY operation works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:45 +01:00
Jan Ciolek	c6d2e1a03e	expr_test: test preparing binary_operator with CONTAINS Add a unit test which checks that preparing binary_operators with the CONTAINS operation works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:45 +01:00
Jan Ciolek	6b147ecaea	expr_test: test preparing binary_operator with IN Add a unit test which checks that preparing binary_operators with the IN operation works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:45 +01:00
Jan Ciolek	669d791250	expr_test: test preparing binary_operator with =, !=, <, <=, >, >= Add a unit test which checks that preparing binary_operators with basic comparison operations works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:44 +01:00
Jan Ciolek	60803d12a9	expr_test: use make_*_untyped function in existing tests Use the newly introduced convenience methods that create untyped_constant in existing tests. This will make the code more readable by removing visual clutter that came with the previous overly verbose code. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:44 +01:00
Jan Ciolek	819390f9fe	expr_test_utils: add utilities to create untyped_constant expression tests often need to create instances of untyped_constant. Creating them by hand is tedious because the required code is overly verbose. Having convenience functions for it speeds up test writing. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:44 +01:00
Jan Ciolek	362bf7f534	expr_test_utils: add make_float_* and make_double_* Add utilities to create float and double values in tests. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:44 +01:00
Jan Ciolek	da3c07955a	cql3: expr: make it possible to prepare binary_operator using prepare_expression prepare_expression didn't allow to prepare binary_operators. so it's now implemented. If prepare_binary_operator is unable to infer the types it will fail with an exception instead of returning std::nullopt, but we can live with that for now. Preparing binary_operators inside the WHERE clause is currently more complicated than just calling prepare_binary_operator. Preparation of the WHERE clause is done inside statement_restrictions constructor. It's done by iterating over all binary_operators, validating them and then preparing. The validation contains additional checks with custom error messages. Preparation has to be done after validation, because otherwise the error messages will change and some tests will start failing. Because of that we can't just call prepare_expression on the WHERE clause yet. It's still useful to have the ability to prepare binary_operators using prepare_expression. In cases where we know that the WHERE clause is valid, we can just call prepare_expression and be done with it. Once grammar is fully relaxed the artificial constraints checked by the validation code will be removed and it will be possible to prepare the whole WHERE clause using just prepare_expression. prepare_expression does a bit more than prepare_binary_operator. In case where both sides of the binary_operator are known it will evaluate the whole binary_operator to a constant value. Query analysis code is NOT ready to encounter constant boolean values inside the WHERE clause, so for the WHERE we still use prepare_binary_operator which doesn't evaluate the binary_operator to a constant value. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:43 +01:00
Jan Ciolek	5f8b1a1a60	cql3/expr: check that RHS of IS NOT NULL is a null value when preparing binary operators When preparing a binary operator we first prepare the LHS, which gives us information about its type and allows to infer the desired type of RHS. Then the RHS is prepared with the expectation that it is compatible with the inferred type. This is enough for all types of operations apart from IS NOT NULL. For IS NOT we should also check that the RHS value is actually null. It's not enough to check that RHS is of right type. Before this change preparing `int_col IS NOT 123` would end in success, which is wrong. The missing check doesn't cause any real problems, it's impossible for the user to produce such input because the parser will reject it. Still it's better to have the check because in the future the grammar might get more relaxed and the parser could become more generic, making it possible to write such things. It would be better to introduce unary_operators, but that's a bigger change. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:43 +01:00
Jan Ciolek	703e9f21ff	cql3: expr: pass non-empty keyspace name in prepare_binary_operator For some reason we passed an empty keyspace name to prepare_expression when preparing the LHS of a binary operator. This doesn't look correct. We have keyspace name available from the schema_ptr so let's use that. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:43 +01:00
Jan Ciolek	9a0c5789a2	cql3: expr: take reference to schema in prepare_binary_operator prepare_binary_operator takes a schema_ptr, but it would be useful to take a reference to schema instead. Every schema_ptr can be easily converted to a reference so there is no loss of functionality. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:40 +01:00
Nadav Har'El	48e2d6a541	Merge 'utils: throw error on malformed input in base64 decode' from Marcin Maliszkiewicz Several cases were fixed in these patches; all are related to processing of malformed base64 data. Main purpose was to bring alternator implementation closer to what DynamoDB does. We now: - Throw error when padding is missing during base64 decoding - Throw error when base64 data is malformed - In alternator when invalid base64 data is fetched from DB (as opposed to being part of user's request) we now exclude such row during filtering Additionally some small code quality improvements: - avoid unnecessary type conversions in calls to rjson:from_strings functions - avoid some copy constructions in calls to rjson:from_strings functions Fixes https://github.com/scylladb/scylladb/issues/6487 Closes #11944 * github.com:scylladb/scylladb: alternator: evaluate expressions as false for stored malformed binary data rjson: avoid copy constructors in from_string calls when possible alternator: remove unused parameters from describe_items func utils: throw error on malformed input in base64 decode utils: throw error on missing padding in base64 decode	2023-01-18 12:40:57 +02:00
Avi Kivity	561f4ca057	test: materialized view: add test exercising synthetic empty-type columns Materialized views inject synthetic empty-type columns in some conditions. Since we just touched empty-type serialization/deserialization, add a test to exercise it and make sure it still works.	2023-01-18 10:38:24 +02:00
Avi Kivity	04925a7b29	cql3: expr: relax evaluate_list() to allow NULL elements Tests are similarly relaxed. A test is added in lwt_test to show that insertion of a list with NULL is still rejected, though we allow NULLs in IF conditions. One test is changed from a list of longs to a list of ints, to prevent churn in the test helper library.	2023-01-18 10:38:24 +02:00
Avi Kivity	390a0ca47b	types: allow lists with NULL Allow transient lists that contain NULL throughout the evaluation machinery. This makes it possible to evaluate things like `IF col IN (1, 2, NULL)` without hacks, once LWT conditions are converted to expressions. A few tests are relaxed to accommodate the new behavior: - cql_query_test's test_null_and_unset_in_collections is relaxed to allow `WHERE col IN ?`, with the variable bound to a list containing NULL; now it's explicitly allowed - expr_test's evaluate_bind_variable_validates_no_null_in_list was checking generic lists for NULLs, and was similarly relaxed (and renamed) - expr_test's evaluate_bind_variable_validates_null_in_lists_recursively was similarly relaxed to allow NULLs.	2023-01-18 10:38:24 +02:00
Avi Kivity	00145f9ada	test: relax NULL check test predicate When we start allowing NULL in lists in some contexts, the exact location where an error is raised (when it's disallowed) will change. To prepare for that, relax the exception check to just ensure the word NULL is there, without caring about the exact wording.	2023-01-18 10:38:24 +02:00
Avi Kivity	5f8540ecfa	cql3, types: validate listlike collections (sets, lists) for storage Lists allow NULL in some contexts (bind variables for LWT "IN ?" conditions), but not in most others. Currently, the implementation just disallows NULLs in list values, and the cases where it is allowed are hacked around. To reduce the special cases, we'll allow lists to have NULLs, and just restrict them for storage. This is similar to how scalar values can be NULL, but not when they are part of a partition key. To prepare for the transition, identify the locations where lists (and sets, which share the same storage) are stored as frozen values and add a NULL check there. Non-frozen lists already have the check. Since sets share the same format as lists, apply the same to them. No actual checks are done yet, since NULLs are impossible. This is just a stub.	2023-01-18 10:38:24 +02:00
Avi Kivity	da4abccf89	types: make empty type deserialize to non-null value The empty type is used internally to implement CQL sets on top of multi-cell maps. The map's key (an atomic cell) represents the set value, and the map's value is discarded. Since it's unneeded we use an internal "empty" type. Currently, it is deserialized into a `data_value` object representing a NULL. Since it's discarded, it really doesn't matter. However, with the impending change to allow NULLs in lists, it does matter: 1. the coordinator sets the 'collections_as_maps' flag for LWT requests since it wants list indexes (this affects sets too). 2. the replica responds by serializing a set as a map. 3. since we start allowing NULL collection values, we now serialize those NULLs as NULLs. 4. the coordinator deserializes the map, and complains about NULL values, since those are not supported. The solution is simple: deserialize the empty value as a non-NULL object. We create an empty empty_type_representation and add the scaffolding needed. Serialization and deserialization are already coded; they were just never called for NULL values (which were serialized with size 0, in collections, rather than size -1, luckily). A unit test is added.	2023-01-18 10:38:24 +02:00
Tomasz Grabiec	563998b69a	Merge 'raft: improve group 0 reconfiguration failure handling' from Kamil Braun Make it so that failures in `removenode`/`decommission` don't lead to reduced availability, and any leftovers in group 0 can be removed by `removenode`: - In `removenode`, make the node a non-voter before removing it from the token ring. This removes the possibility of having a group 0 voting member which doesn't correspond to a token ring member. We can still be left with a non-voter, but that doesn't reduce the availability of group 0. - As above but for `decommission`. - Make it possible to remove group 0 members that don't correspond to token ring members from group 0 using `removenode`. - Add an API to query the current group 0 configuration. Fixes #11723. Closes #12502 * github.com:scylladb/scylladb: test: test_topology: test for removing garbage group 0 members test/pylib: move some utility functions to util.py db: system_keyspace: add a virtual table with raft configuration db: system_keyspace: improve system.raft_snapshot_config schema service: storage_service: better error handling in `decommission` service: storage_service: fix indentation in removenode service: storage_service: make `removenode` work for group 0 members which are not token ring members service/raft: raft_group0: perform read_barrier in wait_for_raft service: storage_service: make leaving node a non-voter before removing it from group 0 in decommission/removenode test: test_raft_upgrade: remove test_raft_upgrade_with_node_remove service/raft: raft_group0: link to Raft docs where appropriate service/raft: raft_group0: more logging service/raft: raft_group0: separate function for checking and waiting for Raft	2023-01-17 21:23:15 +01:00
Kamil Braun	d134c458e5	test/pylib: increase timeout when waiting for cluster before test Increase the timeout from default 5 minutes to 10 minutes. Sent as a workaround for #12546 to unblock next promotions. Closes #12547	2023-01-17 21:03:09 +02:00
Kamil Braun	4f1c317bdc	test: test_raft_upgrade: stop servers gracefully in test_recovery_after_majority_loss This test is frequently failing due to a timeout when we try to restart one of the nodes. The shutdown procedure apparently hangs when we try to stop the `hints_manager` service, e.g.: ``` INFO 2023-01-13 03:18:02,946 [shard 0] hints_manager - Asked to stop INFO 2023-01-13 03:18:02,946 [shard 0] hints_manager - Stopped INFO 2023-01-13 03:18:02,946 [shard 0] hints_manager - Asked to stop INFO 2023-01-13 03:18:02,946 [shard 1] hints_manager - Asked to stop INFO 2023-01-13 03:18:02,946 [shard 1] hints_manager - Stopped INFO 2023-01-13 03:18:02,946 [shard 1] hints_manager - Asked to stop INFO 2023-01-13 03:18:02,946 [shard 1] hints_manager - Stopped INFO 2023-01-13 03:22:56,997 [shard 0] hints_manager - Stopped ``` observe the 5 minute delay at the end. There is a known issue about `hints_manager` stop hanging: #8079. Now, for some reason, this is the only test case that is hitting this issue. We don't completely understand why. There is one significant difference between this test case and others: this is the only test case which kills 2 (out of 3) servers in the cluster and then tries to gracefully shutdown the last server. There's a hypothesis that the last server gets stuck trying to send hints to the killed servers. We weren't able to prove/falsify it yet. But if it's true, then this patch will: - unblock next promotions, - give us some important information when we see that the issue stops appearing. In the patch we shutdown all servers gracefully instead of killing them, like we do in the other test cases. Closes #12548	2023-01-17 20:51:09 +02:00
Pavel Emelyanov	4f415413d2	raft: Fix non-existing state_machine::apply_entry in docs The docs mention that method, but it doesn't exist. Instead, the state_machine interface defines plain .apply() one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12541	2023-01-17 12:53:05 +01:00
Kamil Braun	5545547d07	test: test_topology: test for removing garbage group 0 members Verify that `removenode` can remove group 0 members which are not token ring members.	2023-01-17 12:28:00 +01:00
Kamil Braun	c959ec455a	test/pylib: move some utility functions to util.py They were used in test_raft_upgrade, but we want to use them in other test files too.	2023-01-17 12:28:00 +01:00
Kamil Braun	a483915c62	db: system_keyspace: add a virtual table with raft configuration Add a new virtual table `system.raft_state` that shows the currently operating Raft configuration for each present group. The schema is the same as `system.raft_snapshot_config` (the latter shows the config from the last snapshot). In the future we plan to add more columns to this table, showing more information (like the current leader and term), hence the generic name. Adding the table requires some plumbing of `sharded<raft_group_registry>&` through function parameters to make it accessible from `register_virtual_tables`, but it's mostly straightforward. Also added some APIs to `raft_group_registry` to list all groups and find a given group (returning `nullptr` if one isn't found, not throwing an exception).	2023-01-17 12:28:00 +01:00
Kamil Braun	2bfe85ce9b	db: system_keyspace: improve system.raft_snapshot_config schema Remove the `ip_addr` column which was not used. IP addresses are not part of Raft configuration now and they can change dynamically. Swap the `server_id` and `disposition` columns in the clustering key, so when querying the configuration, we first obtain all servers with the current disposition and then all servers with the previous disposition (note that a server may appear both in current and previous).	2023-01-17 12:28:00 +01:00
Kamil Braun	c3ed82e5fb	service: storage_service: better error handling in `decommission` Improve the error handling in `decommission` in case `leave_group0` fails, informing the user what they should do (i.e. call `removenode` to get rid of the group 0 member), and allowing decommission to finish; it does not make sense to let the node continue to run after it leaves the token ring. (And I'm guessing it's also not safe. Or maybe impossible.)	2023-01-17 12:28:00 +01:00
Kamil Braun	beb0eee007	service: storage_service: fix indentation in removenode	2023-01-17 12:28:00 +01:00
Kamil Braun	aba33dd352	service: storage_service: make `removenode` work for group 0 members which are not token ring members Due to failures we might end up in a situation where we have a group 0 member which is not a token ring member: a decommission/removenode which failed after leaving/removing a node from the token ring but before leaving / removing a node from group 0. There was no way to get rid of such a group 0 member. A node that left the token ring must not be allowed to run further (or it can cause data loss, data resurrection and maybe other fun stuff), so we can't run decommission a second time (even if we tried, it would just say that "we're not a member of the token ring" and abort). And `removenode` would also not work, because it proceeds only if the node requested to be removed is a member of the token ring. We modify `removenode` so it can run in this situation and remove the group 0 member. The parts of `removenode` related to token ring modification are now conditioned on whether the node was a member of the token ring. The final `remove_from_group0` step is in its own branch. Some minor refactors were necessary. Some log messages were also modified so it's easier to understand which messages correspond to the "token movement" part of the procedure. The `make_nonvoter` step happens only if token ring removal happens, otherwise we can skip directly to `remove_from_group0`. We also move `remove_from_group0` outside the "try...catch", fixing #11723. The "node ops" part of the procedure is related strictly to token ring movement, so it makes sense for `remove_from_group0` to happen outside. Indentation is broken in this commit for easier reviewability, fixed in the following commit. Fixes: #11723	2023-01-17 12:28:00 +01:00
Kamil Braun	ec2cd29e42	service/raft: raft_group0: perform read_barrier in wait_for_raft Right now wait_for_raft is called before performing group 0 configuration changes. We want to also call it before checking for membership, for that it's desirable to have the most recent information, hence call read_barrier. In the existing use cases it's not strictly necessary, but it doesn't hurt.	2023-01-17 12:28:00 +01:00
Kamil Braun	db734cd74f	service: storage_service: make leaving node a non-voter before removing it from group 0 in decommission/removenode removenode currently works roughly like this: 1. stream/repair data so it ends up on new replica sets (calculated without the node we want to remove) 2. remove the node from the token ring 3. remove the node from group 0 configuration. If the procedure fails after step 2 but before step 3 finishes, we're in trouble: the cluster is left with an additional voting group 0 member, which reduces group 0's availability, and there is no way to remove this member because `removenode` no longer considers it to be part of the cluster (it consults the token ring to decide). Improve this failure scenario by including a new step at the beginning: make the node a non-voter in group 0 configuration. Then, even if we fail after removing the node from the token ring but before removing it from group 0, we'll only be left with a non-voter which doesn't reduce availability. We make a similar change for `decommission`: between `unbootstrap()` (which streams data) and `leave_ring()` (which removes our tokens from the ring), become a non-voter. The difference here is that we don't become a non-voter at the beginning, but only after streaming/repair. In `removenode` it's desirable to make the node a non-voter as soon as possible because it's already dead. In decommission it may be desirable for us to remain a voter if we fail during streaming because we're still alive and functional in that case. In a later commit we'll also make it possible to retry `removenode` to remove a node that is only a group 0 member and not a token ring member.	2023-01-17 12:28:00 +01:00
Kamil Braun	1eee349a17	test: test_raft_upgrade: remove test_raft_upgrade_with_node_remove The test would create a scenario where one node was down while the others started the Raft upgrade procedure. The procedure would get stuck, but it was possible to `removenode` the downed node using one of the alive nodes, which would unblock the Raft upgrade procedure. This worked because: 1. the upgrade procedure starts by ensuring that all peers can be contacted, 2. `removenode` starts by removing the node from the token ring. After removing the node from the token ring, the upgrade procedure becomes able to contact all peers (the peers set no longer contains the down node). At the end, after removing the node from the token ring, `removenode` would actually get stuck for a while, waiting for the upgrade procedure to finish before removing the peer from group 0. After the upgrade procedure finished, `removenode` would also finish. (so: first the upgrade procedure waited for removenode, then removenode waited for the upgrade procedure). We want to modify the `removenode` procedure and include a new step before removing the node from the token ring: making the node a non-voter. The purpose is to improve the possible failure scenarios. Previously, if the `removenode` procedure failed after removing the node from the token ring but before removing it from group 0, the cluster would contain a 'garbage' group 0 member which is a voter - reducing group 0's availability. If the node is made a non-voter first, then this failure will not be as big of a problem, because the leftover group 0 member will be a non-voter. However, to correctly perform group 0 operations including making someone a nonvoter, we must first wait for the Raft upgrade procedure to finish (or at least wait until everyone joins group 0). 
Therefore by including this 'make the node a non-voter' step at the beginning of `removenode`, we make it impossible to remove a token ring member in the middle of the upgrade procedure, on which the test case relied. The test case would get stuck waiting for the `removenode` operation to finish, which would never finish because it would wait for the upgrade procedure to finish, which would not finish because of the dead peer. We remove the test case; it was "lucky" to pass in the first place. We have a dedicated mechanism for handling dead peers during Raft upgrade procedure: the manual Raft group 0 RECOVERY procedure. There are other test cases in this file which are using that procedure.	2023-01-17 12:28:00 +01:00
Kamil Braun	4f0801406e	service/raft: raft_group0: link to Raft docs where appropriate Resolve some TODOs.	2023-01-17 12:28:00 +01:00
Kamil Braun	2befbaa341	service/raft: raft_group0: more logging Make the logs in leave_group0 consistent with logs in remove_from_group0.	2023-01-17 12:28:00 +01:00
Kamil Braun	77dc1c4c70	service/raft: raft_group0: separate function for checking and waiting for Raft leave_group0 and remove_from_group0 functions both start with the following steps: - if Raft is disabled or in RECOVERY mode, print a simple log message and abort - if Raft cluster feature flag is not yet enabled, print a complex log message and abort - wait for Raft upgrade procedure to finish - then perform the actual group 0 reconfiguration. Refactor these preparation steps to a separate function, `wait_for_raft`. This reduces code duplication; the function will also be used in more operations later (becoming a nonvoter or turning another server into a nonvoter). We also change the API so that the preparation function is called from outside by the caller before they call the reconfiguration function. This is because in later commits, some of the call sites (mainly `removenode`) will want to check explicitly whether Raft is enabled and wait for Raft's availability, then perform a sequence of steps related to group 0 configuration depending on the result. Also add a private function `raft_upgrade_complete()` which we use to assert that Raft is ready to be used.	2023-01-17 12:27:58 +01:00
Wojciech Mitros	5f45b32bfa	forward_service: prevent heap use-after-free of forward_aggregates Currently, we create `forward_aggregates` inside a function that returns the result of a future lambda that captures these aggregates by reference. As a result, the aggregates may be destructed before the lambda finishes, resulting in a heap use-after-free. To prolong the lifetime of these aggregates, we cannot use a move capture, because the lambda is wrapped in a with_thread_if_needed() call on these aggregates. Instead, we fix this by wrapping the entire return statement in a do_with(). Fixes #12528 Closes #12533	2023-01-17 13:25:57 +02:00
Botond Dénes	8ea128cc27	test: reader_concurrency_semaphore_test: add tests for semaphore memory limits	2023-01-17 05:27:04 -05:00
Botond Dénes	ec1c615029	reader_permit: expose operator<<(reader_permit::state)	2023-01-17 05:27:04 -05:00
Botond Dénes	78583b84f1	reader_permit: add id() accessor Effectively returns the address of the underlying permit impl as an `uintptr_t`. This can be used to determine the identity of the permit.	2023-01-17 05:27:04 -05:00
Botond Dénes	7f8469db27	reader_concurrency_semaphore: add foreach_permit() Allows iterating over all permits.	2023-01-17 05:27:04 -05:00
Botond Dénes	4c70b58993	reader_concurrency_semaphore: document the new memory limits	2023-01-17 05:27:04 -05:00
Botond Dénes	edb32cb171	reader_concurrency_semaphore: add OOM killer When the collective memory consumption of all readers goes above $kill_limit_multiplier * $memory_limit, consume() will throw std::bad_alloc(), instantly unwinding the read that is unlucky enough to have requested the last bytes of memory. This should help in situations where there are some problematic partitions, either because of large cells or because they are scattered in too many sstables. Currently nothing prevents such reads from bringing down the entire node via OOM.	2023-01-17 05:27:04 -05:00
Botond Dénes	81e2a2be7d	reader_concurrency_semaphore: make consume() and signal() private Using this API is quite dangerous as any mistakes can lead to leaking resources from the semaphore. Also, soon we will tie this API closer to permits, so they won't be as generic. Make them private so we don't have to worry about correct usage. All external users are patched away already.	2023-01-17 05:27:04 -05:00
Botond Dénes	ab18e7b178	test: stop using reader_concurrency_semaphore::{consume,signal}() directly These methods will soon be retired (made private) so migrate away from them. Consume memory through a permit instead. It is also safer this way: all memory consumed through the permit is guaranteed to be released when the permit is destroyed at the latest.	2023-01-17 05:27:04 -05:00
Botond Dénes	8f9e8aafdf	reader_concurrency_semaphore: move consume() out-of-line It's about to get a little bit more complex.	2023-01-17 05:27:04 -05:00
Botond Dénes	e4ef28284b	reader_permit: consume(): make it exception-safe reader_concurrency_semaphore::consume() will soon throw.	2023-01-17 05:27:04 -05:00
Botond Dénes	029269af42	reader_permit: resource_units::reset(): only call consume() if needed reset() is called from the destructor, with null resources. Calling consume() can be avoided in this case and in fact it is required as consume() is soon going to throw in some cases.	2023-01-17 05:27:04 -05:00
Botond Dénes	dd9a0a16e6	reader_concurrency_semaphore: tracked_file_impl: use request_memory() Use the recently added `request_memory()` to acquire the memory units for the I/O. This allows blocking all but one reader when memory consumption grows too high.	2023-01-17 05:27:04 -05:00
Botond Dénes	9ed5d861be	reader_concurrency_semaphore: add request_memory() A possibly blocking request for more memory. If the collective memory consumption of all reads goes above $serialize_limit_multiplier * $memory_limit this request will block for all but one reader (the first requester). Until this situation is resolved, that is, for as long as memory consumption stays above the limit explained above, only this one reader is allowed to make progress. This should help rein in the memory consumption of reads in a situation where their memory consumption used to balloon without constraints before.	2023-01-17 05:27:04 -05:00
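The two memory thresholds described in the `request_memory()` and OOM killer commits above can be pictured with a small sketch. This is only an illustration in Python, not the Scylla C++ code: the `classify` function, the `Action` enum and the multiplier values used in the usage note are hypothetical, and only the two comparisons against `memory_limit` come from the commit messages.

```python
from enum import Enum


class Action(Enum):
    ADMIT = "admit"          # below both thresholds: every reader proceeds
    SERIALIZE = "serialize"  # only one reader may make progress
    KILL = "kill"            # the requesting read is unwound (bad_alloc in Scylla)


def classify(consumed, memory_limit, serialize_limit_multiplier, kill_limit_multiplier):
    """Map the collective memory consumption of all reads to an action.

    Hypothetical helper: it only mirrors the two thresholds named in the
    commit messages, it is not how the semaphore is implemented.
    """
    if consumed > kill_limit_multiplier * memory_limit:
        return Action.KILL
    if consumed > serialize_limit_multiplier * memory_limit:
        return Action.SERIALIZE
    return Action.ADMIT
```

With made-up multipliers of 2 and 4 and a limit of 100 units, a consumption of 250 units would fall in the serialize band and 401 units would cross the kill threshold.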
Gleb Natapov' via ScyllaDB development	15ebd59071	lwt: upgrade stored mutations to the latest schema during prepare Currently they are upgraded during learn on a replica. There are two problems with this. First, the column mapping may not exist on a replica if it missed this particular schema (because it was down, for instance) and the mapping history is not part of the schema. In this case "Failed to look up column mapping for schema version" will be thrown. Second, the lwt request coordinator may not have the schema for the mutation either (because it was freed from the registry already), and when a replica tries to retrieve the schema from the coordinator the retrieval will fail, causing the whole request to fail with "Schema version XXXX not found". Both of those problems can be fixed by upgrading stored mutations during prepare on the node they are stored at. To upgrade the mutation its column mapping is needed, and it is guaranteed to be present at the node the mutation is stored at, since it is a prerequisite for storing it that the corresponding schema is available. After that the mutation is processed using the latest schema, which will be available on all nodes. Fixes #10770 Message-Id: <Y7/ifraPJghCWTsq@scylladb.com>	2023-01-17 11:14:46 +01:00
Raphael S. Carvalho	f2f839b9cc	compaction: LCS: don't reshape all levels if only a single breaks disjointness LCS reshape is compacting all levels if a single one breaks disjointness. That's unnecessary work because rewriting that single level is enough to restore disjointness. If multiple levels break disjointness, they'll each be reshaped in its own iteration, so reducing operation time for each step and disk space requirement, as input files can be released incrementally. Incremental compaction is not applied to reshape yet, so we need to avoid "major compaction", to avoid the space overhead. But space overhead is not the only problem, the inefficiency, when deciding what to reshape when overlapping is detected, motivated this patch. Fixes #12495. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12496	2023-01-17 09:55:15 +02:00
Michał Chojnowski	9e17564c70	types: add some missing explicit instantiations Some functions defined by a template in types.cc are used in other translation units (via `cql3/untyped_result_set.hh`), but aren't explicitly instantiated. Therefore their linking can fail, depending on inlining decisions. (I experienced this when playing with compiler options). Fix that. Closes #12539	2023-01-17 10:46:01 +02:00
Nadav Har'El	5bf94ae220	cql: allow disabling of USING TIMESTAMP sanity checking As requested by issue #5619, commit `2150c0f7a2` added a sanity check for USING TIMESTAMP - the number specified in the timestamp must not be more than 3 days into the future (when viewed as a number of microseconds since the epoch). This sanity checking helps avoid some annoying client-side bugs and mis-configurations, but some users genuinely want to use arbitrary or futuristic-looking timestamps and are hindered by this sanity check (which Cassandra doesn't have, by the way). So in this patch we add a new configuration option, restrict_future_timestamp. If set to "true", futuristic timestamps (more than 3 days into the future) are forbidden. The "true" setting is the default (as has been the case since #5619). Setting this option to "false" will allow using any 64-bit integer as a timestamp, as is allowed in Cassandra (and was allowed in Scylla prior to #5619). The error message in the case where a futuristic timestamp is rejected now mentions the configuration parameter that can be used to disable this check (this, and the option's name "restrict_*", is similar to other so-called "safe mode" options). This patch also includes a test, which works in Scylla and Cassandra, with either setting of restrict_future_timestamp, checking the right thing in all these cases (the futuristic timestamp can either be written and read, or can't be written). I used this test to manually verify that the new option works, defaults to "true", and when set to "false" Scylla behaves like Cassandra. Fixes #12527 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12537	2023-01-16 23:18:56 +02:00
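The USING TIMESTAMP check described in the commit above amounts to one comparison in microseconds. The following is a rough Python sketch of that rule, not Scylla's implementation: the function name `timestamp_allowed` and its parameters are assumptions made for illustration, while the 3-day window, the microsecond unit and the `restrict_future_timestamp` option name come from the commit message.

```python
# Three days expressed in microseconds, the unit used by USING TIMESTAMP.
MAX_FUTURE_US = 3 * 24 * 60 * 60 * 1_000_000


def timestamp_allowed(timestamp_us, now_us, restrict_future_timestamp=True):
    """Return True if a write with this timestamp would pass the sanity check.

    Hypothetical helper: with the option disabled any integer is accepted,
    otherwise the timestamp may be at most 3 days ahead of `now_us`.
    """
    if not restrict_future_timestamp:
        return True
    return timestamp_us <= now_us + MAX_FUTURE_US
```

Passing `now_us` explicitly, rather than reading the clock inside the function, keeps the sketch deterministic and easy to check by hand.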
Kefu Chai	114f30016a	main: use std::shift_left() to consume tool name for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12536	2023-01-16 21:01:34 +02:00
Nadav Har'El	feef3f9dda	test/cql-pytest: test more than one restriction on same clustering column Cassandra refuses a request with more than one relation to the same clustering column, for example DELETE FROM tbl WHERE p = ? and c = ? AND c > ? complains that c cannot be restricted by more than one relation if it includes an Equal But it produces different error messages for different operators and even order. Currently, Scylla doesn't consider such requests an error. Whether or not we should be compatible with Cassandra here is discussed in issue #12472. But as long as we do accept these queries, we should be sure we do the right thing: "WHERE c = 1 AND c > 2" should match nothing, "WHERE c = 1 AND c > 0" should match the matches of c = 1, and so on. This patch adds a test to verify that these requests indeed yield correct results. The test is scylla_only because, as explained above, Cassandra doesn't support these requests at all. Refs #12472 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12498	2023-01-16 20:41:16 +02:00
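The expected results quoted in the commit above follow from treating several restrictions on one clustering column as a plain conjunction. A minimal Python sketch of that semantics is shown here; it is not the cql-pytest test itself, and the `OPS` table and the `select` helper are hypothetical names introduced for illustration.

```python
import operator

# Comparison operators that may appear in a restriction on a clustering column.
OPS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def select(values, restrictions):
    """Keep the clustering values that satisfy every (operator, bound) pair."""
    return [v for v in values
            if all(OPS[op](v, bound) for op, bound in restrictions)]
```

For clustering values 0 through 3, the restrictions `c = 1 AND c > 2` select nothing, while `c = 1 AND c > 0` select only the value 1, matching the behavior the commit message asks for.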
Kefu Chai	86b451d45c	SCYLLA-VERSION-GEN: remove unnecessary bashism remove unnecessary bashism, so that this script can be interpreted by a POSIX shell. /bin/sh is specified in the shebang line. on debian derivatives, /bin/sh is dash, which is POSIX compliant. but this script is written in the bash dialect. before this change, we could run into following build failure when building the tree on Debian: [7/904] ./SCYLLA-VERSION-GEN ./SCYLLA-VERSION-GEN: 37: [[: not found after this change, the build is able to proceed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12530	2023-01-16 20:34:01 +02:00
Avi Kivity	0b418fa7cf	cql3, transport, tests: remove "unset" from value type system The CQL binary protocol introduced "unset" values in version 4 of the protocol. Unset values can be bound to variables, which cause certain CQL fragments to be skipped. For example, the fragment `SET a = :var` will not change the value of `a` if `:var` is bound to an unset value. Unsets, however, are very limited in where they can appear. They can only appear at the top-level of an expression, and any computation done with them is invalid. For example, `SET list_column = [3, :var]` is invalid if `:var` is bound to unset. This causes the code to be littered with checks for unset, and there are plenty of tests dedicated to catching unsets. However, a simpler way is possible - prevent the infiltration of unsets at the point of entry (when evaluating a bind variable expression), and introduce guards to check for the few cases where unsets are allowed. This is what this long patch does. It performs the following: (general) 1. unset is removed from the possible values of cql3::raw_value and cql3::raw_value_view. (external->cql3) 2. query_options is fortified with a vector of booleans, unset_bind_variable_vector, where each boolean corresponds to a bind variable index and is true when it is unset. 3. To avoid churn, two compatibility structs are introduced: cql3::raw_value{,_view}_vector_with_unset, which can be constructed from a std::vector<raw_value{,_view/}>, which is what most callers have. They can also be constructed with explicit unset vectors, for the few cases they are needed. (cql3->variables) 4. query_options::get_value_at() now throws if the requested bind variable is unset. This replaces all the throwing checks in expression evaluation and statement execution, which are removed. 5. A new query_options::is_unset() is added for the users that can tolerate unset; though it is not used directly. 6. A new cql3::unset_operation_guard class guards against unsets. 
It accepts an expression, and can be queried whether an unset is present. Two conditions are checked: the expression must be a singleton bind variable, and at runtime it must be bound to an unset value. 7. The modification_statement operations are split into two, via two new subclasses of cql3::operation. cql3::operation_no_unset_support ignores unsets completely. cql3::operation_skip_if_unset checks if an operand is unset (luckily all operations have at most one operand that tolerates unset) and applies unset_operation_guard to it. 8. The various sites that accept expressions or operations are modified to check for should_skip_operation(). These are the loops around operations in update_statement and delete_statement, and the checks for unset in attributes (LIMIT and PER PARTITION LIMIT) (tests) 9. Many unset tests are removed. It's now impossible to enter an unset value into the expression evaluation machinery (there's just no unset value), so it's impossible to test for it. 10. Other unset tests now have to be invoked via bind variables, since there's no way to create an unset cql3::expr::constant. 11. Many tests have their exception message match strings relaxed. Since unsets are now checked very early, we don't know the context where they happen. It would be possible to reintroduce it (by adding a format string parameter to cql3::unset_operation_guard), but it seems not to be worth the effort. Usage of unsets is rare, and it is explicit (at least with the Python driver, an unset cannot be introduced by omission). I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't recognize unsets) with cql3::maybe_unset_value (that does), but that caused huge amounts of churn, so I abandoned that in favor of the current approach. Closes #12517	2023-01-16 21:10:56 +02:00
Marcin Maliszkiewicz	6f055ca5f9	alternator: evaluate expressions as false for stored malformed binary data We'll try to distinguish the case when data comes from the storage rather than from a user request. Such an attribute can be used in expressions, and when it can't be decoded it should make the expression evaluate as false, to simply exclude the row during a filter query or scan. Note that this change focuses on the binary type; for other types we may have some inconsistencies in the implementation.	2023-01-16 15:15:27 +01:00
Marcin Maliszkiewicz	bcbaccc143	rjson: avoid copy constructors in from_string calls when possible This function anyway copies the value so no need to do extra copy.	2023-01-16 15:15:26 +01:00
Kamil Braun	7510144fba	Merge 'Add replace-node-first-boot option' from Benny Halevy Allow replacing a node given its Host ID rather than its ip address. This series adds a replace_node_first_boot option to db/config and makes use of it in storage_service. The new option takes priority over the legacy replace_address* options. When the latter are used, a deprecation warning is printed. Documentation updated respectively. And a cql unit_test is added. Ref #12277 Closes #12316 * github.com:scylladb/scylladb: docs: document the new replace_node_first_boot option dist/docker: support --replace-node-first-boot db: config: describe replace_address* options as deprecated test: test_topology: test replace using host_id test: pylib: ServerInfo: add host_id storage_service: get rid of get_replace_address storage_service: is_replacing: rely directly on config options storage_service: pass replacement_info to run_replace_ops storage_service: pass replacement_info to booststrap storage_service: join_token_ring: reuse replacement_info.address storage_service: replacement_info: add replace address init: do not allow cfg.replace_node_first_boot of seed node db: config: add replace_node_first_boot option	2023-01-16 15:08:31 +01:00
Marcin Maliszkiewicz	668fffb6c5	alternator: remove unused parameters from describe_items func	2023-01-16 14:36:23 +01:00
Marcin Maliszkiewicz	86dc1bfdb1	utils: throw error on malformed input in base64 decode We already fixed the case of missing padding but there is also a more generic one where the input for the decode function contains non-base64 characters. This is mostly done for alternator's purposes: it should discard the request containing such data and return a 400 http error. Additionally, some harmless integer overflow during integer casting was fixed here. This was attempted to be fixed by `2d33a3f` but since we also implicitly cast to uint8_t the problem persisted.	2023-01-16 14:36:23 +01:00
Marcin Maliszkiewicz	f53c0fd0fc	utils: throw error on missing padding in base64 decode This is done to make alternator behavior more on a par with dynamodb. Decode function is used there when processing user requests containing binary item values. We will now discard improperly formed user input with 400 http error. It also makes it more consistent as some of our other base64 functions may have assumed padding is present. The patch should not break other usages of base64 functions as the only one is in db/hints where the code already throws std::runtime_error. Fixes #6487	2023-01-16 14:36:23 +01:00
Michał Sala	bbbe12af43	forward_service: fix timeout support in parallel aggregates `forward_request` verb carried information about timeouts using `lowres_clock::time_point` (that came from local steady clock `seastar::lowres_clock`). The time point was produced on one node and later compared against other node `lowres_clock`. That behavior was wrong (`lowres_clock::time_point`s produced with different `lowres_clock`s cannot be compared) and could lead to delayed or premature timeout. To fix this issue, `lowres_clock::time_point` was replaced with `lowres_system_clock::time_point` in `forward_request` verb. Representation to which both time point types serialize is the same (64-bit integer denoting the count of elapsed nanoseconds), so it was possible to do an in-place switch of those types using logic suggested by @avikivity: - using steady_clock is just broken, so we aren't taking anything from users by breaking it further - once all nodes are upgraded, it magically starts to work Closes #12529	2023-01-16 12:08:13 +02:00
Botond Dénes	3d9ab1d9eb	Merge 'Get recursive tasks' statuses with task manager api call' from Aleksandra Martyniuk The PR adds an api call allowing to get the statuses of a given task and all its descendants. The parent-child tree is traversed in BFS order and the list of statuses is returned to user. Closes #12317 * github.com:scylladb/scylladb: test: add test checking recursive task status api: get task statuses recursively api: change retrieve_status signature	2023-01-16 11:44:50 +02:00
Botond Dénes	969beebe5f	reader_concurrency_semaphore: wrap wait list The wait list will become two lists soon. To keep callers simple (as if there was still one list) we wrap it with a wrapper which abstracts this away.	2023-01-16 02:05:27 -05:00
Botond Dénes	8658cfc066	reader_concurrency_semaphore: add {serialize,kill}_limit_multiplier parameters Propagate the recently added reader_concurrency_semaphore_{serialize,kill}_limit_multiplier config items to the semaphore. Not used yet.	2023-01-16 02:05:27 -05:00
Botond Dénes	24d4b484f2	test/boost/reader_concurrency_semaphore_test: dummy_file_impl: don't use hardcoded buffer size In `dma_read_bulk()`, use the `range_size` passed as parameter and have the callers pass meaningful sizes. We got away with callers passing 0 and using a hard-coded size internally because the tracking file wrapper used the size of the returned buffer as the basis for memory tracking. This will soon not be the case and instead the passed-in size will be used, so this has to be fixed.	2023-01-16 02:05:27 -05:00
Botond Dénes	8b0afc28d4	reader_permit: add make_new_tracked_temporary_buffer() A separate method for callers of make_tracked_temporary_buffer() who are creating new empty tracked buffers of a certain size. make_tracked_temporary_buffer() is about to be changed to be more targeted at callers who call it with pre-consumed memory units.	2023-01-16 02:05:27 -05:00
Botond Dénes	397266f420	reader_permit: add get_state() accessor	2023-01-16 02:05:27 -05:00
Botond Dénes	87e2bf90b9	reader_permit: resource_units: add constructor for already consumed res	2023-01-16 02:05:27 -05:00
Botond Dénes	d2cfc25494	reader_permit: resource_units: remove noexcept qualifier from constructor It won't be noexcept soon. Also make it exception safe.	2023-01-16 02:05:27 -05:00
Botond Dénes	7eb093899a	db/config: introduce reader_concurrency_semaphore_{serialize,kill}_limit_multiplier Will be propagated to reader concurrency semaphores. Not wired in yet.	2023-01-16 02:05:27 -05:00
Botond Dénes	a019dbaa34	scylla-gdb.py: scylla-memory: extract semaphore stats formatting code So it can be shared for the 3 semaphores, instead of repeating the same open-coded method for each of them.	2023-01-16 02:05:27 -05:00
Botond Dénes	15d6d34cfa	scylla-gdb.py: fix spelling of "graphviz"	2023-01-16 02:05:27 -05:00
Tzach Livyatan	073f0f00c6	Add Scylla Summit 2023 in the top banner Closes #12519	2023-01-16 08:05:20 +02:00
Avi Kivity	5a07641b95	Update python3 submodule (license file fix) * tools/python3 548e860...279b6c1 (1): > create-relocatable-package: s/pyhton3-libs/python3-libs/	2023-01-15 17:59:27 +02:00
Benny Halevy	de3142e540	docs: document the new replace_node_first_boot option And mention that replacing a node using the legacy replace_addr* options is deprecated. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:41:44 +02:00
Benny Halevy	d4f1563369	dist/docker: support --replace-node-first-boot And mention that replace_address_first_boot is deprecated Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:36:09 +02:00
Benny Halevy	1577aa8098	db: config: describe replace_address* options as deprecated The replace_address options are still supported, but mention in their description that they are now deprecated and the user should use replace_node_first_boot instead. While at it, fix a typo in ignore_dead_nodes_for_replace Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:36:09 +02:00
Benny Halevy	90faeedb77	test: test_topology: test replace using host_id Add test cases exercising the --replace-node-first-boot option by replacing nodes using their host_id rather than ip address. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:36:09 +02:00
Benny Halevy	7d0d9e28f1	test: pylib: ServerInfo: add host_id Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:36:07 +02:00
Benny Halevy	db2b76beb5	storage_service: get rid of get_replace_address It is unused now. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:34:29 +02:00
Benny Halevy	17f70e4619	storage_service: is_replacing: rely directly on config options Rather than on get_replace_address, before we remove the latter. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:34:29 +02:00
Benny Halevy	7282d58d11	storage_service: pass replacement_info to run_replace_ops So it won't need to call get_replace_address. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:34:09 +02:00
Benny Halevy	08598e4f64	storage_service: pass replacement_info to booststrap So it won't need to call get_replace_address. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:30:48 +02:00
Benny Halevy	b863f7a75f	storage_service: join_token_ring: reuse replacement_info.address Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:30:48 +02:00
Benny Halevy	add2f209b8	storage_service: replacement_info: add replace address Populate replacement_info.address in prepare_replacement_info as a first step towards getting rid of get_replace_address(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:30:48 +02:00
Benny Halevy	75c8a5addc	init: do not allow cfg.replace_node_first_boot of seed node Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:30:48 +02:00
Benny Halevy	32e79185d4	db: config: add replace_node_first_boot option For replacing a node given its (now unique) Host ID. The existing options for replace_address* will be deprecated in the following patches and eventually we will stop supporting them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-13 18:30:48 +02:00
Tomasz Grabiec	abc43f97c9	Merge 'Simplify some Raft tables' from Kamil Braun Rename `system.raft_config` to `system.raft_snapshot_config` to make it clearer what the table stores. Remove the `my_server_id` partition key column from `system.raft_snapshot_config` and a corresponding column from `system.raft_snapshots` which would store the Raft server ID of the local node. It's unnecessary, all servers running on a given node in different groups will use the same ID - the Raft ID of the node which is equal to its Host ID. There will be no multiple servers running in a single Raft group on the same node. Closes #12513 * github.com:scylladb/scylladb: db: system_keyspace: remove (my_)server_id column from RAFT_SNAPSHOTS and RAFT_SNAPSHOT_CONFIG db: system_keyspace: rename 'raft_config' to 'raft_snapshot_config'	2023-01-13 00:23:21 +01:00
Botond Dénes	4e41e7531c	docs/dev/debugging.md: recommend open-coredump.sh for opening coredumps Leave the guide for manual opening in though, the script might not work in all cases. Also update the version example, we changed what development versions look like. Closes #12511	2023-01-12 19:30:59 +02:00
Botond Dénes	ab8171ffd5	open-coredump.sh: handle dev versions Like: 5.2.0~dev, which really means master. Don't try to checkout branch-5.2 in this case, it doesn't exist yet, checkout master instead. Closes #12510	2023-01-12 19:28:58 +02:00
Kamil Braun	be390285b6	db: system_keyspace: remove (my_)server_id column from RAFT_SNAPSHOTS and RAFT_SNAPSHOT_CONFIG A single node will run a single Raft server in any given Raft group, so this column is not necessary.	2023-01-12 16:48:50 +01:00
Kamil Braun	bed555d1e5	db: system_keyspace: rename 'raft_config' to 'raft_snapshot_config' Make it clear that the table stores the snapshot configuration, which is not necessarily the currently operating configuration (the last one appended to the log). In the future we plan to have a separate virtual table for showing the currently operating configuration, perhaps we will call it `system.raft_config`.	2023-01-12 16:21:26 +01:00
Botond Dénes	f87e3993ef	Merge 'configure.py: a bunch of clean-up changes' from Michał Chojnowski The planned integration of cross-module optimizations in scylladb/scylladb-enterprise requires several changes to `configure.py`. To minimize the divergence between the `configure.py`s of both repositories, this series upstreams some of these changes to scylladb/scylladb. The changes mostly remove dead code and fix some traps for the unaware. Closes #12431 * github.com:scylladb/scylladb: configure.py: prevent deduplication of seastar compile options configure.py: rename clang_inline_threshold() configure.py: rework the seastar_cflags variable configure.py: hoist the pkg_config() call for seastar-testing.pc configure.py: unify the libs variable for tests and non-tests configure.py: fix indentation configure.py: remove a stale code path for .a artifacts	2023-01-12 16:40:02 +02:00
Wojciech Mitros	082bfea187	rust: use depfile and Cargo.lock to avoid building rust when unnecessary Currently, we call cargo build every time we build scylla, even when no rust files have been changed. This is avoided by adding a depfile to the ninja rule for the rust library. The rust file is generated by default during cargo build, but it uses the full paths of all dependencies that it includes, and we use relative paths. This is fixed by specifying CARGO_BUILD_DEP_INFO_BASEDIR='.', which makes it so the current path is subtracted from all generated paths. Instead of using 'always' when specifying when to run the cargo build, a dependency on Cargo.lock is added additionally to the depfile. As a result, the rust files are recompiled not only when the source files included in the depfile are modified, but also when some rust dependency is updated. Cargo may put an old cached file as a result of the build even when the Cargo.lock was recently updated. Because of that, the build result may be older than the Cargo.lock file even if the build was just performed. This may cause ninja to rebuild the file every following time. To avoid this, we 'touch' the build result, so that its last modification time is up to date. Because the dependency on Cargo.lock was added, the new command for the build does not modify it. Instead, the developer must update it when modifying the dependencies - the docs are updated to reflect that. Closes #12489 Fixes #12508	2023-01-12 14:44:11 +02:00
Kefu Chai	77baea2add	docs/architecture: fix typo of SyllaDB s/SyllaDB/ScyllaDB/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12505	2023-01-12 12:25:53 +02:00
Michał Chojnowski	1ff4abef4a	configure.py: prevent deduplication of seastar compile options In its infinite wisdom, CMake deduplicates the options passed to `target_compile_options`, making it impossible to pass options which require duplication, such as -mllvm. Passing e.g. `-mllvm;-pgso=false;-mllvm;-inline-threshold=2500` invokes the compiler with `-mllvm -pgso=false -inline-threshold=2500`, breaking the options. As a workaround, CMake added the `SHELL:` syntax, which makes it possible to pass the list of options not as a CMake list, but as a shell-quoted string. Let's use it, so we can pass multiple -mllvm options.	2023-01-12 11:24:10 +01:00
Michał Chojnowski	85facefe45	configure.py: rename clang_inline_threshold() There's a global variable (the CLI argument) with the same name. Rename one of the two to avoid accidental mixups.	2023-01-12 11:24:10 +01:00
Michał Chojnowski	d9de78f6d3	configure.py: rework the seastar_cflags variable The name of this variable is misleading. What it really does is pass flags to static libraries compiled by us, not just to seastar. We will need this capability to implement cross-artifact optimizations in our build. We will also need to pass linker flags, and we will need to vary those flags depending on the build mode. This patch splits the seastar_cflags variable into per-mode lib_cflags and lib_ldflags variables. It shouldn't change the resulting build.ninja for now, but will be needed by later planned patches.	2023-01-12 11:24:10 +01:00
Michał Chojnowski	ee462a9d3c	configure.py: hoist the pkg_config() call for seastar-testing.pc Put the pkg_config() for seastar-testing.pc in the same area as the call for seastar.pc, outside of the loop. This is a cosmetic change aimed at making following commits cleaner.	2023-01-12 11:24:10 +01:00
Michał Chojnowski	c9aeeeae11	configure.py: unify the libs variable for tests and non-tests This is a cosmetic change aimed at making following commits in the same area cleaner.	2023-01-12 11:24:09 +01:00
Michał Chojnowski	10ac881ef1	configure.py: fix indentation Fix indentation after the preceding commit.	2023-01-12 11:23:32 +01:00
Michał Chojnowski	be419adaf8	configure.py: remove a stale code path for .a artifacts Scylla hasn't had `.a` artifacts for a long time (since the Urchin days, I believe), and the piece of code responsible for them is stale and untested. Remove it.	2023-01-12 11:22:49 +01:00
Botond Dénes	8a86f8d4ef	gdbinit: add ignore clause for SIG35 Another real-time event often raised in scylla, making debugging a live process annoying. Closes #12507	2023-01-12 12:13:04 +02:00
Avi Kivity	7a8a442c1e	transport: drop some dead code around v1 and v2 protocols In `424dbf43f` ("transport: drop cql protocol versions 1 and 2"), we dropped support for protocols 1 and 2, but some code remains that checks for those versions. It is now dead code, so remove it. Closes #12497	2023-01-12 12:52:19 +02:00
Avi Kivity	4de2524a42	build: update toolchain for scylla-driver package Pull updated scylla-driver package, fixing an IP change related bug [1]. [1] https://github.com/scylladb/python-driver/issues/198 Closes #12501	2023-01-11 22:16:35 +02:00
Nadav Har'El	7192283172	Merge 'doc: add the upgrade guide for ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2' from Anna Stuchlik Fix https://github.com/scylladb/scylladb/issues/12315 This PR adds the upgrade guide from ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2. Instead of adding separate guides per platform, I've merged the information to create one platform-agnostic guide, similar to what we did for [OSS->OSS](https://docs.scylladb.com/stable/upgrade/upgrade-opensource/upgrade-guide-from-5.0-to-5.1/) and [Enterprise->Enterprise ](https://github.com/scylladb/scylladb/pull/12339)guides. Closes #12450 * github.com:scylladb/scylladb: doc: add the new upgrade guide to the toctree and fix its name docs: add the upgrade guide from ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2	2023-01-11 21:01:34 +02:00
Avi Kivity	cb2cb8a606	utils: small_vector: mark throw_out_of_range() const It can be called from the const version of small_vector::at. Closes #12493	2023-01-11 20:58:53 +02:00
Nadav Har'El	04d6402780	docs: cql-extensions.md: explain our NULL handling Our handling of NULLs in expressions is different from Cassandra's, and more uniform. For example, the filter "WHERE x = NULL" is an error in Cassandra, but supported in Scylla. Let's explain how and why. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12494	2023-01-11 20:56:50 +02:00
Wojciech Mitros	95031074a5	configure: fix the order of rust header generation Currently, no rule enforces that the cxx.h rust header is generated before compiling the .cc files generated from rust. This patch adds this dependency. Closes #12492	2023-01-11 16:55:53 +02:00
Botond Dénes	210738c9ce	Merge 'test.py: improve logging' from Kamil Braun Make it easy to see which clusters are operated on by which tests in which build modes and so on. Add some additional logs. These improvements would have saved me a lot of debugging time if I had them last week and we would have had https://github.com/scylladb/scylladb/pull/12482 much faster. Closes #12483 * github.com:scylladb/scylladb: test.py: harmonize topology logs with test.py format test/pylib: additional logging during cluster setup test/pylib: prefix cluster/manager logs with the current test name test/pylib: pool: pass *args and **kwargs to the build function from get() test.py: include mode in ScyllaClusterManager logs	2023-01-11 16:32:56 +02:00
Aleksandra Martyniuk	fcb3f76e78	test: add test checking recursive task status Rest api test checking whether task manager api returns recursive tasks' statuses properly in BFS order.	2023-01-11 12:34:17 +01:00
Aleksandra Martyniuk	6b79c92cb7	api: get task statuses recursively Sometimes to debug some task manager module, we may want to inspect the whole tree of descendants of some task. To make it easier, an api call getting a list of statuses of the requested task and all its descendants in BFS order is added.	2023-01-11 12:34:06 +01:00
Konstantin Osipov	f3440240ee	test.py: harmonize topology logs with test.py format We need millisecond resolution in the log to be able to correlate test log with test.py log and scylla logs. Harmonize the log format for tests which actively manage scylla servers.	2023-01-11 10:09:42 +01:00
Kamil Braun	79712185d5	test/pylib: additional logging during cluster setup This would have saved me a lot of debugging time.	2023-01-11 10:09:42 +01:00
Kamil Braun	4f7e5ee963	test/pylib: prefix cluster/manager logs with the current test name The log file produced by test.py combines logs coming from multiple concurrent test runs. Each test has its own log file as well, but this "global" log file is useful when debugging problems with topology tests, since many events related to managing clusters are stored there. Make the logs easier to read by including information about the test case that's currently performing operations such as adding new servers to clusters and so on. This includes the mode, test run name and the name of the test case. We do this by using custom `Logger` objects (instead of calling `logging.info` etc. which uses the root logger) with `LoggerAdapter`s that include the prefixes. A bit of boilerplate 'plumbing' through function parameters is required but it's mostly straightforward. This doesn't apply to all events, e.g. boost test cases which don't setup a "real" Scylla cluster. These events don't have additional prefixes. Example: ``` 17:41:43.531 INFO> [dev/topology.test_topology.1] Cluster ScyllaCluster(name: 7a414ffc-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(1, 127.40.246.1, 29c4ec73-8912-45ca-ae19-8bfda701a6b5), ScyllaServer(4, 127.40.246.4, 75ae2afe-ff9b-4760-9e19-cd0ed8d052e7), ScyllaServer(7, 127.40.246.7, 67a27df4-be63-4b4c-a70c-aeac0506304f), stopped: ) adding server... 17:41:43.531 INFO> [dev/topology.test_topology.1] installing Scylla server in /home/kbraun/dev/scylladb/testlog/dev/scylla-10... 17:41:43.603 INFO> [dev/topology.test_topology.1] starting server at host 127.40.246.10 in scylla-10... 17:41:43.614 INFO> [dev/topology.test_topology.2] Cluster ScyllaCluster(name: 7a497fce-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(2, 127.40.246.2, f59d3b1d-efbb-4657-b6d5-3fa9e9ef786e), ScyllaServer(5, 127.40.246.5, 9da16633-ce53-4d32-8687-e6b4d27e71eb), ScyllaServer(9, 127.40.246.9, e60c69cd-212d-413b-8678-dfd476d7faf5), stopped: ) adding server... 
17:41:43.614 INFO> [dev/topology.test_topology.2] installing Scylla server in /home/kbraun/dev/scylladb/testlog/dev/scylla-11... 17:41:43.670 INFO> [dev/topology.test_topology.2] starting server at host 127.40.246.11 in scylla-11... ```	2023-01-11 10:09:39 +01:00
Avi Kivity	de0c31b3b6	cql3: query_options: simplify batch query_options constructor The batch constructor uses an unnecessarily complicated template, where in fact it only needs vector<vector<raw_value \| raw_value_view>>. Simplify the constructor to allow exactly that. Delete some confusing comments around it. Closes #12488	2023-01-11 07:54:54 +02:00
Kamil Braun	2bda0f9830	test/pylib: pool: pass *args and **kwargs to the build function from get() This will be used to specify a custom logger when building new clusters before starting tests, making it easy to pinpoint which tests are waiting for clusters to be built and what's happening to these particular clusters.	2023-01-10 17:41:54 +01:00
Kamil Braun	ff2c030bf9	test.py: include mode in ScyllaClusterManager logs The logs often mention the test run and the current test case in a given run, such as `test_topology.1` and `test_topology.1::test_add_server_add_column`. However, if we run test.py in multiple modes, the different modes might be running the same test case and the logs become confusing. To disambiguate, prefix the test run/case names with the mode name. Example: ``` Leasing Scylla cluster ScyllaCluster(name: 7a414ffc-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(1, 127.40.246.1, 29c4ec73-8912-45ca-ae19-8bfda701a6b5), ScyllaServer(4, 127.40.246.4, 75ae2afe-ff9b-4760-9e19-cd0ed8d052e7), ScyllaServer(7, 127.40.246.7, 67a27df4-be63-4b4c-a70c-aeac0506304f), stopped: ) for test dev/topology.test_topology.1::test_add_server_add_column ```	2023-01-10 17:41:54 +01:00
Wojciech Mitros	e558c7d988	functions: initialize aggregates on scylla start Currently, UDAs can't be reused if Scylla has been restarted since they have been created. This is caused by the missing initialization of saved UDAs that should have inserted them to the cql3::functions::functions::_declared map, that should store all (user-)created functions and aggregates. This patch adds the missing implementation in a way that's analogous to the method of inserting UDF to the _declared map. Fixes #11309	2023-01-10 17:44:18 +02:00
Wojciech Mitros	d1b809754c	database: wrap lambda coroutines used as arguments in coroutine::lambda Using lambda coroutines as arguments can lead to a use-after-free. Currently, the way these lambdas were used in do_parse_schema_tables did not lead to such a problem, but it's better to be safe and wrap them in coroutine::lambda(), so that they can't lead to this problem as long as we ensure that the lambda finishes in the do_parse_schema_tables() statement (for example using co_await). Closes #12487	2023-01-10 17:24:52 +02:00
Nadav Har'El	0edb090c67	test/cql-pytest: add simple tests for SELECT DISTINCT This patch adds a few simple functional tests for the SELECT DISTINCT feature, and how it interacts with other features, especially GROUP BY. 2 of the 5 new tests are marked xfail, and reproduce one old and one newly-discovered issue: Refs #5361: LIMIT doesn't work when using GROUP BY (the test here uses LIMIT and GROUP BY together with SELECT DISTINCT, so the LIMIT isn't honored). Refs #12479: SELECT DISTINCT doesn't refuse GROUP BY with clustering column. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12480	2023-01-10 13:29:26 +02:00
Michał Radwański	dcab289656	boost/mvcc_test: use failure_injecting_allocation_strategy where it is meant to In test_apply_is_atomic, a basic form of exception testing is used. There is failure_injecting_allocation_strategy, which however is not used for any allocation, since for some reason, `with_allocator(r.allocator()` is used instead of `with_allocator(alloc`. Fix that. Closes #12354	2023-01-10 12:01:36 +01:00
Tomasz Grabiec	ebcd736343	cache: Fix undefined behavior when populating with non-full keys Regression introduced in `23e4c8315`. view_and_holder position_in_partition::after_key() triggers undefined behavior when the key was not full because the holder is moved, which invalidates the view. Fixes #12367 Closes #12447	2023-01-10 12:51:54 +02:00
Jan Ciolek	8d7e35caef	cql3: expr: remove reference to temporary in get_rhs_receiver The function underlying_type() returns a data_type by value, but the code assigned it to a reference. At first I was sure this is an error (assigning temporary value to a reference), but it turns out that this is most likely correct due to C++ lifetime extension rules. I think it's better to avoid such unintuitive tricks. Assigning to value makes it clearer that the code is correct and there are no dangling references. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #12485	2023-01-10 09:42:49 +02:00
Raphael "Raph" Carvalho	407c7fdaf2	docs: Fix command to create a symbolic link to relocatable pkg dir Closes #12481	2023-01-10 07:09:14 +02:00
Kamil Braun	822410c49b	test/pylib: scylla_cluster: release IPs when cluster is no longer needed With sufficiently many test cases we would eventually run out of IP addresses, because IPs (which are leased from a global host registry) would only be released at the end of an entire test suite. In fact we already hit this during next promotions, causing much pain indeed. Release IPs when a cluster, after being marked dirty, is stopped and thrown away. Closes #12482	2023-01-10 06:59:41 +02:00
Avi Kivity	e71e1dc964	Merge 'tools/scylla-sstable: add lua scripting support' from Botond Dénes Introduce a new "script" operation, which loads a script from the specified path, then feeds the mutation fragment stream to it. The script can then extract, process and present information from the sstable as it wishes. For now only Lua scripts are supported for the simple reason that Lua is easy to write bindings for, it is simple and lightweight and more importantly we already have Lua included in the Scylla binary as it is used as the implementation language for UDF/UDA. We might consider WASM support in the future, but for now we don't have any language support in WASM available. Example: ```lua function new_stats(key) return { partition_key = key, total = 0, partition = 0, static_row = 0, clustering_row = 0, range_tombstone_change = 0, }; end total_stats = new_stats(nil); function inc_stat(stats, field) stats[field] = stats[field] + 1; stats.total = stats.total + 1; total_stats[field] = total_stats[field] + 1; total_stats.total = total_stats.total + 1; end function on_new_sstable(sst) max_partition_stats = new_stats(nil); if sst then current_sst_filename = sst.filename; else current_sst_filename = nil; end end function consume_partition_start(ps) current_partition_stats = new_stats(ps.key); inc_stat(current_partition_stats, "partition"); end function consume_static_row(sr) inc_stat(current_partition_stats, "static_row"); end function consume_clustering_row(cr) inc_stat(current_partition_stats, "clustering_row"); end function consume_range_tombstone_change(crt) inc_stat(current_partition_stats, "range_tombstone_change"); end function consume_partition_end() if current_partition_stats.total > max_partition_stats.total then max_partition_stats = current_partition_stats; end end function on_end_of_sstable() if current_sst_filename then print(string.format("Stats for sstable %s:", current_sst_filename)); else print("Stats for stream:"); end print(string.format("\t%d 
fragments in %d partitions - %d static rows, %d clustering rows and %d range tombstone changes", total_stats.total, total_stats.partition, total_stats.static_row, total_stats.clustering_row, total_stats.range_tombstone_change)); print(string.format("\tPartition with max number of fragments (%d): %s - %d static rows, %d clustering rows and %d range tombstone changes", max_partition_stats.total, max_partition_stats.partition_key, max_partition_stats.static_row, max_partition_stats.clustering_row, max_partition_stats.range_tombstone_change)); end ``` Running this script will yield the following: ``` $ scylla sstable script --script-file fragment-stats.lua --system-schema system_schema.columns /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/me-1-big-Data.db Stats for sstable /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f//me-1-big-Data.db: 397 fragments in 7 partitions - 0 static rows, 362 clustering rows and 28 range tombstone changes Partition with max number of fragments (180): system - 0 static rows, 179 clustering rows and 0 range tombstone changes ``` Fixes: https://github.com/scylladb/scylladb/issues/9679 Closes #11649 * github.com:scylladb/scylladb: tools/scylla-sstable: consume_reader(): improve pause heuristics test/cql-pytest/test_tools.py: add test for scylla-sstable script tools: add scylla-sstable-scripts directory tools/scylla-sstable: remove custom operation tools/scylla-sstable: add script operation tools/sstable: introduce the Lua sstable consumer dht/i_partitioner.hh: ring_position_ext: add weight() accessor lang/lua: export Scylla <-> lua type conversion methods lang/lua: use correct lib name for string lib lang/lua: fix type in aligned_used_data (meant to be user_data) lang/lua: use lua_State* in Scylla type <-> Lua type conversions tools/sstable_consumer: more consistent method naming tools/scylla-sstable: extract sstable_consumer interface into own header tools/json_writer: add accessor to 
underlying writer tools/scylla-sstable: fix indentation tools/scylla-sstable: export mutation_fragment_json_writer declaration tools/scylla-sstable: mutation_fragment_json_writer un-implement sstable_consumer tools/scylla-sstable: extract json writing logic from json_dumper tools/scylla-sstable: extract json_writer into its own header tools/scylla-sstable: use json_writer::DataKey() to write all keys tools/scylla-types: fix use-after-free on main lambda captures	2023-01-09 20:54:42 +02:00
Raphael S. Carvalho	05ffb024bb	replica: Kill table::calculate_shard_from_sstable_generation() Inferring shard from generation is long gone. We still use it in some scripts, but that's no longer needed in Scylla, when loading the SSTables, and it also conflicts with ongoing work of UUID-based generations. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12476	2023-01-09 20:17:57 +02:00
Takuya ASADA	548c9e36a1	main: add tcp_timestamps sanity check Check net.ipv4.tcp_timestamps, show warning message when it's not set to 1. Fixes #12144 Closes #12199	2023-01-09 19:08:21 +02:00
Nadav Har'El	d6e6820f33	Merge 'Drop support for cql binary protocols versions 1 and 2' from Avi Kivity The CQL binary protocol version 3 was introduced in 2014. All Scylla versions support it, as do Cassandra versions 2.1 and newer. Versions 1 and 2 have 16-bit collection sizes, while protocol 3 and newer use 32-bit collection sizes. Unfortunately, we implemented support for multiple serialization formats very intrusively, by pushing the format everywhere. This avoids the need to re-serialize (sometimes) but is quite obnoxious. It's also likely to be broken, since it's almost untested and it's too easy to write cql_serialization_format::internal() instead of propagating the client-specified value. Since protocols 1 and 2 have been obsolete for 9 years, just drop them. It's easy to verify that they are no longer in use on a running system by examining the `system.clients` table before upgrade. Fixes #10607 Closes #12432 * github.com:scylladb/scylladb: treewide: drop cql_serialization_format cql: modification_statement: drop protocol check for LWT transport: drop cql protocol versions 1 and 2	2023-01-09 18:52:41 +02:00
Botond Dénes	bd42da6e69	tools/scylla-sstable: consume_reader(): improve pause heuristincs The consume loop had some heuristics in place to determine whether, after pausing, the consumer wishes to skip just the partition or the remaining content of the sstable. These heuristics were flawed, so replace them with a non-heuristic method: track the last consumed fragment and look at this to determine what should be done.	2023-01-09 09:46:57 -05:00
Botond Dénes	1d222220e0	test/cql-pytest/test_tools.py: add test for scylla-sstable script To test the script operation, we use some of the example scripts from the example directory. Namely, dump.lua and slice.lua. These two scripts together have a very good coverage of the entire script API. Testing their functionality therefore also provides a good coverage of the lua bindings. A further advantage is that since both scripts dump output in identical format to that of the data-dump operation, it is trivial to do a comparison against this already tested operation. A targeted test is written for the sstable skip functionality of the consumer API.	2023-01-09 09:46:57 -05:00
Botond Dénes	ace42202df	tools: add scylla-sstable-scripts directory To be the home of example scripts for scylla-sstable. For now only a README.md is added describing the directory's purpose and with links to useful resources. One example script is added in this patch, more will come later.	2023-01-09 09:46:57 -05:00
Botond Dénes	7b40463f29	tools/scylla-sstable: remove custom operation We now have a script operation, the custom operation (poor man's script operation) has no reason to exist anymore.	2023-01-09 09:46:57 -05:00
Botond Dénes	e5071fdeab	tools/scylla-sstable: add script operation Loads the script from the specified path, then feeds the mutation fragment stream to it. For now only Lua scripts are supported for the simple reason that Lua is easy to write bindings for, it is simple and lightweight and more importantly we already have Lua included in the Scylla binary as it is used as the implementation language for UDF/UDA. We might consider WASM support in the future, but for now we don't have any language support in WASM available.	2023-01-09 09:46:57 -05:00
Botond Dénes	9dd5107919	tools/sstable: introduce the Lua sstable consumer The Lua sstable consumer loads a script from the specified path then feeds the mutation fragment stream to the script via the sstable_consumer methods, each method of which the script is allowed to define, effectively overloading the virtual method in Lua. This allows for very wide and flexible customization opportunities for what to extract from sstables and how to process and present them, without the need to recompile the scylla-sstable tool.	2023-01-09 09:46:57 -05:00
Botond Dénes	50b155e706	dht/i_partitioner.hh: ring_position_ext: add weight() accessor	2023-01-09 09:46:57 -05:00
Botond Dénes	8699fe5001	lang/lua: export Scylla <-> lua type conversion methods Currently hidden in lang/lua.cc, declare these in a header so others can use it.	2023-01-09 09:46:57 -05:00
Botond Dénes	e9a52837cf	lang/lua: use correct lib name for string lib AFAIK the mistake had no real consequence, but still it is nicer to have it correct.	2023-01-09 09:46:57 -05:00
Botond Dénes	76663d7774	lang/lua: fix type in aligned_used_data (meant to be user_data)	2023-01-09 09:46:57 -05:00
Botond Dénes	943fc3b6f3	lang/lua: use lua_State* in Scylla type <-> Lua type conversions Instead of the lua_slice_state which is local to this file. We want to reuse the Scylla type <-> Lua type conversion functions but for that they have to use the more generic lua_State*. No functionality or convenience is lost with the switch, the code didn't make use of the other fields bundled in lua_slice_state.	2023-01-09 09:46:57 -05:00
Botond Dénes	8045751867	tools/sstable_consumer: more consistent method naming Use `consume_` consistently across the entire interface, instead of having some methods with `on_` and others with `consume_` prefixes.	2023-01-09 09:46:57 -05:00
Botond Dénes	8e117501ac	tools/scylla-sstable: extract sstable_consumer interface into own header So it can be used in code outside scylla-sstable.cc. This source file is quite large already, and as we have yet another large chunk of code to add, we want to add it in a separate file.	2023-01-09 09:46:57 -05:00
Botond Dénes	9b1c486051	tools/json_writer: add accessor to underlying writer	2023-01-09 09:46:57 -05:00
Botond Dénes	cfb5afbe9b	tools/scylla-sstable: fix indentation Left broken by previous patches.	2023-01-09 09:46:57 -05:00
Botond Dénes	d42b0bb5d5	tools/scylla-sstable: export mutation_fragment_json_writer declaration To json_writer.hh. Method definition are left in scylla-sstable.cc. Indentation is left broken, will be fixed by the next patch.	2023-01-09 09:46:57 -05:00
Botond Dénes	517135e155	tools/scylla-sstable: mutation_fragment_json_writer un-implement sstable_consumer There is no point in the former implementing said interface. For one it is a futurized interface, which is not needed for something writing to the stdout. Rename the methods to follow the naming convention of rjson writers more closely.	2023-01-09 09:46:57 -05:00
Botond Dénes	0ee1c6ca57	tools/scylla-sstable: extract json writing logic from json_dumper We want to split this class into two parts: one with the actual logic converting mutation fragments to json, and a wrapper over this one, which implements the sstable_consumer interface. As a first step we extract the class as is (no changes) and just forward all-calls from now empty wrapper to it.	2023-01-09 09:46:57 -05:00
Botond Dénes	55ef0ed421	tools/scylla-sstable: extract json_writer into its own header Other source files will want to use it soon.	2023-01-09 09:46:57 -05:00
Botond Dénes	8623818a8d	tools/scylla-sstable: use json_writer::DataKey() to write all keys This method was renamed from its previous name of PartitionKey. Since in json partition keys and clustering keys look alike, with the only difference being that the former may also have a token, it makes sense to have a single method to write them (with an optional token parameter). This was the case at some point, json_dumper::write_key() taking this role. However at a later point, json_writer::PartitionKey() was introduced and now the code uses both. Standardize on the latter and give it a more generic name.	2023-01-09 09:46:57 -05:00
Botond Dénes	602fca0a12	tools/scylla-types: fix use-after-free on main lambda captures The main lambda of scylla-types, the one passed to app_template::run(), was recently made a coroutine. app_template::run() however doesn't keep this lambda alive and hence after the first suspension point, accessing the lambda's captures triggers use-after-free. The simple fix is to convert the coroutine into a continuation chain.	2023-01-09 09:46:57 -05:00
Tomasz Grabiec	f97268d8f2	row_cache: Fix violation of the "oldest version are evicted first" when evicting last dummy Consider the following MVCC state of a partition: v2: ==== <7> [entry2] ==== <9> ===== <last dummy> v1: ================================ <last dummy> [entry1] Where === means a continuous range and --- means a discontinuous range. After two LRU items are evicted (entry1 and entry2), we will end up with: v2: ---------------------- <9> ===== <last dummy> v1: ================================ <last dummy> [entry1] This will cause readers to incorrectly think there are no rows before entry <9>, because the range is continuous in v1, and continuity of a snapshot is a union of continuous intervals in all versions. The cursor will see the interval before <9> as continuous and the reader will produce no rows. This is only temporary, because current MVCC merging rules are such that the flag on the latest entry wins, so we'll end up with this once v1 is no longer needed: v2: ---------------------- <9> ===== <last dummy> ...and the reader will go to sstables to fetch the evicted rows before entry <9>, as expected. The bug is in rows_entry::on_evicted(), which treats the last dummy entry in a special way, and doesn't evict it, and doesn't clear the continuity by omission. The situation is not easy to trigger because it requires certain eviction pattern concurrent with multiple reads of the same partition in different versions, so across memtable flushes. Closes #12452	2023-01-09 16:10:52 +02:00
Avi Kivity	1bb1855757	Merge 'replica/database: fix read related metrics' from Botond Dénes Sstable read related metrics have been broken for a long time now. First, the introduction of inactive reads (https://github.com/scylladb/scylladb/issues/1865) diluted this metric, as it now also contained inactive reads (contrary to the metric's name). Then, after moving the semaphore in front of the cache (`3d816b7c1`) this metric became completely broken as this metric now contains all kinds of reads: disk, in-memory and inactive ones too. This series aims to remedy this: * `scylla_database_active_reads` is fixed to only include active reads. * `scylla_database_active_reads_memory_consumption` is renamed to `scylla_database_reads_memory_consumption` and its description is brought up-to-date. * `scylla_database_disk_reads` is added to track current reads that have gone to disk. * `scylla_database_sstables_read` is added to track the number of sstables currently being read. Fixes: https://github.com/scylladb/scylladb/issues/10065 Closes #12437 * github.com:scylladb/scylladb: replica/database: add disk_reads and sstables_read metrics sstables: wire in the reader_permit's sstable read count tracking reader_concurrency_semaphore: add disk_reads and sstables_read stats replica/database: fix active_reads_memory_consumption_metric replica/database: fix active_reads metric	2023-01-09 12:18:49 +02:00
Pavel Emelyanov	e20738cd7d	azure_snitch: Handle empty zone returned from IMDS Azure metadata API may return empty zone sometimes. If that happens shard-0 gets empty string as its rack, but propagates UNKNOWN_RACK to other shards. Empty zones response should be handled regardless. refs: #12185 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12274	2023-01-09 11:57:45 +02:00
Nadav Har'El	2d845b6244	test/cql-pytest: a test for more than one equality in WHERE Cassandra refuses a request with more than one equality relation to the same column, for example DELETE FROM tbl WHERE partitionKey = ? AND partitionKey = ? It complains that partitionkey cannot be restricted by more than one relation if it includes an Equal Currently, Scylla doesn't consider such requests an error. Whether or not we should be compatible with Cassandra here is discussed in issue #12472. But as long as we do accept this query, we should be sure we do the right thing: "WHERE p = 1 AND p = 2" should match nothing (not the first, or last, value being tested..), and "WHERE p = 1 AND p = 1" should match the matches of p = 1. This patch adds a test to verify that these requests indeed yield correct results. The test is scylla_only because, as explained above, Cassandra doesn't support this feature at all. Refs #12472 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12473	2023-01-09 11:56:39 +02:00
Anna Stuchlik	b61515c871	doc: replace Scylla with ScyllaDB on the menu tree and major links; related: https://github.com/scylladb/scylla-docs/issues/3962 Closes #12456	2023-01-09 08:39:50 +02:00
Avi Kivity	42575340ba	Update seastar submodule * seastar ca586cfb8d...8889cbc198 (14): > http: request_parser: fix grammar ambiguity in field_content Fixes #12468 > sstring: use fold expression to simply copy_str_to() > sstring: use fold expression to simply str_len() > metrics: capture by move in make_function() > metrics: replace homebrew is_callable<> with is_invocable_v<> > reactor: use std::move() to avoid copy. > reactor: remove redundant semicolon. > reactor: use mutable to make std::move() work. > build: install liburing explicitly on ArchLinux. > reactor: use a for loop for submitting ios > metrics: add spaces around '=' > parallel utils: align concept with implementation > reactor: s/resize(0)/clear()/ > reactor: fix a typo in comment Closes #12469	2023-01-08 18:56:00 +02:00
Alejo Sanchez	d632e1aa7a	test/pytest: add missing import, remove unused import Add missed import time and remove unused name import. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #12446	2023-01-08 17:38:46 +02:00
Avi Kivity	5ffe4fee6d	Merge 'Remove legacy half reverse' from Michał Radwański This commit removes consume_in_reverse::legacy_half_reverse, an option once used to indicate that the given key ranges are sorted descending, based on the clustering key of the start of the range, and that the range tombstones inside partition would be sorted (descending, as all the mutation fragments would) according to their end (but range tombstone would still be stored according to their start bound). As it turns out, mutation::consume, when called with legacy_half_reverse option produces invalid fragment stream, one where all the row tombstone changes come after all the clustering rows. This was not an issue, since when constructing results from the query, Scylla would not pass the tombstones to the client, but instead compact data beforehand. In this commit, the consume_in_reverse::legacy_half_reverse is removed, along with all the uses. As for the swap out in mutation_partition.cc in query_mutation and to_data_query_result: The downstream was not prepared to deal with legacy_half_reverse. mutation::consume contains ``` if (reverse == consume_in_reverse::yes) { while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) { co_await yield(); } } else { while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) { co_await yield(); } } ``` So why did it work at all? to_data_query_result deals with a single slice. The used consumer (compact_for_query_v2) compacts-away the range tombstone changes, and thus the only difference between the consume_in_reverse::no and consume_in_reverse::yes was that one was ordered increasing wrt. ckeys and the second one was ordered decreasing. This property is maintained if we swap out for the consume_in_reverse::yes format. 
Refs: #12353 Closes #12453 * github.com:scylladb/scylladb: mutation{,_consumer,_partition}: remove consume_in_reverse::legacy_half_reverse mutation_partition_view: treat query::partition_slice::option::reversed in to_data_query_result as consume_in_reverse::yes mutation: move consume_in_reverse def to mutation_consumer.hh	2023-01-08 15:42:00 +02:00
Botond Dénes	c4688563e3	sstables: track decompressed buffers Convert decompressed temporary buffers into tracked buffers just before returning them to the upper layer. This ensures these buffers are known to the reader concurrency semaphore and it has an accurate view of the actual memory consumption of reads. Fixes: #12448 Closes #12454	2023-01-08 15:34:28 +02:00
Kamil Braun	b77df84543	test: test_topology: make test_nodes_with_different_smp less hacky The test would use a trick to start a separate Scylla cluster from the one provided originally by the test framework. This is not supported by the test framework and may cause unexpected problems. Change the test to perform regular node operations. Instead of starting a fresh cluster of 3 nodes, we join the first of these nodes to the original framework-provided cluster, then decommission the original nodes, then bootstrap the other 2 fresh nodes. Also add some logging to the test. Refs: #12438, #12442 Closes #12457	2023-01-08 15:33:17 +02:00
Avi Kivity	02c9968e73	Merge 'Add WASM UDF implementation in Rust' from Wojciech Mitros This series adds the implementation and usage of rust wasmtime bindings. The WASM UDFs introduced by this patch are interruptable and use memory allocated using the seastar allocator. This series includes #11102 (the first two commits) because #11102 required disabling wasm UDFs completely. This patch disables them in the middle of the series, and enables them again at the end. After this patch, `libwasmtime.a` can be removed from the toolchain. This patch also removes the workaround for #https://github.com/scylladb/scylladb/issues/9387 but it hasn't been tested with ARM yet - if the ARM test causes issues I'll revert this part of the change. Closes #11351 * github.com:scylladb/scylladb: build: remove references to unused c bindings of wasmtime test: assert that WASM allocations can fail without crashing wasm: limit memory allocated using mmap wasm: add configuration options for instance cache and udf execution test: check that wasmtime functions yield wasm: use the new rust bindings of wasmtime rust: add Wasmtime bindings rust: add build profiles more aligned with ninja modes rust: adjust build according to cxxbridge's recommendations tools: toolchain: dbuild: prepare for sharing cargo cache	2023-01-08 15:31:09 +02:00
Nadav Har'El	f5cda3cfc3	test/cql-pytest: add more tests for "timestamp" column type In issue #3668, a discussion spanning several years theorized that several things are wrong with the "timestamp" type. This patch begins by adding several tests that demonstrate that Scylla is in fact behaving correctly, and mostly identically to Cassandra except one esoteric error handling case. However, after eliminating the red herrings, we are left with the real issue that prompted opening #3668, which is a duplicate of issues #2693 and #2694, and this patch also adds a reproducer for that. The issue is that Cassandra 4 added support for arithmetic expressions on values, and durations can be added to or subtracted from timestamps, for example: '2011-02-03 04:05:12.345+0000' - 1d is a valid timestamp - and we don't currently support this syntax. So the new test - which passes on Cassandra 4 and fails on Scylla (or Cassandra 3) - is marked xfail. Refs #2693 Refs #2694 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12436	2023-01-08 15:00:49 +02:00
Michał Chojnowski	08b3a9c786	configure: don't reduce parsers' optimization level to 1 in release The line modified in this patch was supposed to increase the optimization levels of parsers in debug mode to 1, because they were too slow otherwise. But as a side effect, it also reduced the optimization level in release mode to 1. This is not a problem for the CQL frontend, because statement preparation is not performance-sensitive, but it is a serious performance problem for Alternator, where it lies in the hot path. Fix this by only applying the -O1 to debug modes. Fixes #12463 Closes #12460	2023-01-06 18:04:36 +02:00
Wojciech Mitros	903c4874d0	build: remove references to unused c bindings of wasmtime Before the changes introducing the new wasmtime bindings we relied on a downloaded static library libwasmtime.a. Now that the bindings are introduced, we do not rely on it anymore, so all references to it can be removed.	2023-01-06 14:07:29 +01:00
Wojciech Mitros	996a942e05	test: assert that WASM allocations can fail without crashing The main source of big allocations in the WASM UDF implementation is the WASM Linear Memory. We do not want Scylla to crash even if a memory allocation for the WASM Memory fails, so we assert that an exception is thrown instead. The wasmtime runtime does not actually fail on an allocation failure (assuming the memory allocator does not abort and returns nullptr instead - which our seastar allocator does). What happens then depends on the failed allocation handling of the code that was compiled to WASM. If the original code threw an exception or aborted, the resulting WASM code will trap. To make sure that we can handle the trap, we need to allow wasmtime to handle SIGILL signals, because that is what is used to carry information about WASM traps. The new test uses a special WASM Memory allocator that fails after n allocations, and the allocations include both memory growth instructions in WASM, as well as growing memory manually using the wasmtime API. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2023-01-06 14:07:29 +01:00
Wojciech Mitros	f05d612da8	wasm: limit memory allocated using mmap The wasmtime runtime allocates memory for the executable code of the WASM programs using mmap and not the seastar allocator. As a result, the memory that Scylla actually uses becomes not only the memory preallocated for the seastar allocator but the sum of that and the memory allocated for executable code by the WASM runtime. To keep limiting the memory used by Scylla, we measure how much memory the WASM programs use and if they use too much, compiled WASM UDFs (modules) that are currently not in use are evicted to make room. To evict a module it is required to evict all instances of this module (the underlying implementation of modules and instances uses shared pointers to the executable code). For this reason, we add reference counts to modules. Each instance using a module is a reference. When an instance is destroyed, a reference is removed. If all references to a module are removed, the executable code for this module is deallocated. The eviction of a module is actually achieved by eviction of all its references. When we want to free memory for a new module we repeatedly evict instances from the wasm_instance_cache using its LRU strategy until some module loses all its instances. This process may not succeed if the instances currently in use (so not in the cache) use too much memory - in this case the query also fails. Otherwise the new module is added to the tracking system. This strategy may evict some instances unnecessarily, but evicting modules should not happen frequently, and any more efficient solution requires an even bigger intervention into the code.	2023-01-06 14:07:29 +01:00
Wojciech Mitros	b8d28a95bf	wasm: add configuration options for instance cache and udf execution Different users may require different limits for their UDFs. This patch allows them to configure the size of their cache of wasm, the maximum size of individual instances stored in the cache, the time after which the instances are evicted, the fuel that all wasm UDFs are allowed to consume before yielding (for the control of latency), the fuel that wasm UDFs are allowed to consume in total (to allow performing longer computations in the UDF without detecting an infinite loop) and the hard limit of the size of UDFs that are executed (to avoid large allocations).	2023-01-06 14:07:27 +01:00
Wojciech Mitros	3214f5c2db	test: check that wasmtime functions yield The new implementation for WASM UDFs allows executing the UDFs in pieces. This commit adds a test asserting that the UDF is in fact divided and that each of the execution segments takes no longer than 1ms.	2023-01-06 14:05:53 +01:00
Wojciech Mitros	3146807192	wasm: use the new rust bindings of wasmtime This patch replaces all dependencies on the wasmtime C++ bindings with our new ones. The wasmtime.hh and wasm_engine.hh files are deleted. The libwasmtime.a library is no longer required by configure.py. The SCYLLA_ENABLE_WASMTIME macro is removed and wasm udfs are now compiled by default on all architectures. In terms of implementation, most of code using wasmtime was moved to the Rust source files. The remaining code uses names from the new bindings (which are mostly unchanged). Most of wasmtime objects are now stored as a rust::Box<>, to make it compatible with rust lifetime requirements. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2023-01-06 14:05:53 +01:00
Wojciech Mitros	50b24cf036	rust: add Wasmtime bindings The C++ bindings provided by wasmtime are lacking a crucial capability: asynchronous execution of the wasm functions. This forces us to stop the execution of the function after a short time to prevent increasing the latency. Fortunately, this feature is implemented in the native language of Wasmtime - Rust. Support for Rust was recently added to scylla, so we can implement the async bindings ourselves, which is done in this patch. The bindings expose all the objects necessary for creating and calling wasm functions. The majority of code implemented in Rust is a translation of code that was previously present in C++. Types exported from Rust are currently required to be defined by the same crate that contains the bridge using them, so wasmtime types can't be exported directly. Instead, for each class that was supposed to be exported, a wrapper type is created, where its first member is the wasmtime class. Note that the members are not visible from C++ anyway, the difference only applies to Rust code. Aside from wasmtime types and methods, two additional types are exported with some associated methods. - The first one is ValVec, which is a wrapper for a rust Vec of wasmtime Vals. The underlying vector is required by wasmtime methods for calling wasm functions. By having it exported we avoid multiple conversions from a Val wrapper to a wasmtime Val, as would be required if we exported a rust Vec of Val wrappers (the rust Vec itself does not require wrappers if the type it contains is already wrapped) - The second one is Fut. This class represents a computation that may or may not be ready. We're currently using it to control the execution of wasm functions from C++. This class exposes one method: resume(), which returns a bool that signals whether the computation is finished or not. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2023-01-06 14:05:53 +01:00
Wojciech Mitros	33c97de25c	rust: add build profiles more aligned with ninja modes A cargo profile is created for each of the build modes: dev, debug, sanitize, release and coverage. The names of cargo profiles are prefixed by "rust-" because cargo does not allow separate "dev" and "debug" profiles. The main difference between profiles is their optimization levels, which correlate to the levels used in configure.py. The debug info is stripped only in the dev mode, and only this mode uses "incremental" compilation to speed it up.	2023-01-06 14:05:53 +01:00
Wojciech Mitros	4d7858e66d	rust: adjust build according to cxxbridge's recommendations Currently, the rust build system in Scylla creates a separate static library for each included rust package. This could cause duplicate symbol issues when linking against multiple libraries compiled from rust. This issue is fixed in this patch by creating a single static library to link against, which combines all rust packages implemented in Scylla. The Cargo.lock for the combined build is now tracked, so that all users of the same scylla version also use the same versions of imported rust modules. Additionally, the rust package implementation and usage docs are modified to be compatible with the build changes. This patch also adds a new header file 'rust/cxx.hh' that contains definitions of additional rust types available in c++.	2023-01-06 14:05:53 +01:00
Avi Kivity	eeaa475de9	tools: toolchain: dbuild: prepare for sharing cargo cache Rust's cargo caches downloaded sources in ~/.cargo. However dbuild won't provide access to this directory since it's outside the source directory. Prepare for sharing the cargo cache between the host and the dbuild environment by: - Creating the cache if it doesn't already exist. This is likely if the user only builds in a dbuild environment. - Propagating the cache directory as a mounted volume. - Respecting the CARGO_HOME override.	2023-01-06 14:05:53 +01:00
Avi Kivity	6868dcf30b	tools: toolchain: drop s390x from prepare script architecture list It's been a long while since we built ScyllaDB for s390x, and in fact the last time I checked it was broken on the ragel parser generator generating bad source files for the HTTP parser. So just drop it from the list. I kept s390x in the architecture mapping table since it's still valid. Closes #12455	2023-01-06 09:08:01 +02:00
Michał Radwański	1fbf433966	mutation{,_consumer,_partition}: remove consume_in_reverse::legacy_half_reverse This commit removes consume_in_reverse::legacy_half_reverse, an option once used to indicate that the given key ranges are sorted descending, based on the clustering key of the start of the range, and that the range tombstones inside partition would be sorted (descending, as all the mutation fragments would) according to their end (but range tombstone would still be stored according to their start bound). As it turns out, mutation::consume, when called with legacy_half_reverse option produces invalid fragment stream, one where all the row tombstone changes come after all the clustering rows. This was not an issue, since when constructing results from the query, Scylla would not pass the tombstones to the client, but instead compact data beforehand. In this commit, the consume_in_reverse::legacy_half_reverse is removed, along with all the uses. As for the swap out in mutation_partition.cc in query_mutation and to_data_query_result: The downstream was not prepared to deal with legacy_half_reverse. mutation::consume contains ``` if (reverse == consume_in_reverse::yes) { while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) { co_await yield(); } } else { while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) { co_await yield(); } } ``` So why did it work at all? to_data_query_result deals with a single slice. The used consumer (compact_for_query_v2) compacts-away the range tombstone changes, and thus the only difference between the consume_in_reverse::no and consume_in_reverse::yes was that one was ordered increasing wrt. ckeys and the second one was ordered decreasing. This property is maintained if we swap out for the consume_in_reverse::yes format.	2023-01-05 18:48:55 +01:00
Botond Dénes	2612f98a6c	Merge 'Abort repair tasks' from Aleksandra Martyniuk Aborting of repair operation is fully managed by task manager. Repair tasks are aborted: - on shutdown; top level repair tasks subscribe to global abort source. On shutdown all tasks are aborted recursively - through node operations (applies to data_sync_repair_task_impls and their descendants only); data_sync_repair_task_impl subscribes to node_ops_info abort source - with task manager api (top level tasks are abortable) - with storage_service api and on failure; these cases were modified to be aborted the same way as the ones from above are. Closes #12085 * github.com:scylladb/scylladb: repair: make top level repair tasks abortable repair: unify a way of aborting repair operations repair: delete sharded abort source from node_ops_info repair: delete unused node_ops_info from data_sync_repair_task_impl repair: delete redundant abort subscription from shard_repair_task_impl repair: add abort subscription to data sync task tasks: abort tasks on system shutdown	2023-01-05 15:21:35 +01:00
Avi Kivity	cc6010b512	Merge 'Make restore_replica_count abortable' from Benny Halevy Similar to the way we allow aborting streaming-based removenode, subscribe to storage_service::_abort_source to request abort locally and pass a shared_ptr<abort_source> to `node_ops_info`, used to abort removenode_with_repair on shutdown. Fixes #12429 Closes #12430 * github.com:scylladb/scylladb: storage_service: restore_replica_count: demote status_checker related logging to debug level storage_service: restore_replica_count: allow aborting removenode_with_repair storage_service: coroutinize restore_replica_count storage_service: restore_replica_count: undefer stop_status_checker storage_service: restore_replica_count: handle exceptions from stream_async and send_replication_notification storage_service: restore_replica_count: coroutinize status_checker	2023-01-05 15:21:35 +01:00
Kamil Braun	09da661eeb	Merge 'raft: replace experimental raft option with dedicated flag' from Gleb Natapov Unlike other experimental features, we want raft to be opt-in even after it leaves experimental mode. For that we need to have a separate option to enable it. The patch adds the binary option "consistent-cluster-management" for that. * 'consistent-cluster-management-flag' of github.com:scylladb/scylla-dev: raft: replace experimental raft option with dedicated flag main: move supervisor notification about group registry start where it actually starts	2023-01-05 15:21:35 +01:00
Anna Stuchlik	44e6f18d1b	doc: add the new upgrade guide to the toctree and fix its name	2023-01-05 14:13:33 +01:00
Anna Stuchlik	0ad2e3e63a	docs: add the upgrade guide from ScyllaDB 5.1 to ScyllaDB Enterprise 2022.2	2023-01-05 13:30:10 +01:00
Aleksandra Martyniuk	dcb91457da	api: change retrieve_status signature Sometimes we may need task status to be nothrow move constructible. httpd::task_manager_json::task_status does not satisfy this requirement. retrieve_status returns future<full_task_status> instead of future<task_status> to provide an intermediate struct with better properties. An argument is passed by reference to prevent the necessity to copy foreign_ptr.	2023-01-05 13:28:51 +01:00
Kamil Braun	df72536fc5	Merge 'docs: add the upgrade guide for Enterprise from 2022.1 to 2022.2' from Anna Stuchlik Fixes https://github.com/scylladb/scylladb/issues/12314 This PR adds the upgrade guide for ScyllaDB Enterprise - from version 2022.1 to 2022.2. Using this opportunity, I've replaced "Scylla" with "ScyllaDB" in the upgrade-enterprise index file. In previous releases, we added several upgrade guides - one per platform (and version). In this PR, I've merged the information for different platforms to create one generic upgrade guide. It is similar to what @kbr- added for the Open Source upgrade guide from 5.0 to 5.1. See https://docs.scylladb.com/stable/upgrade/upgrade-opensource/upgrade-guide-from-5.0-to-5.1/. Closes #12339 * github.com:scylladb/scylladb: docs: add the info about minor release docs: add the new upgade guide 2022.1 to 2022.2 to the index and the toctree docs: add the index file for the new upgrage guide from 2022.1 to 2022.2 docs: add the metrics update file to the upgrade guide 2022.1 to 2022.2 docs: add the upgrade guide for ScyllaDB Enterprise from 2022.1 to 2022.2	2023-01-04 18:07:00 +01:00
Benny Halevy	086546f575	storage_service: restore_replica_count: demote status_checker related logging to debug level the status_checker is not the main line of business of restore_replica_count, starting and stopping it do not seem to deserve info level logging, which might have been useful in the past to debug issues surrounding that. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-04 19:05:04 +02:00
Benny Halevy	3879ee1db8	storage_service: restore_replica_count: allow aborting removenode_with_repair Similar to the way we allow aborting streaming-based removenode, subscribe to storage_service::_abort_source to request abort locally and pass a shared_ptr<abort_source> to `node_ops_info`, used to abort removenode_with_repair on shutdown. Fixes #12429 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-04 19:05:04 +02:00
Benny Halevy	afece5bdc4	storage_service: coroutinize restore_replica_count and unwrap the async thread started for streaming. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-04 19:05:04 +02:00
Benny Halevy	d1eadc39c1	storage_service: restore_replica_count: undefer stop_status_checker Now that all exceptions in the rest of the function are swallowed, just execute the stop_status_checker deferred action serially before returning, on the way to coroutinizing restore_replica_count (since we can't co_await status_checker inside the deferred action). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-04 19:05:04 +02:00
Benny Halevy	788ecb738d	storage_service: restore_replica_count: handle exceptions from stream_async and send_replication_notification On the way to coroutinizing restore_replica_count, extract awaiting stream_async and send_replication_notification into try/catch blocks so we can later undefer stop_status_checker. The exception is still returned as an exceptional future which is logged by the caller as warning. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-04 19:02:42 +02:00
Benny Halevy	b54d121dfd	storage_service: restore_replica_count: coroutinize status_checker There is no need to start a thread for the status_checker; it can be implemented using a background coroutine. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-01-04 19:02:20 +02:00
Botond Dénes	1d273a98b9	readers/multishard: shard_reader::close() silence read-ahead timeouts Timeouts are benign, especially on a read-ahead that turned out to be not needed at all. They just introduce noise in the logs, so silence them. Fixes: #12435 Closes #12441	2023-01-04 16:10:09 +02:00
Anna Stuchlik	9216b657c8	doc: fix the version in the comment on removing the note	2023-01-04 14:01:33 +01:00
Kamil Braun	4268b1bbc2	Merge 'raft: raft_group0, register RPC verbs on all shards' from Gusev Petr raft_group0 used to register RPC verbs only on shard 0. This worked on clusters with the same --smp setting on all nodes, since RPCs in this case are processed on the same shard as the calling code, and raft_group0 methods only run on shard 0. A new test test_nodes_with_different_smp was added to identify the problem. Since --smp can only be specified via the command line, a corresponding parameter was added to the ManagerClient.server_add method. It allows to override the default parameters set by the SCYLLA_CMDLINE_OPTIONS variable by changing, adding or deleting individual items. Fixes: #12252 Closes #12374 * github.com:scylladb/scylladb: raft: raft_group0, register RPC verbs on all shards raft: raft_append_entries, copy entries to the target shard test.py, allow to specify the node's command line in test	2023-01-04 11:11:21 +01:00
Marcin Maliszkiewicz	61a9816bad	utils/rjson: enable inlining in rapidjson library Due to lack of NDEBUG macro inlining was disabled. It's important for parsing and printing performance. Testing with perf_simple_query shows that it reduced around 7000 insns/op, thus increasing median tps by 4.2% for the alternator frontend. Because inlined functions are called for every character in json this scales with request/response size. When default write size is increased by around 7x (from ~180 to ~ 1255 bytes) then the median tps increased by 12%. Running: ./build/release/test/perf/perf_simple_query_g --smp 1 \ --alternator forbid --default-log-level error \ --random-seed=1235000092 --duration=60 --write Results before the patch: median 46011.50 tps (197.1 allocs/op, 12.1 tasks/op, 170989 insns/op, 0 errors) median absolute deviation: 296.05 maximum: 46548.07 minimum: 42955.49 Results after the patch: median 47974.79 tps (197.1 allocs/op, 12.1 tasks/op, 163723 insns/op, 0 errors) median absolute deviation: 303.06 maximum: 48517.53 minimum: 44083.74 The change affects both json parsing and printing. Closes #12440	2023-01-04 10:27:35 +02:00
Michał Jadwiszczak	83bb77b8bb	test/boost/cql_query_test: enable `parallelized_aggregation` Run tests for parallelized aggregation with `enable_parallelized_aggregation` set always to true, so the tests work even if the default value of the option is false. Closes #12409	2023-01-04 10:11:25 +02:00
Anna Stuchlik	c4d779e447	doc: Fix https://github.com/scylladb/scylla-doc-issues/issues/854 - update the procedure to update topology strategy when nodes are on different racks Closes #12439	2023-01-04 09:50:10 +02:00
Avi Kivity	2739ac66ed	treewide: drop cql_serialization_format Now that we don't accept cql protocol version 1 or 2, we can drop cql_serialization_format everywhere, except when in the IDL (since it's part of the inter-node protocol). A few functions had duplicate versions, one with and one without a cql_serialization_format parameter. They are deduplicated. Care is taken that `partition_slice`, which communicates the cql_serialization_format across nodes, still presents a valid cql_serialization_format to other nodes when transmitting itself and rejects protocol 1 and 2 serialization format when receiving. The IDL is unchanged. One test checking the 16-bit serialization format is removed.	2023-01-03 19:54:13 +02:00
Avi Kivity	654b96660a	cql: modification_statement: drop protocol check for LWT CQL protocol 1 did not support LWT, but since we don't support it any more, we can drop the check and the supporting get_protocol_version() helper.	2023-01-03 19:51:57 +02:00
Avi Kivity	424dbf43f3	transport: drop cql protocol versions 1 and 2 Version 3 was introduced in 2014 (Cassandra 2.1) and was supported in the very first version of Scylla (`2a7da21481` "CQL binary protocol"). Cassandra 3.0 (2015) dropped protocols 1 and 2 as well. It's safe enough to drop it now, 9 years after introduction of v3 and 7 years after Cassandra stopped supporting it. Dropping it allows dropping cql_serialization_format, which causes quite a lot of pain, and is probably broken. This will be dropped in the following patch.	2023-01-03 19:47:49 +02:00
Avi Kivity	f600ad5c1b	Update seastar submodule * seastar 3db15b5681...ca586cfb8d (28): > reactor: trim returned buffer to received number of bytes > util/process: include used header > build: drop unused target_include_directories() > build: use BUILD_IN_SOURCE instead chdir <SOURCE_DIR> > build: specify CMake policy CMP0135 to new > tests: only destroy allocated pending connections > build: silence the output when generating private keys > tests, httpd: Limit loopback connection factory sharding > lw_shared_ptr: Add nullptr_t comparing operators > noncopyable_function: Add concept for (Func func) constructor > reactor: add process::terminate() and process::kill() > Merge 'tests, include: include headers without ".." in path' from Kefu Chai > build: customize toolset for building Boost > build: use different toolset base on specified compiler > allocator: add an option to reserve additional memory for the OS > Merge 'build: pass cflags and ldflags to cooking.sh' from Kefu Chai > build: build static library of cryptopp > gate: add gate holders debugging > build: detect debug build of yaml-cpp also > build: do not use pkg_search_module(IMPORTED_TARGET) for finding yaml-cpp > build: bump yaml-cpp to 0.7.0 in cooking_recipe > build: bump cryptopp to 8.7.0 in cooking_recipe > build: bump boost to 1.81.0 in cooking_recipe > build: bump fmtlib to 9.1.0 in cooking_recipe > shared_ptr: add overloads for fmt::ptr() > chunked_fifo: const_iterator: use the base class ctor > build: s/URING_LIBARIES/URING_LIBRARIES/ > build: export the full path of uring with URING_LIBRARIES Closes #12434	2023-01-03 17:58:31 +02:00
Alejo Sanchez	889acf710c	test/python: increase CQL connection timeout for... test_ssl In very slow debug builds the default driver timeouts are too low and tests might fail. Bump up the values to a more reasonable time. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #12408	2023-01-03 17:10:46 +02:00
Nadav Har'El	1c96d2134f	docs,alternator: link to issue about missing ACL feature The alternator compatibility.md document mentions the missing ACL (access control) feature, but unlike other missing features we forgot to link to the open issue about this missing feature. So let's add that link. Refs #5047. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12399	2023-01-03 16:50:33 +02:00
Kamil Braun	fc57626afa	Merge 'docs: remove auto_bootstrap option from the documentation' from Anna Stuchlik Fixes https://github.com/scylladb/scylladb/issues/12318 This PR removes all occurrences of the `auto_bootstrap` option in the docs. In most cases, I've simply removed the option name and its definition, but sometimes additional changes were necessary: - In node-joined-without-any-data.rst, I removed the `auto_bootstrap` option as one of the causes of the problem. - In rebuild-node.rst, I removed the first step in the procedure (enabling the `auto_bootstrap` option). - In admin.rst, I removed the section about manual bootstrapping - it's based on setting `auto_bootstrap` to false, which is not possible now. Closes #12419 * github.com:scylladb/scylladb: docs: remove the auto_bootstrap option from the admin procedures - involves removing the Manual Bootstraping section docs: remove the auto_bootstrap option from the procedure to replace a dead node docs: remove the auto_bootstrap option from the Troubleshooting article about a node joining with no data docs: remove the auto_bootstrap option from the procedure to rebuild a node after losing the data volume docs: remove the auto_bootstrap option from the procedures to create a cluster or add a DC	2023-01-03 15:44:00 +01:00
Botond Dénes	e4d5b2a373	replica/database: add disk_reads and sstables_read metrics Tracking the current number of reads gone to disk and the current number of sstables read by all such reads respectively.	2023-01-03 09:37:29 -05:00
Botond Dénes	2acfa950d7	sstables: wire in the reader_permit's sstable read count tracking Hook in the relevant methods when creating and destroying sstable readers.	2023-01-03 09:37:29 -05:00
Botond Dénes	2c0de50969	reader_concurrency_semaphore: add disk_reads and sstables_read stats And the infrastructure to reader_permit to update them. The infrastructure is not wired in yet. These metrics will be used to count the number of reads gone to disk and the number of sstables read currently respectively.	2023-01-03 09:37:29 -05:00
Botond Dénes	dcd2deb5af	replica/database: fix active_reads_memory_consumption_metric Rename to reads_memory_consumption and drop the "active" from the description as well. This metric tracks the memory consumption of all reads: active or inactive. We don't even currently have a way to track the memory consumption of only active reads. Drop the part of the description which explains the interaction with other metrics: this part is outdated and the new interactions are much more complicated, no way to explain in a metric description. Also ask the semaphore to calculate the memory amount, instead of doing it in the metric itself.	2023-01-03 09:25:47 -05:00
Petr Gusev	8417840647	raft: raft_group0, register RPC verbs on all shards raft_group0 used to register RPC verbs only on shard 0. This worked on clusters with the same --smp setting on all nodes, since RPCs in this case are (usually) processed on the same shard as the calling code, and raft_group0 methods only run on shard 0. A new test test_nodes_with_different_smp was added to identify the problem. Fixes: #12252	2023-01-03 17:04:07 +03:00
Anna Stuchlik	00ef20c3df	docs: remove the auto_bootstrap option from the admin procedures - involves removing the Manual Bootstraping section	2023-01-03 14:48:01 +01:00
Anna Stuchlik	b7d62b2fc7	docs: remove the auto_bootstrap option from the procedure to replace a dead node	2023-01-03 14:47:55 +01:00
Anna Stuchlik	bc62e61df1	docs: remove the auto_bootstrap option from the Troubleshooting article about a node joining with no data	2023-01-03 14:46:38 +01:00
Anna Stuchlik	1602f27cd7	docs: remove the auto_bootstrap option from the procedure to rebuild a node after losing the data volume	2023-01-03 14:45:08 +01:00
Botond Dénes	929481ea9c	replica/database: fix active_reads metric This metric has been broken for a long time, since inactive reads were introduced. As calculated currently, it includes all permits that passed admission, including inactive reads. On the other hand, it excludes permits created bypassing admission. Fix by using the newly introduced (in this patch) reader_concurrency_semaphore::active_reads() as the basis of this metric: this now includes all permits (reads) that are currently active, excluding waiters and inactive reads.	2023-01-03 08:12:25 -05:00
Petr Gusev	7725e03a09	raft: raft_append_entries, copy entries to the target shard If append_entries RPC was received on a non-zero shard, we may need to pass it to a zero (or, potentially, some other) shard. The problem is that raft::append_request contains entries in the form of raft::log_entry_ptr == lw_shared_ptr<log_entry>, which doesn't support cross-shard reference counting. In debug mode it contains a special ref-counting facility debug_shared_ptr_counter_type, which resorts to on_internal_error if it detects such a case. To solve this, we just copy log entries to the target shard if it isn't equal to the current one. In most cases, if --smp setting is the same on all nodes, RPC will be handled on zero shard, so there will be no overhead.	2023-01-03 15:25:00 +03:00
Petr Gusev	1c23390f12	test.py, allow to specify the node's command line in test An optional parameter cmdline has been added to the ManagerClient.server_add method. It allows you to override the default parameters set by the SCYLLA_CMDLINE_OPTIONS variable by changing, adding or deleting individual items. To change or add a parameter just specify its name and value one after the other. To remove parameter use the special keyword __remove__ as a value. To set a parameter without a value (such as --overprovisioned) use the special keyword __missing__ as the value.	2023-01-03 15:24:54 +03:00
Nadav Har'El	eb85f136c8	cql-pytest: document how to write new cql-pytest tests Add to test/cql-pytest/README.md an explanation of the philosophy of the cql-pytest test suite, and some guidelines on how to write good tests in that framework. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12400	2023-01-03 12:13:22 +02:00
Anna Stuchlik	994bc33147	docs: fix the command on the Manager-Monitoring Integration troubleshooting page Closes #12375	2023-01-03 11:41:16 +02:00
Anna Stuchlik	9d17d812c0	docs: Fix https://github.com/scylladb/scylla-doc-issues/issues/870 , update the nodetool rebuild command Closes #12416	2023-01-03 11:40:40 +02:00
Gleb Natapov	1688163233	raft: replace experimental raft option with dedicated flag Unlike other experimental features, we want raft to be optional even after it leaves experimental mode. For that we need to have a separate option to enable it. The patch adds the binary option "consistent-cluster-management" for that.	2023-01-03 11:15:11 +02:00
Gleb Natapov	29060cc235	main: move supervisor notification about group registry start where it actually starts `99fe580068` moved raft_group_registry::start call a bit later, but forgot to move supervisor notification call. Do it now.	2023-01-03 11:09:30 +02:00
Botond Dénes	2ef71e9c70	Merge 'Improve verbosity of task manager api' from Aleksandra Martyniuk The PR introduces changes to task manager api: - extends tasks' list returned with get_tasks with task type, keyspace, table, entity, and sequence number - extends status returned with get_task_status and wait_task with a list of children's ids Closes #12338 * github.com:scylladb/scylladb: api: extend status in task manager api api: extend get_tasks in task manager api	2023-01-03 10:39:41 +02:00
Botond Dénes	82101b786d	Merge 'docs: document scylla-api-client' from Anna Stuchlik Fixes https://github.com/scylladb/scylladb/issues/11999. This PR adds a description of scylla-api-cli. Closes #12392 * github.com:scylladb/scylladb: docs: fix the description of the system log POST example docs: uptate the curl tool name docs: describe how to use the scylla-api-client tool docs: fix the scylla-api-client tool name docs: document scylla-api-cli	2023-01-03 10:30:04 +02:00
Benny Halevy	63c2cdafe8	sstables: index_reader: close(index_bound&) reset current_list When closing _lower_bound and *_upper_bound in the final close() call, they are currently left with an engaged current_list member. If the index_reader uses a _local_index_cache, it is evicted with evict_gently which will, rightfully, see the respective pages as referenced, and they won't be evicted gently (only later when the index_reader is destroyed). Reset index_bound.current_list on close(index_bound&) to free up the reference. Ref #12271 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12370	2023-01-02 16:42:33 +01:00
Avi Kivity	767b7be8be	Merge 'Get rid of handle_state_replacing' from Benny Halevy Since [repair: Always use run_replace_ops](`2ec1f719de`), nodes no longer publish HIBERNATE state so we don't need to support handling it. Replace is now always done using node operations (using repair or streaming). so nodes are never expected to change status to HIBERNATE. Therefore storage_service:handle_state_replacing is not needed anymore. This series gets rid of it and updates documentation related to STATUS:HIBERNATE respectively. Fixes #12330 Closes #12349 * github.com:scylladb/scylladb: docs: replace-dead-node: get rid of hibernate status storage_service: get rid of handle_state_replacing	2023-01-02 13:35:29 +02:00
Gleb Natapov	28952d32ff	storage_service: move leave_ring outside of unbootstrap() We want to reuse the latter without the call. Message-Id: <20221228144944.3299711-17-gleb@scylladb.com>	2023-01-02 12:03:29 +02:00
Gleb Natapov	229cef136d	raft: add trace logging to raft::server::start Allows to see initial state of the server during start. Message-Id: <20221228144944.3299711-15-gleb@scylladb.com>	2023-01-02 11:57:53 +02:00
Gleb Natapov	96453ff75f	service: raft: improve group0_state_machine::apply logging Trace how many entries are applied as well. Message-Id: <20221228144944.3299711-14-gleb@scylladb.com>	2023-01-02 11:57:16 +02:00
Gleb Natapov	dbd5b97201	storage_service: improve logging in update_pending_ranges() function We pass the reason for the change. Log it as well. Message-Id: <20221228144944.3299711-11-gleb@scylladb.com>	2023-01-02 11:54:03 +02:00
Gleb Natapov	04ab673359	messaging: check that a node knows its own topology before accessing it We already check if the remote node's topology is missing before creating a connection, but the local node's topology can be missing too when we use raft to manage it. Raft needs to be able to create connections before the topology is known. Message-Id: <20221228144944.3299711-7-gleb@scylladb.com>	2023-01-02 11:53:14 +02:00
Gleb Natapov	6f104982e1	topology: use std::erase_if on std::map instead of ad-hoc loop There is std::erase_if since c++20. We can use it here. Message-Id: <20221228144944.3299711-6-gleb@scylladb.com>	2023-01-02 11:45:52 +02:00
Gleb Natapov	84eb5924ac	system_keyspace: remove redundant include storage_proxy.hh is included twice Message-Id: <20221228144944.3299711-4-gleb@scylladb.com>	2023-01-02 11:39:22 +02:00
Gleb Natapov	5182543df2	raft: fix typo in read_barrier logging The log logs applied index not append one. Message-Id: <20221228144944.3299711-3-gleb@scylladb.com>	2023-01-02 11:38:47 +02:00
Gleb Natapov	5a96751534	storage_service: remove start_leaving since it is no longer used Message-Id: <20221228144944.3299711-2-gleb@scylladb.com>	2023-01-02 11:37:48 +02:00
Raphael S. Carvalho	b4e4bbd64a	database_test: Reduce x_log2_compaction_group values to avoid timeout database_test is timing out because it's having to run the tests calling do_with_cql_env_and_compaction_groups 3x, one for each compaction group setting. reduce it to 2 settings instead of 3 if running in debug mode. Refs #12396. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12421	2023-01-01 13:56:18 +02:00
Raphael S. Carvalho' via ScyllaDB development	a7c4a129cb	sstables: Bump row_reads metrics for mx version Metric was always 0 despite a row was processed by mx reader. Fixes #12406. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20221227220202.295790-1-raphaelsc@scylladb.com>	2022-12-30 18:38:30 +01:00
Anna Stuchlik	601aeb924a	docs: remove the auto_bootstrap option from the procedures to create a cluster or add a DC	2022-12-30 13:10:06 +01:00
Anna Stuchlik	705b347d36	doc: extend the information about the recommended RF on the Tracing page	2022-12-30 11:30:20 +01:00
Avi Kivity	8635d24424	build: drop abseil submodule, replace with distribution abseil This lets us carry fewer things and rely on the distribution for maintenance. The frozen toolchain is updated. Incidental updates include clang 15.0.6, and pytest that doesn't need workarounds. Closes #12397	2022-12-28 19:02:23 +02:00
Avi Kivity	eced91b575	Revert "view: coroutinize maybe_mark_view_as_built" This reverts commit `ac2e2f8883`. It causes a regression ("std::bad_variant_access in load_view_build_progress"). Commit `2978052113` (a reindent) is also reverted as part of the process. Fixes #12395	2022-12-28 15:36:05 +02:00
Anna Stuchlik	6d70665185	doc: extend the information on removing an unavailable node	2022-12-28 13:19:58 +01:00
Anna Stuchlik	f95c6423c1	docs: extend the warning on the Remove a Node page	2022-12-28 13:16:36 +01:00
Nadav Har'El	200bc82913	test/cql-pytest: exit immediately if Scylla is down In commit `acfa180766` we added to test/cql-pytest a mechanism to detect when Scylla crashes in the middle of a test function - in which case we report the culprit test and exit immediately to avoid having a hundred more tests report that they failed as well just because Scylla was down. However, if Scylla was never up - e.g., if the user ran "pytest" without ever running Scylla - we still report hundreds of tests as having failed, which is confusing and not helpful. So with this patch, if a connection cannot be made to Scylla at all, the test exits immediately, explaining what went wrong, not blaming any specific test: $ pytest ... ! _pytest.outcomes.Exit: Cannot connect to Scylla at --host=localhost --port=9042 ! ============================ no tests ran in 0.55s ============================= Beyond being a helpful reminder for a developer who runs "pytest" without having started Scylla first (or using test/cql-pytest/run or test.py to start Scylla easily), this patch is also important when running tests through test.py if it reuses an instance of Scylla that crashed during an earlier pytest file's run. This patch does not fix test.py - it can still try to run pytest with a dead Scylla server without checking. But at least with this patch pytest will notice this problem immediately and won't report hundreds of test functions having failed. The only report the user will see will be the last test which crashed Scylla, which will make it easier to find this failure without being hidden between hundreds of spurious failures. Fixes #12360 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12401	2022-12-28 13:04:28 +02:00
Anna Stuchlik	d0db1a27c3	docs: fix the description of the system log POST example	2022-12-28 11:25:54 +01:00
Anna Stuchlik	b7ec99b10b	docs: uptate the curl tool name	2022-12-28 10:33:07 +01:00
Asias He	b9e5e340aa	streaming: Enable offstrategy for all classic streaming based node ops This patch enables offstrategy compaction for all classic streaming based node ops. We can use this method because tables are streamed one after another. As long as there is still streamed data for a given table, we update the automatic trigger timer. When all the streaming has finished, the trigger timer will timeout and fire the offstrategy compaction for the given table. I checked with this patch, rebuild is 3X faster. There was no compaction in the middle of the streaming. The streamed sstables are compacted together after streaming is done. Time Before: INFO 2022-11-25 10:06:08,213 [shard 0] range_streamer - Rebuild succeeded, took 67 seconds, nr_ranges_remaining=0 Time After: INFO 2022-11-25 09:42:50,943 [shard 0] range_streamer - Rebuild succeeded, took 23 seconds, nr_ranges_remaining=0 Compaction Before: 88 sstables were written -> 88 sstables were added into main set Compaction After: 88 sstables written -> after offstrategy 2 sstables were added into main set Closes #11848	2022-12-28 11:12:02 +02:00
Michał Chojnowski	5e79d6b30b	tasks: task_manager: move invoke_on_task<> to .hh invoke_on_task is used in translation units where its definition is not visible, yet it has no explicit instantiations. If the compiler always decides to inline the definition, not to instantiate it implicitly, linking invoke_on_task will fail. (It happened to me when I turned up inline-threshold). Fix that. Closes #12387	2022-12-28 10:55:43 +02:00
Alejo Sanchez	d408b711e3	test/python: increase CQL connection timeouts In very slow debug builds the default driver timeouts are too low and tests might fail. Bump up the values to more reasonable time. These timeout values are the same as used in topology tests. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #12405	2022-12-28 10:06:33 +02:00
Anna Stuchlik	39ade2f5a5	docs: describe how to use the scylla-api-client tool	2022-12-27 14:46:16 +01:00
Anna Stuchlik	2789501023	docs: fix the scylla-api-client tool name	2022-12-27 14:28:27 +01:00
Alejo Sanchez	1bfe234133	test/pylib: API get/set logger level of Scylla server Provide helpers to get and set logger level for Scylla servers. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #12394	2022-12-25 13:58:43 +02:00
Anna Stuchlik	ea7e23bf92	docs: fix the option name from compaction to compression on the Data Definition page Fixes the option name in the "Other table options" table on the Data Definition page. Fixes #12334 Closes #12382	2022-12-25 11:24:56 +02:00
Botond Dénes	b0d95948e1	mutation_compactor: reset stop flag on page start When the mutation compactor has all the rows it needs for a page, it saves the decision to stop in a member flag: _stop. For single partition queries, the mutation compactor is kept alive across pages and so it has a method, start_new_page() to reset its state for the next page. This method didn't clear the _stop flag. This meant that the value set at the end of the previous page could cause the new page and subsequently the entire query to be stopped prematurely. This can happen if the new page starts with a row that is covered by a higher level tombstone and is completely empty after compaction. Reset the _stop flag in start_new_page() to prevent this. This commit also adds a unit test which reproduces the bug. Fixes: #12361 Closes #12384	2022-12-24 13:52:45 +02:00
Takuya ASADA	642d035067	docker: prevent hostname -i failure when server address is specified On some docker instance configuration, hostname resolution does not work, so our script will fail on startup because we use hostname -i to construct cqlshrc. To prevent the error, we can use --rpc-address or --listen-address for the address since it should be same. Fixes #12011 Closes #12115	2022-12-24 13:52:16 +02:00
Asias He	d819d98e78	storage_service: Ignore dropped table for repair_updater In case a table is dropped, we should ignore it in the repair_updater, since we can not update off strategy trigger for a dropped table. Refs #12373 Closes #12388	2022-12-24 13:48:25 +02:00
Raphael S. Carvalho	67ebd70e6e	compaction_manager: Fix reactor stalls during periodic submissions Every 1 hour, compaction manager will submit all registered table_state for a regular compaction attempt, all without yielding. This can potentially cause a reactor stall if there are 1000s of table states, as compaction strategy heuristics will run on behalf of each, and processing all buckets and picking the best one is not cheap. This problem can be magnified with compaction groups, as each group is represented by a table state. This might appear in dashboard as periodic stalls, every 1h, misleading the investigator into believing that the problem is caused by a chronological job. This is fixed by piggybacking on compaction reevaluation loop which can yield between each submission attempt if needed. Fixes #12390. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12391	2022-12-24 13:43:16 +02:00
Anna Stuchlik	74fd776751	docs: document scylla-api-cli	2022-12-23 11:27:37 +01:00
Benny Halevy	8797958dfc	schema: operator<<: print also tombstone_gc_options They are currently missing from the printout when a table is created, but they are essential to understanding the mode with which tombstones are to be garbage-collected in the table. gcGraceSeconds alone is no longer enough since the introduction of tombstone_gc_option in `a8ad385ecd`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12381	2022-12-22 16:40:18 +02:00
Anna Stuchlik	7e8977bf2d	docs: add the info about minor release	2022-12-22 10:26:33 +01:00
Nadav Har'El	ef2e5675ed	materialized views, test: add tests for CLUSTERING ORDER BY In issue #10767, concerns were raised that the CLUSTERING ORDER BY clause is handled incorrectly in a CREATE MATERIALIZED VIEW definition. The tests in this patch try to explore the different ways in which CLUSTERING ORDER BY can be used in CREATE MATERIALIZED VIEW and allow us to compare Scylla's behavior to Cassandra, and to common sense. The tests discover that the CLUSTERING ORDER BY feature in materialized views generally works as expected, but there are three differences between Scylla and Cassandra in this feature. We consider two differences to be bugs (and hence the test is marked xfail) and one a Scylla extension: 1. When a base table has a reverse-order clustering column and this clustering column is used in the materialized view, in Cassandra the view's clustering order inherits the reversed order. In Scylla, the view's clustering order reverts to the default order. Arguably, both behaviors can be justified, but usually when in doubt we should implement Cassandra's behavior - not pick a different behavior, even if the different behavior is also reasonable. So this test (test_mv_inherit_clustering_order()) is marked "xfail", and a new issue was created about this difference: #12308. If we want to fix this behavior to match Cassandra's we should also consider backward compatibility - what happens if we change this behavior in Scylla now, after we had the opposite behavior in previous releases? We may choose to enshrine Scylla's Cassandra-incompatible behavior here - and document this difference. 2. The CLUSTERING ORDER BY should, as its name suggests, only list clustering columns. In Scylla, specifying other things, like regular columns, partition-key columns, or non-existent columns, is silently ignored, whereas it should result in an Invalid Request error (as it does in Cassandra). So test_mv_override_clustering_order_error() is marked "xfail". 
This is the difference already discovered in #10767. 3. When a materialized view has several clustering columns, Cassandra requires that a CLUSTERING ORDER BY clause, if present, must specify the order of all clustering columns. Scylla, in contrast, allows the user to override the order of only some of these columns - and the rest get the default order. I consider this to be a legitimate Scylla extension, and not a compatibility bug, so marked the test with "scylla_only", and no issue was opened about it. Refs #10767 Refs #12308 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12307	2022-12-22 09:48:16 +02:00
Nadav Har'El	6d2e146aa6	test/cql-pytest.py: add scylla_inject_error() utility This patch adds a scylla_inject_error(), a context manager which tests can use to temporarily enable some error injection while some test code is running. It can be used to write tests that artificially inject certain errors instead of trying to reach the elaborate (and often requiring precise timing or high amounts of data) situation where they occur naturally. The error-injection API is Scylla-specific (it uses the Scylla REST API) and does not work on "release"-mode builds (all other modes are supported), so when Cassandra or release-mode build are being tested, the test which uses scylla_inject_error() gets skipped. Example usage: ```python from rest_api import scylla_inject_error with scylla_inject_error(cql, "injection_name", one_shot=True): # do something here ... ``` Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12264	2022-12-22 09:39:10 +02:00
Nadav Har'El	01f0644b22	Merge 'scylla-gdb.py: introduce `scylla get-config-value`' from Botond Dénes Retrieves the configuration item with the given name and prints its value as well as its metadata. Example: (gdb) scylla get-config-value compaction_static_shares value: 100, type: "float", source: SettingsFile, status: Used, live: MustRestart Closes #12362 * github.com:scylladb/scylladb: scylla-gdb.py: add scylla get-config-value gdb command scylla-gdb.py: extract $downcast_vptr logic to standalone method test: scylla-gdb/run: improve diagnostics for failed tests	2022-12-21 18:38:23 +02:00
Aleksandra Martyniuk	599fce16cf	repair: make top level repair tasks abortable	2022-12-21 11:52:58 +01:00
Aleksandra Martyniuk	e77de463e4	repair: unify a way of aborting repair operations	2022-12-21 11:52:53 +01:00
Aleksandra Martyniuk	f56e886127	repair: delete sharded abort source from node_ops_info Sharded abort source in node_ops_info is no longer needed since its functionality is provided by task manager's tasks structure.	2022-12-21 11:37:03 +01:00
Aleksandra Martyniuk	18efe0a4e8	repair: delete unused node_ops_info from data_sync_repair_task_impl	2022-12-21 11:28:30 +01:00
Aleksandra Martyniuk	ee13a5dde8	api: extend status in task manager api Status of tasks returned with get_task_status and wait_task is extended with the list of ids of child tasks.	2022-12-21 10:54:56 +01:00
Aleksandra Martyniuk	697af4ccf2	api: extend get_tasks in task manager api Each task stats in a list returned from tm::get_task api call is extended with info about: task type, keyspace, table, entity, and sequence number.	2022-12-21 10:54:50 +01:00
Michał Chojnowski	19049150ef	configure.py: remove --static, --pie, --so These options have been nonsense since 2017. --pie and --so are ignored, --static disables (sic!) static linking of libraries. Remove them. Closes #12366	2022-12-21 11:01:56 +02:00
Botond Dénes	29d49e829e	scylla-gdb.py: add scylla get-config-value gdb command Retrieves the configuration item with the given name and prints its value as well as its metadata. Example: (gdb) scylla get-config-value compaction_static_shares value: 100, type: "float", source: SettingsFile, status: Used, live: MustRestart	2022-12-21 03:05:56 -05:00
Botond Dénes	0cdb89868a	scylla-gdb.py: extract $downcast_vptr logic to standalone method So it can be reused by regular python code.	2022-12-21 03:05:56 -05:00
Botond Dénes	24022c19a6	test: scylla-gdb/run: improve diagnostics for failed tests By instructing gdb to print the full python stack in case of errors.	2022-12-21 03:05:56 -05:00
Michał Chojnowski	d9269abf5b	sstables: index_reader: always evict the local cache gently Due to an oversight, the local index cache isn't evicted gently when _upper_bound existed. This is a source of reactor stalls. Fix that. Fixes #12271 Closes #12364	2022-12-20 18:23:27 +02:00
Michał Radwański	e7fbcd6c9d	mutation_partition_view: treat query::partition_slice::option::reversed in to_data_query_result as consume_in_reverse::yes The consume_in_reverse::legacy_half_reverse format is soon to be phased out. This commit starts treating frozen_mutations from replicas for reversed queries so that they are consumed with consume_in_reverse::yes.	2022-12-20 17:05:02 +01:00
Benny Halevy	1adb2bff18	mutation: move consume_in_reverse def to mutation_consumer.hh To be used also by frozen_mutation consumer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-20 16:23:10 +01:00
Avi Kivity	bb731b4f52	Merge 'docs: move documentation of tools online' from Botond Dénes Currently the scylla tools (`scylla-types` and `scylla-sstable`) have documentation in two places: high level documentation can be found at `docs/operating-scylla/admin-tools/scylla-{types,sstable}.rst`, while low level, more detailed documentation is embedded in the tool itself. This is especially pronounced for `scylla-sstable`, which only has a short description of its operations online, all details being found only in the command-line help. We want to move away from this model, such that all documentation can be found online, with the command-line help being reserved to documenting how the various switches and flags work, on top of a short description of the operation and a link to the detailed online docs. Closes #12284 * github.com:scylladb/scylladb: tool/scylla-sstable: move documentation online docs: scylla-sstable.rst: add sstable content section docs: scylla-{sstable,types}.rst: drop Syntax section	2022-12-20 17:04:47 +02:00
Avi Kivity	3fce43124a	Merge 'Static compaction groups' from Raphael "Raph" Carvalho Allows static configuration of number of compaction groups per table per shard. To bootstrap the project, config option x_log2_compaction_groups was added which controls both number of groups and partitioning within a shard. With a value of 0 (default), it means 1 compaction group, therefore all tokens go there. With a value of 3, it means 8 compaction groups, and 3 most-significant-bits of tokens being used to decide which group owns the token. And so on. It's still missing: - integration with repair / streaming - integration with reshard / reshape. perf/perf_simple_query --smp 1 --memory 1G BEFORE ----- median 61358.55 tps ( 71.1 allocs/op, 12.2 tasks/op, 56375 insns/op, 0 errors) median 61322.80 tps ( 71.1 allocs/op, 12.2 tasks/op, 56391 insns/op, 0 errors) median 61058.58 tps ( 71.1 allocs/op, 12.2 tasks/op, 56386 insns/op, 0 errors) median 61040.94 tps ( 71.1 allocs/op, 12.2 tasks/op, 56381 insns/op, 0 errors) median 61118.40 tps ( 71.1 allocs/op, 12.2 tasks/op, 56379 insns/op, 0 errors) AFTER ----- median 61656.12 tps ( 71.1 allocs/op, 12.2 tasks/op, 56486 insns/op, 0 errors) median 61483.29 tps ( 71.1 allocs/op, 12.2 tasks/op, 56495 insns/op, 0 errors) median 61638.05 tps ( 71.1 allocs/op, 12.2 tasks/op, 56494 insns/op, 0 errors) median 61726.09 tps ( 71.1 allocs/op, 12.2 tasks/op, 56509 insns/op, 0 errors) median 61537.55 tps ( 71.1 allocs/op, 12.2 tasks/op, 56491 insns/op, 0 errors) Closes #12139 * github.com:scylladb/scylladb: test: mutation_test: Test multiple compaction groups test: database_test: Test multiple compaction groups test: database_test: Adapt it to compaction groups db: Add config for setting static number of compaction groups replica: Introduce static compaction groups test: sstable_test: Stop referencing single compaction group api: compaction_manager: Stop a compaction type for all groups api: Estimate pending tasks on all compaction groups api: 
storage_service: Run maintenance compactions on all compaction groups replica: table: Adapt assertion to compaction groups replica: database: stop and disable compaction on behalf of all groups replica: Introduce table::parallel_foreach_table_state() replica: disable auto compaction on behalf of all groups replica: table: Rework compaction triggers for compaction groups replica: Adapt table::get_sstables_including_compacted_undeleted() to compaction groups replica: Adapt table::rebuild_statistics() to compaction groups replica: table: Perform major compaction on behalf of all groups replica: table: Perform off-strategy compaction on behalf of all groups replica: table: Perform cleanup compaction on behalf of all groups replica: Extend table::discard_sstables() to operate on all compaction groups replica: table: Create compound sstable set for all groups replica: table: Set compaction strategy on behalf of all groups replica: table: Return min memtable timestamp across all groups replica: Adapt table::stop() to compaction groups replica: Adapt table::clear() to compaction groups replica: Adapt table::can_flush() to compaction groups replica: Adapt table::flush() to compaction groups replica: Introduce parallel_foreach_compaction_group() replica: Adapt table::set_schema() to compaction groups replica: Add memtables from all compaction groups for reads replica: Add memtable_count() method to compaction_group replica: table: Reserve reader list capacity through a callback replica: Extract addition of memtables to reader list into a new function replica: Adapt table::occupancy() to compaction groups replica: Adapt table::active_memtable() to compaction groups replica: Introduce table::compaction_groups() replica: Preparation for multiple compaction groups scylla-gdb: Fix backward compatibility of scylla_memtables command	2022-12-20 17:04:47 +02:00
Avi Kivity	623be22d25	Merge 'sstables: allow bypassing min max position metadata loading' from Botond Dénes Said mechanism broke tools and tests to some extent: the read it executes on sstable load time means that if the sstable is broken enough to fail this read, it will fail to load, preventing diagnostic tools from loading and examining it and preventing tests from producing broken sstables for testing purposes. Closes #12359 * github.com:scylladb/scylladb: sstables: allow bypassing first/last position metadata loading sstables: sstable::{load,open_data}(): fix indentation sstables: coroutinize sstable::open_data() sstables: sstable::open_data(): use clear_gently() to clear token ranges sstables: coroutinize sstable::load()	2022-12-20 17:04:47 +02:00
Aleksandra Martyniuk	60e298fda1	repair: change utils::UUID to node_ops_id Type of the id of node operations is changed from utils::UUID to node_ops_id. This way the id of node operations would be easily distinguished from the ids of other entities. Closes #11673	2022-12-20 17:04:47 +02:00
Avi Kivity	88a1fbd72f	Update seastar submodule * seastar 3a5db04197...3db15b5681 (27): > build: get the full path of c-ares > build: unbreak pkgconfig output > http: Add 206 Partial Content response code > http: Carry integer content_length on reply > tls_test: drop duplicated includes > tls_test: remove duplicated test case > reactor: define __NR_pidfd_open if not defined > sockets: Wait on socket peer closing the connection > tcp: Close connection when getting RST from server > Merge 'Enhance rpc tester with delays, timeouts and verbosity' from Pavel Emelyanov > Merge 'build: use pkg_search_module(.. IMPORTED_TARGET ..) ' from Kefu Chai > build: define GnuTLS_{LIBRARIES, INCLUDE_DIRS} only if GnuTLS is found > build: use pkg_search_module(.. IMPORTED_TARGET ..) > addr2line: extend asan regex > abort_source: move-assign operator: call base class unlink > coroutine: correct syntax error in doxygen comment > demo: Extend http connection demo with https > test: temporarily disable warning for tests triggering warnings > tests/unit/coroutine: Include <ranges> > sstring: Document why sstring exists at all > test: log error when read/write to pipe fails > test: use executables in /bin > tests: spawn_test: use BOOST_CHECK_EQUAL() for checking equality of temporary_buffer > docker: bump up to clang {14,15} and gcc {11,12} > shared_ptr: ignore false alarm from GCC-12 > build: check for fix of CWG2631 > circleci: use versioned container image Closes #12355	2022-12-20 17:04:47 +02:00
Botond Dénes	3c8949d34c	sstables: allow bypassing first/last position metadata loading When loading an sstable. Tests and tools might want to do this to be able to load a damaged sstable to do tests/diagnostics on it.	2022-12-20 01:45:38 -05:00
Botond Dénes	bba956c13c	sstables: sstable::{load,open_data}(): fix indentation	2022-12-20 01:45:38 -05:00
Botond Dénes	c85ff7945d	sstables: coroutinize sstable::open_data() Used once when sstable is opened on startup, not performance sensitive.	2022-12-20 01:45:38 -05:00
Botond Dénes	15966a0b1b	sstables: sstable::open_data(): use clear_gently() to clear token ranges Instead of an open-coded loop. It also makes the code easier to coroutinize (next patch).	2022-12-20 01:45:22 -05:00
Nadav Har'El	08c8e0d282	test/alternator: enable tests for long strings of consecutive tombstones In the past we had issue #7933 where very long strings of consecutive tombstones caused Alternator's paging to take an unbounded amount of time and/or memory for a single page. This issue was fixed (by commit `e9cbc9ee85`) but the two tests we had reproducing that issue were left with the "xfail" mark. They were also marked "veryslow" - each taking about 100 seconds - so they didn't run by default and nobody noticed they started to pass. In this patch I make these tests much faster (taking less than a second together), confirm that they pass - and remove the "xfail" mark and improve their descriptions. The trick to making these tests faster is to not create a million tombstones like we used to: We now know that after a string of just 10,000 tombstones ('query_tombstone_page_limit') the page should end, so we can check specifically this number. The story is more complicated for partition tombstones, but there too it should be a multiple of query_tombstone_page_limit. To make the tests even faster, we change run.py to lower the query_tombstone_page_limit from the default 10,000 to 1000. The tests work correctly even without this change, but they are ten times faster with it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12350	2022-12-20 07:08:36 +02:00
Botond Dénes	94f3fb341f	Merge 'Fix nix devenv' from Michael Livshin * Update Nixpkgs base * Clarify some comments * Get rid of custom-packaged cxxbridge (it's now present in Nixpkgs as cxx-rs) * Add missing libraries (libdeflate, libxcrypt) * Fix expected hash of the gdb patch * Fix a couple of small build problems Fixes #12259 Closes #12346 * github.com:scylladb/scylladb: build: fix Nix devenv cql3: mark several private fields as maybe_unused configure.py: link with more abseil libs	2022-12-20 07:01:06 +02:00
Michael Livshin	7c383c6249	build: fix Nix devenv * Update Nixpkgs base * Clarify some comments * Get rid of custom-packaged cxxbridge (it's now present in Nixpkgs as cxx-rs) * Add missing libraries (libdeflate, libxcrypt) * Fix expected hash of the gdb patch * Bump Python driver to 3.25.20-scylla Fixes #12259	2022-12-19 20:53:07 +02:00
Michael Livshin	4407828766	cql3: mark several private fields as maybe_unused Because they are indeed unused -- they are initialized, passed down through some layers, but not actually used. No idea why only Clang 12 in debug mode in Nix devenv complains about it, though.	2022-12-19 20:53:07 +02:00
Michael Livshin	c0c8afb79e	configure.py: link with more abseil libs Specifically libabsl_strings{,_internal}.a. This fixes failure to link tests in the Nix devenv; since presumably all is good in other setups, it must be something weird having to do with inlining? The extra linked libraries shouldn't hurt in any case.	2022-12-19 20:53:07 +02:00
Raphael S. Carvalho	e7380bea65	test: mutation_test: Test multiple compaction groups Extends mutation_test to run the tests with more than one compaction group, in addition to a single one (default). Piggyback on existing tests. Avoids duplication. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 12:36:07 -03:00
Raphael S. Carvalho	e3e7c3c7e5	test: database_test: Test multiple compaction groups Extends database_test to run the tests with more than one compaction group, in addition to a single one (default). Piggyback on existing tests. Avoids duplication. Caught a bug when snapshotting, in implementation of table::can_flush(), showing its usefulness. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 12:36:07 -03:00
Raphael S. Carvalho	e103e41c76	test: database_test: Adapt it to compaction groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 12:36:05 -03:00
Aleksandra Martyniuk	be529cc209	repair: delete redundant abort subscription from shard_repair_task_impl data_sync_repair_task_impl subscribes to corresponding node_ops_info abort source and then, when requested, all its descendants are aborted recursively. Thus, shard_repair_task_impl does not need to subscribe to the node_ops_info abort source, since the parent task will take care of aborting once it is requested. abort_subscription and connected attributes are deleted from the shard_repair_task_impl.	2022-12-19 16:07:28 +01:00
Aleksandra Martyniuk	e48ca62390	repair: add abort subscription to data sync task When node operation is aborted, same should happen with the corresponding task manager's repair task. Subscribe data_sync_repair_task_impl abort() to node_ops_info abort_source.	2022-12-19 15:57:35 +01:00
Aleksandra Martyniuk	2b35d7df1b	tasks: abort tasks on system shutdown When the system shuts down, all task manager's top level tasks are aborted. Responsibility for aborting child tasks is on their parents.	2022-12-19 15:57:35 +01:00
Botond Dénes	827cd0d37b	sstables: coroutinize sstable::load() It is nicely simplified by it. No regression expected, this method is supposedly only used by tests and tools.	2022-12-19 09:33:52 -05:00
Raphael S. Carvalho	d9ab59043e	db: Add config for setting static number of compaction groups This new option allows the user to control the number of compaction groups per table per shard. It's 0 by default which implies a single compaction group, as it is today. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:24 -03:00
Raphael S. Carvalho	9cf4dc7b62	replica: Introduce static compaction groups This is the initial support for multiple groups. _x_log2_compaction_groups controls the number of compaction groups and the partitioning strategy within a single table. The value in _x_log2_compaction_groups refers to log base 2 of the actual number of groups. 0 means 1 compaction group. 1 means 2 groups and 1 most significant bit of token being used to pick the target group. The group partitioner should be later abstracted for making tablet integration easier in the future. _x_log2_compaction_groups is still a constant but a config option will come next. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:23 -03:00
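The log-base-2 mapping described in the commit above (a value of 0 gives 1 group, a value of 3 gives 8 groups selected by the top 3 bits of the token) can be sketched as follows. This is a hedged illustration, not the ScyllaDB implementation: `number_of_groups`, `group_for_token`, and `TOKEN_BITS` are hypothetical names, and treating the token as an unsigned 64-bit value is an assumption; the real partitioner may handle signed tokens differently.

```python
TOKEN_BITS = 64  # assumption: tokens are viewed as unsigned 64-bit integers


def number_of_groups(x_log2_groups: int) -> int:
    """The config value is log2 of the group count: 0 -> 1, 1 -> 2, 3 -> 8."""
    return 1 << x_log2_groups


def group_for_token(token: int, x_log2_groups: int) -> int:
    """Pick a group from the x_log2_groups most significant bits of the token."""
    if x_log2_groups == 0:
        return 0  # single group owns every token
    unsigned = token & ((1 << TOKEN_BITS) - 1)  # reinterpret as unsigned (assumption)
    return unsigned >> (TOKEN_BITS - x_log2_groups)
```

For example, `group_for_token(1 << 63, 1)` selects group 1, because the single most significant bit of that token is set.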
Raphael S. Carvalho	c807e61715	test: sstable_test: Stop referencing single compaction group Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:20 -03:00
Raphael S. Carvalho	254c38c4d2	api: compaction_manager: Stop a compaction type for all groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:19 -03:00
Raphael S. Carvalho	4e836cb96c	api: Estimate pending tasks on all compaction groups Estimates # of compaction jobs to be performed on a table. Adaptation is done by adding estimation from all groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:17 -03:00
Raphael S. Carvalho	640436e72a	api: storage_service: Run maintenance compactions on all compaction groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:15 -03:00
Raphael S. Carvalho	e0c5cbee8d	replica: table: Adapt assertion to compaction groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:13 -03:00
Raphael S. Carvalho	d35cf88f09	replica: database: stop and disable compaction on behalf of all groups With compaction group model, truncate_table_on_all_shards() needs to stop and disable compaction for all groups. replica::table::as_table_state() will be removed once no user remains, as each table may map to multiple groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:12 -03:00
Raphael S. Carvalho	50b02ee0bd	replica: Introduce table::parallel_foreach_table_state() This will replace table::as_table_state(). The latter will be killed once its usage drops to zero. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:10 -03:00
Raphael S. Carvalho	fd69bd433e	replica: disable auto compaction on behalf of all groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:08 -03:00
Raphael S. Carvalho	6fefbe5706	replica: table: Rework compaction triggers for compaction groups Allow table-wide compaction trigger, as well as fine-grained trigger like after flushing a memtable on behalf of a single group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:07 -03:00
Raphael S. Carvalho	6a6adea3ab	replica: Adapt table::get_sstables_including_compacted_undeleted() to compaction groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:05 -03:00
Raphael S. Carvalho	5919836da8	replica: Adapt table::rebuild_statistics() to compaction groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:04 -03:00
Raphael S. Carvalho	70b727db31	replica: table: Perform major compaction on behalf of all groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:01 -03:00
Raphael S. Carvalho	e3ccdb17a0	replica: table: Perform off-strategy compaction on behalf of all groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:00 -03:00
Raphael S. Carvalho	6efc9fd1f6	replica: table: Perform cleanup compaction on behalf of all groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:58 -03:00
Raphael S. Carvalho	36e11eb2a5	replica: Extend table::discard_sstables() to operate on all compaction groups discard_sstables() runs in the context of truncate, which is a table-wide operation today, and will remain so with multiple static groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:55 -03:00
Raphael S. Carvalho	24c3687c3f	replica: table: Create compound sstable set for all groups Avoids extra compound set for single-compaction-group table. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:52 -03:00
Raphael S. Carvalho	eb620da981	replica: table: Set compaction strategy on behalf of all groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:50 -03:00
Raphael S. Carvalho	7a0e4f900f	replica: table: Return min memtable timestamp across all groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:49 -03:00
Raphael S. Carvalho	ceaa8a1ef1	replica: Adapt table::stop() to compaction groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:47 -03:00
Raphael S. Carvalho	facf923440	replica: Adapt table::clear() to compaction groups clear() clears memtable content and cache. Cache is shared by groups, therefore adaptation happens by only clearing memtables of all groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:45 -03:00
Raphael S. Carvalho	a9c902cd5e	replica: Adapt table::can_flush() to compaction groups can_flush() is used externally to determine if a table has an active memtable that can be flushed. Therefore, adaptation happens by returning true if any of the groups can be flushed. A subsequent flush request will flush memtable of all groups that are ready for it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:44 -03:00
Raphael S. Carvalho	ea42090d47	replica: Adapt table::flush() to compaction groups Adaptation of flush() happens by triggering flush on the memtable of all groups. table::seal_active_memtable() will bail out if memtable is empty, so it's not a problem to call flush on a group whose memtable is empty. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:42 -03:00
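The two commits above describe the table-level semantics: can_flush() is true if any group can be flushed, and flush() is triggered on every group, with empty memtables skipped. A small Python model of that behavior follows. `Group` and `Table` are hypothetical stand-ins written for illustration, not the replica::table or compaction_group classes.

```python
class Group:
    """Toy compaction group holding one active memtable (a plain list here)."""

    def __init__(self, rows=()):
        self.memtable = list(rows)
        self.flushed = []  # memtables that were sealed and written out

    def can_flush(self):
        return bool(self.memtable)

    def flush(self):
        if not self.memtable:
            return  # bail out: nothing to seal for an empty memtable
        self.flushed.append(self.memtable)
        self.memtable = []


class Table:
    """Toy table that fans operations out to all of its groups."""

    def __init__(self, groups):
        self.groups = groups

    def can_flush(self):
        # True if at least one group has something to flush.
        return any(g.can_flush() for g in self.groups)

    def flush(self):
        # Safe to call on every group; empty ones are a no-op.
        for g in self.groups:
            g.flush()
```

Calling `flush()` on a table whose groups are partly empty only produces output for the non-empty ones, which is why the table-level call does not need to filter groups first.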
Raphael S. Carvalho	7274c83098	replica: Introduce parallel_foreach_compaction_group() This variant will be useful when iterating through groups and performing async actions on each. It guarantees that all groups are alive by the time they're reached in the loop. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:40 -03:00
Raphael S. Carvalho	89ab9d7227	replica: Adapt table::set_schema() to compaction groups set_schema() is used by the database to apply schema changes to table components which include memtables. Adaptation happens by setting schema to memtable(s) of all groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:38 -03:00
Raphael S. Carvalho	0022322ae3	replica: Add memtables from all compaction groups for reads Let's add memtables of all compaction groups. Point queries are optimized by picking a single group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:36 -03:00
Raphael S. Carvalho	e044001176	replica: Add memtable_count() method to compaction_group Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:34 -03:00
Raphael S. Carvalho	f2ea79f26c	replica: table: Reserve reader list capacity through a callback add_memtables_to_reader_list() will be adapted to compaction groups. For point queries, it will add memtables of a single group. With the callback, add_memtables_to_reader_list() can tell its caller the exact amount of memtable readers to be added, so it can reserve precisely the readers capacity. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:33 -03:00
Raphael S. Carvalho	e841508685	replica: Extract addition of memtables to reader list into a new function Will make it easier for adding memtables of all compaction groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:19 -03:00
Raphael S. Carvalho	530956b2de	replica: Adapt table::occupancy() to compaction groups table::occupancy() provides accumulated occupancy stats from memtables. Adaptation happens by accumulating stats from memtables of all groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:17 -03:00
Raphael S. Carvalho	ef8f542d75	replica: Adapt table::active_memtable() to compaction groups active_memtable() was fine for a single group, but with multiple groups, there will be one active memtable per group. Let's change the interface to reflect that. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:14 -03:00
Raphael S. Carvalho	429c5aa2f9	replica: Introduce table::compaction_groups() Useful for iterating through all groups. This is an intermediary implementation which requires allocation as only one group is supported today. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:12 -03:00
Raphael S. Carvalho	514008f136	replica: Preparation for multiple compaction groups Adjusts scylla_memtables gdb command to multiple groups, while keeping backward compatibility. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:10 -03:00
Raphael S. Carvalho	52b94b6dd7	scylla-gdb: Fix backward compatibility of scylla_memtables command Fix it while refactoring the code for arrival of multiple compaction groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:07 -03:00
Anna Stuchlik	bbfb9556fc	doc: mark the in-memory tables feature as deprecated Closes #12286	2022-12-19 15:39:31 +02:00
Avi Kivity	c70a9b0166	test: make test xml filenames more unique `ea99750de7` ("test: give tests less-unique identifiers") made the disambiguating ids only be unambiguous within a single test case. This made all tests named "run" have the name name "run.1". Fix that by adding the suite name everywhere: in test paths, and in junit test case names. Fixes #12310. Closes #12313	2022-12-19 15:03:51 +02:00
Botond Dénes	3e6ddf21bc	Merge 'storage_service: unbootstrap: avoid unnecessary copy of ranges_to_stream' from Benny Halevy `ranges_to_stream` is a map of ` std::unordered_multimap<dht::token_range, inet_address>` per keyspace. On large clusters with a large number of keyspace, copying it may cause reactor stalls as seen in #12332 This series eliminates this copy by using std::move and also turns `stream_ranges` into a coroutine, adding maybe_yield calls to avoid further stalls down the road. Fixes #12332 Closes #12343 * github.com:scylladb/scylladb: storage_service: stream_ranges: unshare streamer storage_service: stream_ranges: maybe_yield storage_service: coroutinize stream_ranges storage_service: unbootstrap: move ranges_to_stream_by_keyspace to stream_ranges	2022-12-19 12:53:16 +02:00
Benny Halevy	e8aa1182b2	docs: replace-dead-node: get rid of hibernate status With replace using node operations, the HIBERNATE gossip status is not used anymore. This change updates documentation to reflect that. During replace, the replacing node shows in gossipinfo in STATUS:NORMAL. Also, the replaced node shows as DN in `nodetool status` while being replaced, so remove paragraph showing it's not listed in `nodetool status`. Plus, tidy up the text alignment. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-19 12:19:10 +02:00
Benny Halevy	c9993f020d	storage_service: get rid of handle_state_replacing Since `2ec1f719de` nodes no longer publish HIBERNATE state so we don't need to support handling it. Replace is now always done using node operations (using repair or streaming). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-19 12:19:08 +02:00
Benny Halevy	60de7d28db	storage_service: stream_ranges: unshare streamer Now that stream_ranges is a coroutine streamer can be an automatic variable on the coroutine stack frame. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-19 07:42:07 +02:00
Benny Halevy	9badcd56ca	storage_service: stream_ranges: maybe_yield Prevent stalls with a large number of keyspaces and token ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-19 07:42:07 +02:00
Benny Halevy	2cf75319b0	storage_service: coroutinize stream_ranges Before adding maybe_yield calls. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-19 07:42:01 +02:00
Benny Halevy	82486bb5d2	storage_service: unbootstrap: move ranges_to_stream_by_keyspace to stream_ranges Avoid a potentially large memory copy causing a reactor stall with a large number of keyspaces. Fixes #12332 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-19 07:39:48 +02:00
Avi Kivity	7c7eb81a66	Merge 'Encapsulate filesystem access by sstable into filesystem_storage subclass' from Pavel Emelyanov This is to define the API sstable needs from underlying storage. When implementing object-storage backend it will need to implement those. The API looks like future<> snapshot(const sstable& sst, sstring dir, absolute_path abs) const; future<> quarantine(const sstable& sst, delayed_commit_changes* delay); future<> move(const sstable& sst, sstring new_dir, generation_type generation, delayed_commit_changes* delay); void open(sstable& sst, const io_priority_class& pc); // runs in async context future<> wipe(const sstable& sst) noexcept; future<file> open_component(const sstable& sst, component_type type, open_flags flags, file_open_options options, bool check_integrity); It doesn't have "list" or alike, because it's not a method of an individual sstable, but rather the one from sstables_manager. It will come as separate PR. Closes #12217 * github.com:scylladb/scylladb: sstable, storage: Mark dir/temp_dir private sstable: Remove get_dir() (well, almost) sstable: Add quarantine() method to storage sstable: Use absolute/relative path marking for snapshot() sstable: Remove temp_... stuff from sstable sstable: Move open_component() on storage sstable: Mark rename_new_sstable_component_file() const sstable: Print filename(type) on open-component error sstable: Reorganize new_sstable_component_file() sstable: Mark filename() private sstable: Introduce index_filename() tests: Disclosure private filename() calls sstable: Move wipe_storage() on storage sstable: Remove temp dir in wipe_storage() sstable: Move unlink parts into wipe_storage sstable: Remove get_temp_dir() sstable: Move write_toc() to storage sstable: Shuffle open_sstable() sstable: Move touch_temp_dir() to storage sstable: Move move() to storage sstable: Move create_links() to storage sstable: Move seal_sstable() to storage sstable: Tossing internals of seal_sstable() sstable: Move remove_temp_dir() to storage sstable: Move create_links_common() to storage sstable: Move check_create_links_replay() to storage sstable: Remove one of create_links() overloads sstable: Remove create_links_and_mark_for_removal() sstable: Indentation fix after previous patch sstable: Coroutinize create_links_common() sstable: Rename create_links_common()'s "dir" argument sstable: Make mark_for_removal bool_class sstable, table: Add sstable::snapshot() and use in table::take_snapshot sstable: Move _dir and _temp_dir on filesystem_storage sstable: Use sync_directory() method test, sstable: Use component_basename in test sstables: Move read_{digest\|checksum} on sstable	2022-12-18 17:29:35 +02:00
Anna Stuchlik	6a8eb33284	docs: add the new upgrade guide 2022.1 to 2022.2 to the index and the toctree	2022-12-16 17:13:50 +01:00
Anna Stuchlik	36f4ef2446	docs: add the index file for the new upgrade guide from 2022.1 to 2022.2	2022-12-16 17:11:25 +01:00
Anna Stuchlik	8d8983e029	docs: add the metrics update file to the upgrade guide 2022.1 to 2022.2	2022-12-16 17:09:21 +01:00
Anna Stuchlik	252c2139c2	docs: add the upgrade guide for ScyllaDB Enterprise from 2022.1 to 2022.2	2022-12-16 17:07:00 +01:00
Michał Chojnowski	b52bd9ef6a	db: commitlog: remove unused max_active_writes() Dead and misleading code. Closes #12327	2022-12-16 10:23:03 +02:00
Nadav Har'El	327539b15d	Merge 'test.py: fix cql failure handling' from Alecco Fix a bug in failure handling and log level. Closes #12336 * github.com:scylladb/scylladb: test.py: convert param to str test.py: fix error level for CQL tests	2022-12-16 09:29:21 +02:00
Botond Dénes	cc03becf82	Merge 'tasks: get task's type with method' from Aleksandra Martyniuk Type of operation is related to a specific implementation of a task. Thus, it should be accessed via a virtual method in tasks::task_manager::task::impl rather than be its attribute. Closes #12326 * github.com:scylladb/scylladb: api: delete unused type parameter from task_manager_test api tasks: repair: api: remove type attribute from task_manager::task::status tasks: add type() method to task_manager::task::impl repair: add reason attribute to repair_task	2022-12-16 09:20:26 +02:00
Aleksandra Martyniuk	f81ad2d66a	repair: make shard tasks internal Shard tasks should not be visible to users by default, thus they are made internal. Closes #12325	2022-12-16 09:05:30 +02:00
Aleksandra Martyniuk	bae887da3b	tasks: add virtual destructor to task_manager::module When an object of a class inheriting from task_manager::module is destroyed, destructor of the derived class should be called. Closes #12324	2022-12-16 08:59:26 +02:00
Raphael S. Carvalho	e6fb3b3a75	compaction: Delete atomically off-strategy input sstables After commit `a57724e711`, off-strategy no longer races with view building, therefore deletion code can be simplified and piggyback on mechanism for deleting all sstables atomically, meaning a crash midway won't result in some of the files coming back to life, which leads to unnecessary work on restart. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12245	2022-12-16 08:15:49 +02:00
Alejo Sanchez	9b65448d38	test.py: convert param to str The format_unidiff() function takes str, not pathlib PosixPath, so convert it to str. This prevented diff output of unexpected result to be shown in the log file. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-12-15 20:46:35 +01:00
Alejo Sanchez	5142d80bb1	test.py: fix error level for CQL tests If the test fails, use error log level. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-12-15 20:45:44 +01:00
Botond Dénes	64903ba7d5	test/cql-pytest: use pytest site-packages workaround Recently, the pytest script shipped by Fedora started invoking python with the `-s` flag, which disables python considering user site packages. This caused problems for our tests which install the cassandra driver in the user site packages. This was worked around in `e5e7780f32` by providing our own pytest interposer launcher script which does not pass the above mentioned flag to python. Said patch fixed test.py but not the run.py in cql-pytest. So if the cql-pytest suite is run via test.py it works fine, but if it is invoked via the run script, it fails because it cannot find the cassandra driver. This patch patches run.py to use our own pytest launcher script, so the suite can be run via the run script as well. Since run.py is shared with the alternator pytest suite, this patch also fixes said test suite too. Closes #12253	2022-12-15 16:05:31 +02:00
Benny Halevy	639e247734	test: cql-pytest: test_describe: test_table_options_quoting: USE test_keyspace Without that, I often (but not always) get the following error: ``` __________________________ test_table_options_quoting __________________________ cql = <cassandra.cluster.Session object at 0x7f1aafb10650> test_keyspace = 'cql_test_1671103335055' def test_table_options_quoting(cql, test_keyspace): type_name = f"some_udt; DROP KEYSPACE {test_keyspace}" column_name = "col''umn -- @quoting test!!" comment = "table''s comment test!\"; DESC TABLES --quoting test" comment_plain = "table's comment test!\"; DESC TABLES --quoting test" #without doubling "'" inside comment > cql.execute(f"CREATE TYPE \"{type_name}\" (a int)") test/cql-pytest/test_describe.py:623: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ cassandra/cluster.py:2699: in cassandra.cluster.Session.execute ??? _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > ??? E cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename" ``` CQL driver in use is the scylla driver version 3.25.10. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12329	2022-12-15 14:35:33 +02:00
Aleksandra Martyniuk	f0b2b00a15	api: delete unused type parameter from task_manager_test api	2022-12-15 10:50:30 +01:00
Aleksandra Martyniuk	5bc09daa7a	tasks: repair: api: remove type attribute from task_manager::task::status	2022-12-15 10:49:09 +01:00
Aleksandra Martyniuk	8d5377932d	tasks: add type() method to task_manager::task::impl	2022-12-15 10:41:58 +01:00
Aleksandra Martyniuk	329176c7bc	repair: add reason attribute to repair_task As a preparation to creating a type() method in task_manager::task::impl a streaming::stream_reason is kept in repair_task.	2022-12-15 10:38:38 +01:00
Botond Dénes	9713a5c314	tool/scylla-sstable: move documentation online The inline-help of operations will only contain a short summary of the operation and the link to the online documentation. The move is not a straightforward copy-paste. First and foremost because we move from simple markdown to RST. Informal references are also replaced with proper RST links. Some small edits were also done on the texts. The intent is the following: * the inline help serves as a quick reference for what the operation does and what flags it has; * the online documentation serves as the full reference manual, explaining all details;	2022-12-15 04:10:21 -05:00
Botond Dénes	3cf7afdf95	docs: scylla-sstable.rst: add sstable content section Provides a link to the architecture/sstable page for more details on the sstable format itself. It also describes the mutation-fragment stream, the parts of it that are relevant to the sstable operations. The purpose of this section is to provide a target for links that want to point to a common explanation on the topic. In particular, we will soon move the detailed documentation of the scylla-sstable operations into this file and we want to have a common explanation of the mutation fragment stream that these operations can point to.	2022-12-15 04:10:21 -05:00
Botond Dénes	641fb4c8bb	docs: scylla-{sstable,types}.rst: drop Syntax section In both files, the section hierarchy is as follows: Usage Syntax Sections with actual content This scheme uses up 3 levels of hierarchy, leaving not much room to expand the sections with actual content with subsections of their own. Remove the Syntax level altogether, directly embedding the sections with content under the Usage section.	2022-12-15 04:03:00 -05:00
Botond Dénes	8f8284783a	Merge 'Fix handling of non-full clustering keys in the read path' from Tomasz Grabiec This PR fixes several bugs related to handling of non-full clustering keys. One is in trim_clustering_row_ranges_to(), which is broken for non-full keys in reverse mode. It will trim the range to position_in_partition_view::after_key(full_key) instead of position_in_partition_view::before_key(key), hence it will include the key in the resulting range rather than exclude it. Fixes #12180 after_key() was creating a position which is after all keys prefixed by a non-full key, rather than a position which is right after that key. This issue will be caught by cql_query_test::test_compact_storage in debug mode when mutation_partition_v2 merging starts inserting sentinels at position after_key() on preemption. It probably already causes problems for such keys as after_key() is used in various parts in the read path. Refs #1446 Closes #12234 * github.com:scylladb/scylladb: position_in_partition: Make after_key() work with non-full keys position_in_partition: Introduce before_key(position_in_partition_view) db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order types: Fix comparison of frozen sets with empty values	2022-12-15 10:47:12 +02:00
Pavel Emelyanov	6d10a3448b	sstable, storage: Mark dir/temp_dir private Now all storage access via sstable happens with the help of storage class API so its internals can be finally made private. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	6296ca3438	sstable: Remove get_dir() (well, almost) The sstable::get_dir() is now gone, no callers know that sstable lives in any path on a filesystem. There are only few callers left. One is several places in code that need sstable datafile, toc and index paths to print them in logs. The other one is sstable_directory that is to be patched separately. For both there's a storage.prefix() method that prepends component name with where the sstable is "really" located. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	7402787d16	sstable: Add quarantine() method to storage Moving sstable to quarantine has one specific -- if the sstable is in staging/ directory it's anyway moved into root/quarantine dir, not into the quarantine subdir of its current location. Encapsulate this feature in storage class method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	f507271578	sstable: Use absolute/relative path marking for snapshot() The snapshotting code uses full paths to files to manipulate snapshotted sstables. Until this code is patched to use some proper snapshotting API from sstable/ module, it will continue doing so. However, to remove the get_dir() method from sstable() the seal_sstable() needs to put relative "backup" directory to storage::snapshot() method. This patch adds a temporary bool_class to distinguish the two cases. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	a46d378bee	sstable: Remove temp_... stuff from sstable There's a bunch of helpers around XFS-specific temp-dir sitting in public sstable part. Drop it altogether, no code needs it for real. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	adba24d8ae	sstable: Move open_component() on storage Obtaining a class file object to read/write sstable from/to is now storage-specific. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	4c22831d23	sstable: Mark rename_new_sstable_component_file() const It's in fact such. Next patch will need it const to call this method via const sstable reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	6bf3e3a921	sstable: Print filename(type) on open-component error The file path is going to disappear soon, so print the filename() on error. For now it's the same, but the meaning of the filename() returning string is changing to become "random label for the log reader". Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	dc72bce6d7	sstable: Reorganize new_sstable_component_file() The helper consists of three stages: 1. open a file (probably in a temp dir) 2. decorate it with extensions and checked_file 3. optionally rename a file from temp dir The latter is done to make XFS allocate this file in separate block group if the file was created in temp dir on step 1. This patch swaps steps 2 and 3 to keep filesystem-specific opening next to each other. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	e55c740f49	sstable: Mark filename() private From now on no callers should use this string to access anything on disk Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	5f579eb405	sstable: Introduce index_filename() Currently the sstable::filename(Index) is used in several places that get the filename as a printable or throwable string and don't treat it as a real location of any file. For those, add the index_filename() helper symmetrical to toc_filename() and (in some sense) the get_filename() one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	bbbbd6dbfc	tests: Disclosure private filename() calls The sstable::filename() is going to become private method. Lots of tests call it, but tests do call a lot of other sstable private methods, that's OK. Make the sstable::filename() yet another one of that kind in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	4a91f3d443	sstable: Move wipe_storage() on storage Now when the filesystem cleaning code is sitting in one method, it can finally be made the storage class one. Exception-safe allocation of toc_name (spoiler: it's copied anyway one step later, so it's "not that safe" actually) is moved into storage as well. The caller is left with toc_filename() call in its exception handler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	c92d45eaa9	sstable: Remove temp dir in wipe_storage() When unlinking an sstable for whatever reason it's good to check if the temp dir is hanging around. In some cases it's not (compaction), but keeping the whole wiping code together makes it easier to move it on storage class in one go. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	88ede71320	sstable: Move unlink parts into wipe_storage Just move the code. This is to make the next patch smaller. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	0336cb3bdd	sstable: Remove get_temp_dir() Only one private caller of it is left, it's better to open-code it there Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	3326063b8b	sstable: Move write_toc() to storage This method initiates the sstable creation. Effectively it's the first step in sstable creation transaction implemented on top of rename() call. Thus this method is moved onto storage under respective name. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	636d49f1c1	sstable: Shuffle open_sstable() When an sstable is prepared to be written on disk the .write_toc() is called on it which creates temporary toc file. Prior to this, the writer code calls generate_toc() to collect components on the sstable. This patch adds the .open_sstable() API call that does both. This prepares the write_toc() part to be moved to storage, because it's not just "write data into TOC file", it's the first step in transaction implemented on top of rename()s. The test need care -- there's rewrite_toc_without_scylla_component() thing in utils that doesn't want the generate_toc() part to be called. It's not patched here and continues calling .write_toc(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	d3216b10d6	sstable: Move touch_temp_dir() to storage The continuation of the previously moved remove_temp_dir() one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	1a34cb98fc	sstable: Move move() to storage The sstable can be "moved" in two cases -- to move from staging or to move to quarantine. Both operations are sstable API ones, but the implementation is storage-specific. This patch makes the latter a method of storage class. One thing to note is that only quarantine() touched the target directly. Now also the move_to_new_dir() happening on load also does it, but that's harmless. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:47 +03:00
Pavel Emelyanov	18f6165993	sstable: Move create_links() to storage This method is currently used in two places: sstable::snapshot() and sstable::seal_sstable(). The latter additionally touches the target backup/ subdir. This patch moves the whole thing on storage and adds touch for all the cases. For snapshots this might be excessive, but harmless. Tests get their private-disclosure way to access sstable._storage in few places to call create_links directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	136a8681e0	sstable: Move seal_sstable() to storage Now the sstable sealing is split into storage part, internal-state part and the seal-with-backup kick. This move makes remove_temp_dir() private. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	334d231f56	sstable: Tossing internals of seal_sstable() There are two of them -- one API call and the other one that just "seals" it. The latter one also changes the _marked_for_deletion bit on the sstable. This patch makes the latter method prepared to be moved onto storage, because sealing means committing TOC file on disk with the help of rename system call which is purely storage thing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	ce3a8a4109	sstable: Move remove_temp_dir() to storage This one is simple, it just accesses _temp_dir thing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	9027d137d2	sstable: Move create_links_common() to storage Same as previous patch. This move makes the previously moved check_create_links_replay() a private method of the storage class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	990032b988	sstable: Move check_create_links_replay() to storage It needs to get sstable const reference to get the filename(s) from it. Other than that it's pure filesystem-accessing method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	041a8c80ad	sstable: Remove one of create_links() overloads There are two -- one that accepts generation and the other one that does not. The latter is only called by the former, so no need in keeping both. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	f1558b6988	sstable: Remove create_links_and_mark_for_removal() There's only one user of it, it can document its "and mark for removal" intention via dedicated bool_class argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	65f40b28e6	sstable: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	428adda4a9	sstable: Coroutinize create_links_common() Looks much shorter and easier-to-patch this way. The dst_dir argument is made value from const reference, old code copied it with do_with() anyway. Indentation is deliberately left broken until next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	ab13a99586	sstable: Rename create_links_common()'s "dir" argument The whole method is going to move onto newly introduced filesystem_storage that already has field of the same name onboard. To avoid confusion, rename the argument to dst_dir. No functional changes, _just_ s/dir/dst_dir/g throughout the method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	4977c73163	sstable: Make mark_for_removal bool_class Its meaning is comment-documented anyway. Also, next patches will remove the create_links_and_mark_for_removal() so callers need some verbose meaning of this boolean in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	f53d6804a6	sstable, table: Add sstable::snapshot() and use in table::take_snapshot The replica/ code now "knows" that snapshotting an sstable means creating a bunch of hard-links on disk. Abstract that via sstable::snapshot() method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:44 +03:00
Pavel Emelyanov	2803dcda6d	sstable: Move _dir and _temp_dir on filesystem_storage Those two fields define the way sstable is stored as collection of on-disk files. First step towards making the storage access abstract is in moving the paths onto filesystem_storage embedded class. Both are made public for now, the rest of the code is patched to access them via _storage.<smth>. The rest of the set moves parts of sstable:: methods into the filesystem_storage, then marks the paths private. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:44 +03:00
Pavel Emelyanov	17c8ba6034	sstable: Use sync_directory() method The sstable::write_toc() executes sync_directory() by hand. Better to use the method directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:44 +03:00
Pavel Emelyanov	e934f42402	test, sstable: Use component_basename in test One case gets full sstable datafile path to get the basename from it. There's already the basename helper on the class sstable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:44 +03:00
Pavel Emelyanov	376915d406	sstables: Move read_{digest\|checksum} on sstable These methods access sstables as files on disk, in order to hide the "path on filesystem" meaning of sstables::filename() the whole method should be made sstable:: one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:44 +03:00
Pavel Emelyanov	d561495f0d	Merge 'topology: get rid of pending state' from Benny Halevy Now, with `a44ca06906`, is_normal_token_owner that replaced is_member does not rely anymore on the pending status of endpoints in topology. With that we can get rid of this state and just keep all endpoints we know about in the topology. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12294 * github.com:scylladb/scylladb: topology: get rid of pending state topology: debug log update and remove endpoint	2022-12-14 19:28:35 +03:00
Benny Halevy	bdb6550305	view: row_locker: add latency_stats_tracker Refactor the existing stats tracking and updating code into struct latency_stats_tracker and while at it, count lock_acquisitions only on success. Decrement operations_currently_waiting_for_lock in the destructor so it's always balanced with the unconditional increment in the ctor. As for updating estimated_waiting_for_lock, it is always updated in the dtor, both on success and failure since the wait for the lock happened, whether waiting timed out or not. Fixes #12190 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12225	2022-12-14 17:37:22 +02:00
Avi Kivity	9ee78975b7	Merge 'Fix topology mismatch on read-repair handler creation' from Pavel Emelyanov The schedule_repair() receives a bunch of endpoint:mutations pairs and tries to create handlers for those. When creating the handlers it re-obtains topology from schema->ks->effective_replication_map chain, but this new topology can be outdated as compared to the list of endpoints at hand. The fix is to carry the e.r.m. pointer used by read executor reconciliation all the way down to repair handlers creation. This requires some manipulations with mutate_internal() and mutate_prepare() argument lists. fixes: #12050 (it was the same problem) Closes #12256 * github.com:scylladb/scylladb: proxy: Carry replication map with repair mutation(s) proxy: Wrap read repair entries into read_repair_mutation proxy: Turn ref to forwardable ref in mutations iterator	2022-12-14 17:33:43 +02:00
Tomasz Grabiec	23e4c83155	position_in_partition: Make after_key() work with non-full keys This fixes a long standing bug related to handling of non-full clustering keys, issue #1446. after_key() was creating a position which is after all keys prefixed by a non-full key, rather than a position which is right after that key. This issue will be caught by cql_query_test::test_compact_storage in debug mode when mutation_partition_v2 merging starts inserting sentinels at position after_key() on preemption. It probably already causes problems for such keys.	2022-12-14 14:47:33 +01:00
Botond Dénes	16c50bed5e	Merge 'sstables: coroutinize update_info_for_opened_data' from Avi Kivity A complicated function (in continuation style) that benefits from this simplification. Closes #12289 * github.com:scylladb/scylladb: sstables: update_info_for_opened_data: reindent sstables: update_info_for_opened_data: coroutinize	2022-12-14 15:12:22 +02:00
Nadav Har'El	92d03be37b	materialized view: fix bug in some large modifications to base partitions Sometimes a single modification to a base partition requires updates to a large number of view rows. A common example is deletion of a base partition containing many rows. A large BATCH is also possible. To avoid large allocations, we split the large amount of work into batches of 100 (max_rows_for_view_updates) rows each. The existing code assumed an empty result from one of these batches meant that we are done. But this assumption was incorrect: There are several cases when a base-table update may not need a view update to be generated (see can_skip_view_updates()) so if all 100 rows in a batch were skipped, the view update stopped prematurely. This patch includes two tests showing when this bug can happen - one test using a partition deletion with a USING TIMESTAMP causing the deletion to not affect the first 100 rows, and a second test using a specially-crafted large BATCH. These use cases are fairly esoteric, but in fact hit a user in the wild, which led to the discovery of this bug. The fix is fairly simple: To detect when build_some() is done it is no longer enough to check if it returned zero view-update rows; Rather, it explicitly returns whether or not it is done as an std::optional. The patch includes several tests for this bug, which pass on Cassandra, failed on Scylla before this patch, and pass with this patch. Fixes #12297. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12305	2022-12-14 14:50:38 +02:00
Botond Dénes	e7d8855675	Merge 'Revert accidental submodule updates' from Benny Halevy The abseil and tools/java submodules were accidentally updated in `71bc12eecc` (merged to master in `51f867339e`) This series reverts those changes. Closes #12311 * github.com:scylladb/scylladb: Revert accidental update of tools/java submodule Revert accidental update of abseil submodule	2022-12-14 13:20:08 +02:00
Benny Halevy	865193f99a	Revert accidental update of tools/java submodule The tools/java submodule was accidentally updated in `71bc12eecc` Revert this change. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-14 13:06:30 +02:00
Benny Halevy	9911ba195b	Revert accidental update of abseil submodule The abseil module was accidentally updated in `71bc12eecc` Revert this change. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-14 13:05:04 +02:00
Pavel Emelyanov	ab8fc0e166	proxy: Carry replication map with repair mutation(s) The create_write_response_handler() for read repair needs the e.r.m. from the caller, because it effectively accepts list of endpoints from it. So this patch equips all read_repair_mutation-s with the e.r.m. pointer so that the handler creation can use it. It's the same for all mutations, so it's a waste of space, but it's not bad -- there's typically few mutations in this range and the entry passed there is temporary, so even lots of them won't occupy lots of memory for long. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-14 14:03:39 +03:00
Pavel Emelyanov	140f373e15	proxy: Wrap read repair entries into read_repair_mutation The schedule_repair() operates on a map of endpoint:mutations pairs. Next patch will need to extend this entry and it's going to be easier if the entry is wrapped in a helper structure in advance. This is where the forwardable reference cursor from the previous patch gets its user. The schedule_repair() produces a range of rvalue wrappers, but the create_write_response_handler accepting it is OK, it copies mutations anyway. The printing operator is added to facilitate mutations logging from mutate_internal() method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-14 14:01:12 +03:00
Pavel Emelyanov	014b563ef1	proxy: Turn ref to forwardable ref in mutations iterator The mutate_prepare() is iterating over range of mutation with 'auto&' cursor thus accepting only lvalues. This is very restrictive, the caller of mutate_prepare() may as well provide rvalues if the target create_write_response_handler() or lambda accepts it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-14 14:00:10 +03:00
Avi Kivity	3fa230fee4	Merge 'cql3: expr: make it possible to prepare and evaluate conjunctions' from Jan Ciołek This PR implements two things: * Getting the value of a conjunction of elements separated by `AND` using `expr::evaluate` * Preparing conjunctions using `prepare_expression` --- `NULL` is treated as an "unknown value" - maybe `true` maybe `false`. `TRUE AND NULL` evaluates to `NULL` because it might be `true` but also might be `false`. `FALSE AND NULL` evaluates to `FALSE` because no matter what value `NULL` acts as, the result will still be `FALSE`. Unset and empty values are not allowed. Usually in CQL the rule is that when `NULL` occurs in an operation the whole expression becomes `NULL`, but here we decided to deviate from this behavior. Treating `NULL` as an "unknown value" is the standard SQL way of handling `NULLs` in conjunctions. It works this way in MySQL and Postgres so we do it this way as well. The evaluation short-circuits. Once `FALSE` is encountered the function returns `FALSE` immediately without evaluating any further elements. It works this way in Postgres as well, for example: `SELECT true AND NULL AND 1/0 = 0` will throw a division by zero error, but `SELECT false AND 1/0 = 0` will successfully evaluate to `FALSE`. Closes #12300 * github.com:scylladb/scylladb: expr_test: add unit tests for prepare_expression(conjunction) cql3: expr: make it possible to prepare conjunctions expr_test: add tests for evaluate(conjunction) cql3: expr: make it possible to evaluate conjunctions	2022-12-14 09:48:26 +02:00
Botond Dénes	122b267478	Merge 'repair: coroutinize to_repair_rows_list' from Avi Kivity Simplify a somewhat complicated function. Closes #12290 * github.com:scylladb/scylladb: repair: to_repair_rows_list: reindent repair: to_repair_rows_list: coroutinize	2022-12-14 09:39:47 +02:00
Avi Kivity	c09583bcef	storage_proxy: coroutinize send_truncate_blocking Not particularly important, but a small simplification. Closes #12288	2022-12-14 09:39:33 +02:00
Tomasz Grabiec	132d5d4fa1	messaging: Shutdown on stop() if it wasn't shut down earlier All rpc::client objects have to be stopped before they are destroyed. Currently this is done in messaging_service::shutdown(). The cql_test_env does not call shutdown() currently. This can lead to use-after-free on the rpc::client object, manifesting like this: Segmentation fault on shard 0. Backtrace: column_mapping::~column_mapping() at schema.cc:? db::cql_table_large_data_handler::internal_record_large_cells(sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long) const at ./db/large_data_handler.cc:180 operator() at ./db/large_data_handler.cc:123 (inlined by) seastar::future<void> std::__invoke_impl<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long>(std::__invoke_other, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const&&, column_definition const&, unsigned long&&, unsigned long&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61 (inlined by) std::enable_if<is_invocable_r_v<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, 
utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long>, seastar::future<void> >::type std::__invoke_r<seastar::future<void>, db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long>(db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const&&, column_definition const&, unsigned long&&, unsigned long&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:114 (inlined by) std::_Function_handler<seastar::future<void> (sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long), db::cql_table_large_data_handler::cql_table_large_data_handler(gms::feature_service&, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>, utils::updateable_value<unsigned int>)::$_1>::_M_invoke(std::_Any_data const&, sstables::sstable const&, sstables::key const&, clustering_key_prefix const&&, column_definition const&, unsigned long&&, unsigned long&&) at 
/usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290 std::function<seastar::future<void> (sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long)>::operator()(sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591 (inlined by) db::cql_table_large_data_handler::record_large_cells(sstables::sstable const&, sstables::key const&, clustering_key_prefix const, column_definition const&, unsigned long, unsigned long) const at ./db/large_data_handler.cc:175 seastar::rpc::log_exception(seastar::rpc::connection&, seastar::log_level, char const, std::__exception_ptr::exception_ptr) at ./build/release/seastar/./seastar/src/rpc/rpc.cc:109 operator() at ./build/release/seastar/./seastar/src/rpc/rpc.cc:788 operator() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:1682 (inlined by) void seastar::futurize<seastar::future<void> >::satisfy_with_result_of<seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, 
seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}&&) at ./build/release/seastar/./seastar/include/seastar/core/future.hh:2134 (inlined by) operator() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:1681 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address 
const&, seastar::socket_address const&)::$_14>(seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::rpc::client::client(seastar::rpc::logger const&, void, seastar::rpc::client_options, seastar::socket, seastar::socket_address const&, seastar::socket_address const&)::$_14&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() at ./build/release/seastar/./seastar/include/seastar/core/future.hh:781 seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2319 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2756 seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2925 seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2808 seastar::app_template::run_deprecated(int, char, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:265 seastar::app_template::run(int, char, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:156 operator() at ./build/release/seastar/./seastar/src/testing/test_runner.cc:75 (inlined by) void std::__invoke_impl<void, seastar::testing::test_runner::start_thread(int, char)::$_0&>(std::__invoke_other, seastar::testing::test_runner::start_thread(int, char)::$_0&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::testing::test_runner::start_thread(int, char)::$_0&>, void>::type std::__invoke_r<void, seastar::testing::test_runner::start_thread(int, char)::$_0&>(seastar::testing::test_runner::start_thread(int, char)::$_0&) at 
/usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111 (inlined by) std::_Function_handler<void (), seastar::testing::test_runner::start_thread(int, char)::$_0>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290 std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73 Fix by making sure that shutdown() is called prior to destruction. Fixes #12244 Closes #12276	2022-12-14 10:28:26 +03:00
Tzach Livyatan	7cd613fc08	Docs: Improve wording on the os-supported page v2 Closes #11871	2022-12-14 08:59:26 +02:00
Botond Dénes	31fcfe62e1	Merge 'doc: add the description of AzureSnitch to the documentation' from Anna Stuchlik Fixes https://github.com/scylladb/scylladb/issues/11712 Updates added with this PR: - Added a new section with the description of AzureSnitch (similar to others + examples and language improvements). - Fixed the headings so that they render properly. - Replaced "Scylla" with "ScyllaDB". Closes #12254 * github.com:scylladb/scylladb: docs: replace Scylla with ScyllaDB on the Snitches page docs: fix the headings on the Snitches page doc: add the description of AzureSnitch to the documentation	2022-12-14 08:58:48 +02:00
Lubos Kosco	3f9dca9c60	doc: print out the generated UUID for sending to support Closes #12176	2022-12-14 08:57:54 +02:00
guy9	a329fcd566	Updated University monitoring lesson link Closes #11906	2022-12-14 08:50:26 +02:00
Jan Ciolek	9afa9f0e50	expr_test: add unit tests for prepare_expression(conjunction) Add unit tests which ensure that preparing conjunctions works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-12-13 20:23:17 +01:00
Jan Ciolek	dde86a2da6	cql3: expr: make it possible to prepare conjunctions prepare_expression used to throw an error when encountering a conjunction. Now it's possible to use prepare_expression to prepare an expression that contains conjunctions. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-12-13 20:23:17 +01:00
Jan Ciolek	5f5b1c4701	expr_test: add tests for evaluate(conjunction) Add unit tests which ensure that evaluating a conjunction behaves as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-12-13 20:23:17 +01:00
Jan Ciolek	b3c16f6bc8	cql3: expr: make it possible to evaluate conjunctions Previously it was impossible to use expr::evaluate() to get the value of a conjunction of elements separated by ANDs. Now it has been implemented. NULL is treated as an "unknown value" - maybe true maybe false. `TRUE AND NULL` evaluates to NULL because it might be true but also might be false. `FALSE AND NULL` evaluates to FALSE because no matter what value NULL acts as, the result will still be FALSE. Unset and empty values are not allowed. Usually in CQL the rule is that when NULL occurs in an operation the whole expression becomes NULL, but here we decided to deviate from this behavior. Treating NULL as an "unknown value" is the standard SQL way of handling NULLs in conjunctions. It works this way in MySQL and Postgres so we do it this way as well. The evaluation short-circuits. Once FALSE is encountered the function returns FALSE immediately without evaluating any further elements. It works this way in Postgres as well, for example: `SELECT true AND NULL AND 1/0 = 0` will throw a division by zero error but `SELECT false AND 1/0 = 0` will successfully evaluate to FALSE. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-12-13 20:23:08 +01:00
Benny Halevy	e9e66f3ca7	database: drop_table_on_all_shards: limit truncated_at time The infinitely high time_point of `db_clock::time_point::max()` used in `ba42852b0e` is too high for some clients that can't represent that as a date_time string. Instead, limit it to 9999-12-31T00:00:00+0000, that is practically sufficient to ensure truncation of all sstables and should be within the clients' limits. Fixes #12239 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12273	2022-12-13 16:46:20 +02:00
Avi Kivity	919888fe60	Merge 'docs/dev: Add backport instructions for contributors' from Jan Ciołek Add instructions on how to backport a feature to an older version of Scylla. It contains a detailed step-by-step instruction so that people unfamiliar with intricacies of Scylla's repository organization can easily get the hang of it. This is the guide I wish I had when I had to do my first backport. I put it in backport.md because that looks like the file responsible for this sort of information. For a moment I thought about `CONTRIBUTING.md`, but this is a really short file with general information, so it doesn't really fit there. Maybe in the future there will be some sort of unification (see #12126) Closes #12138 * github.com:scylladb/scylladb: dev/docs: add additional git pull to backport docs docs/dev: add a note about cherry-picking individual commits docs/dev: use 'is merged into' instead of 'becomes' docs/dev: mention that new backport instructions are for the contributor docs/dev: Add backport instructions for contributors	2022-12-13 16:27:04 +02:00
Pavel Emelyanov	fe4cf231bc	snitch: Check http response codes to be OK Several snitch drivers make http requests to get region/dc/zone/rack/whatever from the cloud provider. They blindly rely on the response being successful and read the response body to parse the data they need from it. That's not nice, so add checks that requests finish with http OK statuses. refs: #12185 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12287	2022-12-13 14:49:18 +02:00
Benny Halevy	68141d0aac	topology: get rid of pending state Now, with `a44ca06906`, is_normal_token_owner that replaced is_member does not rely anymore on the pending status of endpoints in topology. With that we can get rid of this state and just keep all endpoints we know about in the topology. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-13 14:17:18 +02:00
Benny Halevy	f2753eba30	topology: debug log update and remove endpoint Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-13 14:17:13 +02:00
Avi Kivity	c7cee0da40	Merge 'storage_service: handle_state_normal: always update_topology before update_normal_tokens' from Benny Halevy update_normal_tokens checks that the endpoint is in topology. Currently we call update_topology on this path only if it's not a normal_token_owner, but there are paths when the endpoint could be a normal token owner but still be pending in topology so always update it, just in case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12080 * github.com:scylladb/scylladb: storage_service: handle_state_normal: always update_topology before update_normal_tokens storage_service: handle_state_normal: delete outdated comment regarding update pending ranges race	2022-12-13 13:41:10 +02:00
Avi Kivity	75e469193b	Merge 'Use Host ID as Raft ID' from Kamil Braun Thanks to #12250, Host IDs uniquely identify nodes. We can use them as Raft IDs which simplifies the code and makes reasoning about it easier, because Host IDs are always guaranteed to be present (while Raft IDs may be missing during upgrade). Fixes: https://github.com/scylladb/scylladb/issues/12204 Closes #12275 * github.com:scylladb/scylladb: service/raft: raft_group0: take `raft::server_id` parameter in `remove_from_group0` gms, service: stop gossiping and storing RAFT_SERVER_ID Revert "gms/gossiper: fetch RAFT_SERVER_ID during shadow round" service: use HOST_ID instead of RAFT_SERVER_ID during replace service/raft: use gossiped HOST_ID instead of RAFT_SERVER_ID to update Raft address map main: use Host ID as Raft ID	2022-12-13 13:39:41 +02:00
Anna Stuchlik	7bc4385551	doc: specify the versions where Alternator TTL is no longer experimental	2022-12-13 11:25:24 +01:00
Andrii Patsula	cd2e786d72	Report a warning when a server's IP cannot be found in ping. Fixes #12156 Closes #12206	2022-12-13 11:18:59 +01:00
Botond Dénes	51f867339e	Merge 'Docs: cleanup add-node-to-cluster' from Benny Halevy This series improves the add-node-to-cluster document, in particular around the documentation for the associated cleanup procedure, and the prerequisite steps. It also removes information about outdated releases. Closes #12210 * github.com:scylladb/scylladb: docs: operating-scylla: add-node-to-cluster: deleted instructions for unsupported releases docs: operating-scylla: add-node-to-cluster: cleanup: move tips to a note docs: operating-scylla: add-node-to-cluster: improve wording of cleanup instructions docs: operating-scylla: prerequisites: system_auth is a keyspace, not a table docs: operating-scylla: prerequisites: no Authetication status is gathered docs: operating-scylla: prerequisites: simplify grep commands docs: operating-scylla: add-node-to-cluster: prerequisites: number sub-sections docs: operating-scylla: add-node-to-cluster: describe other nodes in plural	2022-12-13 10:54:05 +02:00
Botond Dénes	4122854ae7	Merge 'repair: coroutinize repair_range' from Avi Kivity Nicer and simpler, but essentially cosmetic. Closes #12235 * github.com:scylladb/scylladb: repair: reindent repair_range repair: coroutinize repair_range	2022-12-13 08:16:05 +02:00
Avi Kivity	96890d4120	repair: to_repair_rows_list: reindent	2022-12-12 22:54:07 +02:00
Avi Kivity	e482cb1764	repair: to_repair_rows_list: coroutinize Simplifying a complicated function. It will also be a little faster due to fewer allocations, but not significantly.	2022-12-12 22:52:12 +02:00
Avi Kivity	c728de8533	sstables: update_info_for_opened_data: reindent Recover much-needed indent levels for future use.	2022-12-12 22:38:07 +02:00
Avi Kivity	eace9a226c	sstables: update_info_for_opened_data: coroutinize Nothing special, just simplifying a complicated function.	2022-12-12 22:35:46 +02:00
Michał Jadwiszczak	5985f22841	version: Reverse version increase Revert version change made by PR #11106, which increased it to `4.0.0` to enable server-side describe on latest cqlsh. Turns out that our tooling in some way depends on it (e.g. `sstableloader`) and it breaks dtests. Reverting only the version allows us to leave the describe code unchanged and it fixes the dtests. cqlsh 6.0.0 will return a warning when running `DESC ...` commands. Closes #12272	2022-12-12 18:45:32 +02:00
Kamil Braun	a26f62b37b	service/raft: raft_group0: take `raft::server_id` parameter in `remove_from_group0` We no longer need to translate from IP to Raft ID using the address map, because Raft ID is now equal to the Host ID - which is always available at the call site of `remove_from_group0`.	2022-12-12 15:23:05 +01:00
Kamil Braun	bf6679906f	gms, service: stop gossiping and storing RAFT_SERVER_ID It is equal to (if present) HOST_ID and no longer used for anything. The application state was only gossiped if `experimental-features` contained `raft`, so we can free this slot. Similarly, `raft_server_id`s were only persisted in `system.peers` if the `SUPPORTS_RAFT` cluster feature was enabled, which happened only when `experimental-features` contained `raft`. The `raft_server_id` field in the schema was also introduced recently in `master` and didn't get to be in a release yet. Given either of these reasons, we can remove this field safely.	2022-12-12 15:20:30 +01:00
Kamil Braun	5dbe236339	Revert "gms/gossiper: fetch RAFT_SERVER_ID during shadow round" This reverts commit `60217d7f50`. We no longer need RAFT_SERVER_ID.	2022-12-12 15:20:20 +01:00
Kamil Braun	3e58da0719	service: use HOST_ID instead of RAFT_SERVER_ID during replace Makes the code simpler because we can assume that HOST_ID is always there.	2022-12-12 15:18:56 +01:00
Kamil Braun	32c56920b4	service/raft: use gossiped HOST_ID instead of RAFT_SERVER_ID to update Raft address map With the earlier commit, if gossiped RAFT_SERVER_ID is not empty then it's the same as HOST_ID.	2022-12-12 15:16:56 +01:00
Calle Wilund	e99626dc10	config: Change wording of "none" in encryption options to maybe reduce user confusion Fixes /scylladb/scylla-enterprise/issues#1262 Changes the somewhat ambiguous "none" into "not set" to clarify that "none" is not an option to be written out, but an absence of a choice (in which case you also have made a choice). Closes #12270	2022-12-12 16:14:53 +02:00
Kamil Braun	f3243ff674	main: use Host ID as Raft ID The Host ID now uniquely identifies a node (we no longer steal it during node replace) and Raft is still experimental. We can reuse the Host ID of a node as its Raft ID. This will allow us to remove and simplify a lot of code. With this we can already remove some dead code in this commit.	2022-12-12 15:14:51 +01:00
Botond Dénes	d44c5f5548	scripts: add open-coredump.sh Script for "one-click" opening of coredumps. It extracts the build-id from the coredump, retrieves metadata for that build, downloads the binary package, the source code and finally launches the dbuild container, with everything ready to load the coredump. The script is idempotent: running it after the preparatory steps will re-use what is already downloaded. The script is not trying to provide a debugging environment that caters to all the different ways and preferences of debugging. Instead, it just sets up a minimalistic environment for debugging, while providing opportunities for the user to customize it according to their preferences. I'm not entirely sure coredumps from the master branch will work, but we can address this later when we confirm they don't. Example: $ ~/ScyllaDB/scylla/worktree0/scripts/open-coredump.sh ./core.scylla.113.bac3650b616f4f09a4d1ab160574b6a5.4349.1669185225000000000000 Build id: 5009658b834aaf68970135bfc84f964b66ea4dee Matching build is scylla-5.0.5 0.20221009.5a97a1060 release-x86_64 Downloading relocatable package from http://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-5.0/scylla-x86_64-package-5.0.5.0.20221009.5a97a1060.tar.gz Extracting package scylla-x86_64-package-5.0.5.0.20221009.5a97a1060.tar.gz Cloning scylla.git Downloading scylla-gdb.py Copying scylla-gdb.py from /home/bdenes/ScyllaDB/storage/11961/open-coredump.sh.dir/scylla.repo Launching dbuild container. To examine the coredump with gdb: $ gdb -x scylla-gdb.py -ex 'set directories /src/scylla' --core ./core.scylla.113.bac3650b616f4f09a4d1ab160574b6a5.4349.1669185225000000000000 /opt/scylladb/libexec/scylla See https://github.com/scylladb/scylladb/blob/master/docs/dev/debugging.md for more information on how to debug scylla. Good luck! [root@fedora workdir]# Closes #12223	2022-12-12 12:55:28 +02:00
Kamil Braun	dcba652013	Merge 'replacenode: do not inherit host_id' from Benny Halevy We want to always be able to distinguish between the replacing node and the replacee by using different, unique, host identifiers. This will allow us to use the host_id authoritatively to identify the node (rather than its endpoint ip address) for token mapping and node operations. Also, it will be used in the following patch to never allow the replaced node to rejoin the cluster, as its host_id should never be reused. This change does not affect #5523, the replaced node may still steal back its tokens if restarted. Refs #9839 Refs #12040 Closes #12250 * github.com:scylladb/scylladb: docs: replace-dead-node: update host_id of replacing node docs: replace-dead-node: fix alignment db: system_keyspace: change set_local_host_id to private set_local_random_host_id storage_service: do not inherit the host_id of a replaced a node	2022-12-12 11:00:42 +01:00
Benny Halevy	c6f05b30e1	task_manager: task: impl: add virtual destructor The generic task holds and destroys a task::impl but we want the derived class's destructor to be called when the task is destroyed; otherwise, for example, members like the abort_source subscription will not be destroyed (and auto-unlinked). Fixes #12183 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12266	2022-12-11 22:10:59 +02:00
Benny Halevy	36a9f62833	repair: repair_module: use mutable capture for func It is moved into the async thread so the encapsulating function should be defined mutable to move the func rather than copying it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12267	2022-12-11 22:10:28 +02:00
Nadav Har'El	0c26032e70	test/cql-pytest: translate more Cassandra tests This patch includes a translation of two more test files from Cassandra's CQL unit test directory cql3/validation/operations. All tests included here pass on Cassandra. Several tests fail on Scylla and are marked "xfail". These failures discovered two previously-unknown bugs: #12243: Setting USING TTL of "null" should be allowed #12247: Better error reporting for oversized keys during INSERT And also added reproducers for two previously-known bugs: #3882: Support "ALTER TABLE DROP COMPACT STORAGE" #6447: TTL unexpected behavior when setting to 0 on a table with default_time_to_live Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12248	2022-12-11 21:42:57 +02:00
Nadav Har'El	09a3c63345	cross-tree: allow std::source_location in clang 14 We recently (commit `6a5d9ff261`) started to use std::source_location instead of std::experimental::source_location. However, this does not work on clang 14, because libc++ 12's <source_location> only works if __builtin_source_location is available, and that is not available on clang 14. clang 15 is just three months old, and several relatively-recent distributions still carry clang 14 so it would be nice to support it as well. So this patch adds a trivial compatibility header file, which, when included and compiled with clang 14, aliases the functional std::experimental::source_location to std::source_location. It turns out it's enough to include the new header file from three headers that included <source_location> - I guess all other uses of source_location depend on those header files directly or indirectly. We may later need to include the compatibility header file in additional places, but for now we don't. Refs #12259 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12265	2022-12-11 20:28:49 +02:00
Avi Kivity	e6ffc22053	Merge 'cql3: Server-side DESC statement' from Michał Jadwiszczak This PR adds a server-side `DESCRIBE` statement, which is required by the latest cqlsh version. The only change from the user perspective is that the `DESC ...` statement can be used with cqlsh version >= 6.0. Previously the statement was executed on the client side, but starting with Cassandra 4.0 and cqlsh 6.0, execution of describe was moved to the server side, so the user was unable to do `DESC ...` with Scylla and cqlsh 6.0. Implemented describe statements: - `DESC CLUSTER` - `DESC [FULL] SCHEMA` - `DESC [ONLY] KEYSPACE` - `DESC KEYSPACES/TYPES/FUNCTIONS/AGGREGATES/TABLES` - `DESC TYPE/FUNCTION/AGGREGATE/MATERIALIZED VIEW/INDEX/TABLE` - `DESC` [Cassandra's implementation for reference](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/DescribeStatement.java) Changes in this patch: - cql3::util: added `single_quote()` function - added `data_dictionary::keyspace_element` interface - implemented `data_dictionary::keyspace_element` for: - keyspace_metadata, - UDT, UDF, UDA - schema - cql3::functions: added `get_user_functions()` and `get_user_aggregates()` to get all UDFs/UDAs in specified keyspace - data_dictionary::user_types_metadata: added `has_type()` function - extracted `describe_ring()` from storage_service to standalone helper function in `locator/util.hh` - storage_proxy: added `describe_ring()` (implemented using helper function mentioned above) - extended CQL grammar to handle describe statement - increased version in `version.hh` to 4.0.0, so cqlsh will use server-side describe statement Referring: https://github.com/scylladb/scylla/issues/9571, https://github.com/scylladb/scylladb/issues/11475 Closes #11106 * github.com:scylladb/scylladb: version: Increasing version cql-pytest: Add tests for server-side describe statement cql-pytest: creating random elements for describe's tests cql3: Extend CQL grammar with server-side describe statement
cql3:statements: server-side describe statement data_dictionary: add `get_all_keyspaces()` and `get_user_keyspaces()` storage_proxy: add `describe_ring()` method storage_service, locator: extract describe_ring() data_dictionary:user_types_metadata: add has_type() function cql3:functions: `get_user_functions()` and `get_user_aggregates()` implement `keyspace_element` interface data_dictionary: add `keyspace_element` interface cql3: single_quote() util function view: row_lock: lock_ck: reindent test/topology: enable replace tests service/raft: report an error when Raft ID can't be found in `raft_group0::remove_from_group0` service: handle replace correctly with Raft enabled gms/gossiper: fetch RAFT_SERVER_ID during shadow round service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace	2022-12-11 18:29:36 +02:00
Michał Jadwiszczak	8d88c9721e	version: Increasing version The `current()` version in version.hh has to be increased to at least 4.0.0, so that server-side describe will be used. Otherwise, cqlsh returns a warning that client-side describe is not supported.	2022-12-10 12:51:05 +01:00
Michał Jadwiszczak	3ddde7c5ad	cql-pytest: Add tests for server-side describe statement	2022-12-10 12:51:05 +01:00
Michał Jadwiszczak	f91d05df43	cql-pytest: creating random elements for describe's tests Add helper functions to create random elements (keyspaces, tables, types) to increase the coverage of the describe statement's tests. This commit also adds the `random_seed` fixture. The fixture should always be used when using random functions. If a test fails, the seed will be present in the test's signature and the case can be easily recreated. After the test finishes, the fixture restores the state of `random` to its before-test state.	2022-12-10 12:51:05 +01:00
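The seed-and-restore behaviour described for `random_seed` can be sketched roughly as follows. This is only an illustration and not the fixture from the commit: it is written as a plain context manager rather than a pytest fixture, and the name `seeded_random` is hypothetical.

```python
import random
from contextlib import contextmanager


@contextmanager
def seeded_random(seed):
    """Seed the global `random` module, then restore its previous state.

    Hypothetical helper: it mirrors the save/seed/restore idea described
    in the commit message, not the actual `random_seed` fixture.
    """
    saved_state = random.getstate()  # remember the before-test state
    random.seed(seed)                # make the random choices reproducible
    try:
        yield seed
    finally:
        random.setstate(saved_state)  # undo the seeding for later code


# The same seed reproduces the same "random" elements.
with seeded_random(1234):
    first = [random.randint(0, 99) for _ in range(3)]
with seeded_random(1234):
    second = [random.randint(0, 99) for _ in range(3)]
assert first == second
```

Restoring the state in a `finally` block means a failing test body does not leak the seeded state into tests that run afterwards.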
Michał Jadwiszczak	c563b2133c	cql3: Extend CQL grammar with server-side describe statement	2022-12-10 12:51:05 +01:00
Michał Jadwiszczak	e572d5f111	cql3:statements: server-side describe statement Starting from cqlsh 6.0.0, execution of the describe statement was moved from the client to the server. This patch implements the server-side describe statement. It's done by simply fetching all needed keyspace elements (keyspace/table/index/view/UDT/UDF/UDA) and generating the desired description or list of names of all elements. The description of any element has to respect CQL restrictions (like name quoting) so that the schema can be quickly recreated by simply copy-pasting the description.	2022-12-10 12:51:05 +01:00
Michał Jadwiszczak	673393d88a	data_dictionary: add `get_all_keyspaces()` and `get_user_keyspaces()` Adds functions to `data_dictionary::database` in order to obtain names of all keyspaces/all user keyspaces.	2022-12-10 12:51:05 +01:00
Michał Jadwiszczak	360dbf98f1	storage_proxy: add `describe_ring()` method In order to execute `DESC CLUSTER`, there has to be a way to describe the ring. `storage_service` is not available at query execution. This patch adds `describe_ring()` as a method of `storage_proxy()` (using the helper function from `locator/util.hh`).	2022-12-10 12:51:05 +01:00
Michał Jadwiszczak	dd46a92e23	storage_service, locator: extract describe_ring() `describe_ring()` was implemented as a method of `storage_service`. This patch extracts it from there to a standalone helper function in `locator/util.hh`.	2022-12-10 12:51:05 +01:00
Michał Jadwiszczak	51a02e3bd7	data_dictionary:user_types_metadata: add has_type() function Adds `has_type()` function to `user_types_metadata`. The function determines whether a UDT with the given name exists.	2022-12-10 12:50:52 +01:00
Michał Jadwiszczak	06cd03d3cd	cql3:functions: `get_user_functions()` and `get_user_aggregates()` Helper functions to obtain UDFs/UDAs for certain keyspace.	2022-12-10 12:36:59 +01:00
Michał Jadwiszczak	29ad5a08a8	implement `keyspace_element` interface This patch implements the `data_dictionary::keyspace_element` interface in: `keyspace_metadata`, `user_type_impl`, `user_function`, `user_aggregate` and schema.	2022-12-10 12:34:09 +01:00
Michał Jadwiszczak	f30378819d	data_dictionary: add `keyspace_element` interface A common interface for all keyspace elements, which are: keyspace, UDT, UDF, UDA, tables, views, indexes. The interface is to have a unified way to describe those elements.	2022-12-10 12:27:38 +01:00
Michał Jadwiszczak	0589116991	cql3: single_quote() util function `single_quote()` takes a string and transforms it to a string which can be safely used in CQL commands. Single quoting involves wrapping the name in single-quotes ('). A single-quote character itself is quoted by doubling it. Single quoting is necessary for dates, IP addresses or string literals.	2022-12-10 12:27:22 +01:00
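The quoting rule described above (wrap in single quotes, double any embedded single quote) can be illustrated with a short Python sketch. The real helper is a C++ function in `cql3::util`; this Python version only demonstrates the rule and is not that implementation.

```python
def single_quote(value: str) -> str:
    """Return `value` as a single-quoted CQL literal.

    Embedded single quotes are escaped by doubling them, as described
    in the commit message. Illustrative only.
    """
    return "'" + value.replace("'", "''") + "'"


print(single_quote("127.0.0.1"))  # an IP address literal
print(single_quote("it's"))       # the embedded quote is doubled
```

Doubling the quote character, rather than backslash-escaping it, is what makes the result safe to paste back into a CQL statement.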
Benny Halevy	9c2a5a755f	view: row_lock: lock_ck: reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-10 12:27:22 +01:00
Kamil Braun	c43e64946a	test/topology: enable replace tests Also add some TODOs for enhancing existing tests.	2022-12-10 12:27:22 +01:00
Kamil Braun	b01cba8206	service/raft: report an error when Raft ID can't be found in `raft_group0::remove_from_group0` Also simplify the code and improve logging in general. The previous code did this: search for the ID in the address map. If it couldn't be found, perform a read barrier and search again. If it again couldn't be found, return. This algorithm depended on the fact that IP addresses were stored in group 0 configuration. The read barrier was used to obtain the most recent configuration, and if the IP was not a part of address map after the read barrier, that meant it's simply not a member of group 0. This logic no longer applies so we can simplify the code. Furthermore, when I was fixing the replace operation with Raft enabled, at some point I had a "working" solution with all tests passing. But I was suspicious and checked if the replaced node got removed from group 0. It wasn't. So the replace finished "successfully", but we had an additional (voting!) member of group 0 which didn't correspond to a token ring member. The last version of my fixes ensures that the node gets removed by the replacing node. But the system is fragile and nothing prevents us from breaking this again. At least log an error for now. Regression tests will be added later.	2022-12-10 12:27:22 +01:00
Kamil Braun	c65f4ae875	service: handle replace correctly with Raft enabled We must place the Raft ID obtained during the shadow round in the address map. It won't be placed by the regular gossiping route if we're replacing using the same IP, because we override the application state of the replaced node. Even if we replace a node with a different IP, it is not guaranteed that background gossiping manages to update the address map before we need it, especially in tests where we set ring_delay to 0 and disable wait_for_gossip_to_settle. The shadow round, on the other hand, performs a synchronous request (and if it fails during bootstrap, bootstrap will fail - because we also won't be able to obtain the tokens and Host ID of the replaced node). Fetch the Raft ID of the replaced node in `prepare_replacement_info`, which runs the shadow round. Return it in `replacement_info`. Then `join_token_ring` passes it to `setup_group0`, which stores it in the address map. It does that after `join_group0` so the entry is non-expiring (the replaced node is a member of group 0). Later in the replace procedure, we call `remove_from_group0` for the replaced node. `remove_from_group0` will be able to reverse-translate the IP of the replaced node to its Raft ID using the address map.	2022-12-10 12:27:22 +01:00
Kamil Braun	60217d7f50	gms/gossiper: fetch RAFT_SERVER_ID during shadow round During the replace operation we need the Raft ID of the replaced node. The shadow round is used for fetching all necessary information before the replace operation starts.	2022-12-10 12:27:22 +01:00
Kamil Braun	b424cc40fa	service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace Most of the sleeps related to gossiping are based on `ring_delay`, which is configurable and can be set to a lower value, e.g. during tests. But for some reason there was one case where we slept for a hardcoded value, `service::load_broadcaster::BROADCAST_INTERVAL` - 60 seconds. Use `2 * get_ring_delay()` instead. With the default value of `ring_delay` (30 seconds) this will give the same behavior.	2022-12-10 12:27:22 +01:00
Anna Stuchlik	8d1050e834	docs: replace Scylla with ScyllaDB on the Snitches page	2022-12-09 13:34:18 +01:00
Anna Stuchlik	5cb191d5b0	docs: fix the headings on the Snitches page	2022-12-09 13:26:36 +01:00
Anna Stuchlik	a699904374	doc: add the description of AzureSnitch to the documentation	2022-12-09 13:22:01 +01:00
Nadav Har'El	e47794ed98	test/cql-pytest: regression test for index scan with start token When we have a table with partition key p and an indexed regular column v, the test included in this patch checks the query SELECT p FROM table WHERE v = 1 AND TOKEN(p) > 17 This can work and not require ALLOW FILTERING, because the secondary index posting-list of "v=1" is ordered in p's token order (to allow SELECT with and without an index to return the same order - this is explained in issue #7443). So this test should pass, and indeed it does on both current Scylla, and Cassandra. However, it turns out that this was a bug - issue #7043 - in older versions of Scylla, and only fixed in Scylla 4.6. In older versions, the SELECT wasn't accepted, claiming it requires ALLOW FILTERING, and if ALLOW FILTERING was added, the TOKEN(p) > 17 part was silently ignored. The fix for issue #7043 actually included regression tests, C++ tests in test/boost/secondary_index_test.cc. But in this patch we also add a Python test in test/cql-pytest. One of the benefits of cql-pytest is that we can (and I did) run the same test on Cassandra to verify we're not implementing a wrong feature. Another benefit is that we can run a new test on an old version, and not even require re-compilation: You can run this new test on any existing installation of Scylla to check if it still has issue #7043. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12237	2022-12-09 09:33:16 +02:00
Benny Halevy	018dedcc0c	docs: replace-dead-node: update host_id of replacing node The replacing node no longer assumes the host_id of the replacee. It will continue to use a random, unique host_id. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-09 08:23:31 +02:00
Benny Halevy	37d75e5a21	docs: replace-dead-node: fix alignment	2022-12-09 08:23:31 +02:00
Benny Halevy	89920d47d6	db: system_keyspace: change set_local_host_id to private set_local_random_host_id Now that the local host_id is never changed externally (by the storage_service upon replace-node), the method can be made private and be used only for initializing the local host_id to a random one. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-09 08:23:31 +02:00
Benny Halevy	9942c60d93	storage_service: do not inherit the host_id of a replaced node We want to always be able to distinguish between the replacing node and the replacee by using different, unique, host identifiers. This will allow us to use the host_id authoritatively to identify the node (rather than its endpoint ip address) for token mapping and node operations. Also, it will be used in the following patch to never allow the replaced node to rejoin the cluster, as its host_id should never be reused. Refs #9839 Refs #12040 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-09 08:23:31 +02:00
Pavel Emelyanov	7197757750	broadcast_tables: Forward-declare storage_proxy in lang.hh Currently the header includes storage_proxy.hh and spreads this over the code via raft_group0_client.hh -> group0_state_machine.hh -> lang.hh Forward declaring the proxy class eliminates ~100 indirect dependencies on storage_proxy.hh via this chain. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12241	2022-12-09 01:23:51 +02:00
Pavel Emelyanov	6075e01312	test/lib: Remove sstable_utils.hh from simple_schema.hh The latter is a pretty popular test/lib header that disseminates the former one over a whole lot of unit tests. The former, in turn, naturally includes sstables.hh thus making tons of unrelated tests depend on the sstables class unused by them. However, simple removal doesn't work, because of the local_shard_only bool class definition in sstable_utils.hh used in simple_schema.hh. This thing, in turn, is used in key-making helpers that don't belong to sstable utils, so these are moved into simple_schema as well. When done, this affects the mutation_source_test.hh, which needs the local_shard_only bool class (and helps spreading the sstables.hh throughout more unrelated tests) and a bunch of .cc test sources that used sstable_utils.hh to indirectly include various headers of their demand. After patching, sstables.hh touches 2x fewer tests. As a side effect, 2x fewer tests also depend on sstables_manager.hh. Continuation of `9bdea110a6` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12240	2022-12-08 15:37:33 +02:00
Tomasz Grabiec	4e7ddb6309	position_in_partition: Introduce before_key(position_in_partition_view)	2022-12-08 13:41:28 +01:00
Tomasz Grabiec	536c0ab194	db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order trim_clustering_row_ranges_to() is broken for non-full keys in reverse mode. It will trim the range to position_in_partition_view::after_key(full_key) instead of position_in_partition_view::before_key(key), hence it will include the key in the resulting range rather than exclude it. Fixes #12180 Refs #1446	2022-12-08 13:41:28 +01:00
Tomasz Grabiec	232ce699ab	types: Fix comparison of frozen sets with empty values A frozen set can be part of the clustering key, and with compact storage, the corresponding key component can have an empty value. Comparison was not prepared for this, the iterator attempts to deserialize the item count and will fail if the value is empty. Fixes #12242	2022-12-08 13:41:11 +01:00
Nadav Har'El	4cdaba778d	Merge 'Secondary indexes on static columns' from Piotr Dulikowski This pull request introduces support for global secondary indexes based on static columns. Local secondary indexes based on static columns are not planned to be supported and are explicitly forbidden. Because there is only one static row per partition and local indexes require full partition key when querying, such indexes wouldn't be very useful and would only waste resources. The index table for secondary indexes on static columns, unlike other secondary indexes, does not contain clustering keys from the base table. A static column's value determines a set of full partitions, so the clustering keys would only be unnecessary. The already existing logic for querying using secondary indexes works after introducing minimal notifications. The view update generation path now works on a common representation of static and clustering rows, but the new representation allowed to keep most of the logic intact. New cql-pytests are added. All but one of the existing tests for secondary indexes on static columns - ported from Cassandra - now work and have their `xfail` marks lifted; the remaining test requires support for collection indexing, so it will start working only after #2962 is fixed. Materialized views with static rows as a key are __not__ implemented in this PR. 
Fixes: #2963 Closes #11166 * github.com:scylladb/scylladb: test_materialized_view: verify that static columns are not allowed test_secondary_index: add (currently failing) test for static index paging test_secondary_index: add more tests for secondary indexes on static columns cassandra_tests: enable existing tests for static columns create_index_statement: lift restriction on secondary indexes on static rows db/view: fetch and process static rows when building indexes gms/feature_service: introduce SECONDARY_INDEXES_ON_STATIC_COLUMNS cluster feature create_index_statement: disallow creation of local indexes with static columns select_statement: prepare paging for indexes on static columns select_statement: do not attempt to fetch clustering columns from secondary index's table secondary_index_manager: don't add clustering key columns to index table of static column index replica/table: adjust the view read-before-write to return static rows when needed db/view: process static rows in view_update_builder::on_results db/view: adjust existing view update generation path to use clustering_or_static_row column_computation: adjust to use clustering_or_static_row db/view: add clustering_or_static_row deletable_row: add column_kind parameter to is_live view_info: adjust view_column to accept column_kind db/view: base_dependent_view_info: split non-pk columns into regular and static	2022-12-08 09:54:05 +02:00
Konstantin Osipov	02c30ab5d6	build: fix link error (abseil) on ubuntu toolchain with clang 15 abseil::hash depends on abseil::city and declares CityHash32 as an external symbol. The city static library, however, precedes hash in the link list, which apparently makes the linker simply drop it from the object list, since its symbols are not used elsewhere. Fix the linker ordering to help the linker see that CityHash32 is used. Closes #12231	2022-12-08 09:47:16 +02:00
Avi Kivity	d6457778f1	Merge 'Coroutinize some table functions in preparation to static compaction groups' from Raphael "Raph" Carvalho Extracted from https://github.com/scylladb/scylladb/pull/12139 Closes #12236 * github.com:scylladb/scylladb: replica: table: Fix indentation replica: coroutinize table::discard_sstables() replica: Coroutinize table::flush()	2022-12-08 09:29:58 +02:00
Piotr Dulikowski	4883e43677	test_materialized_view: verify that static columns are not allowed Adds a test which verifies that static columns are not allowed in materialized views. Although we added support for static columns in secondary indexes, which share a lot of code with materialized views, static columns in materialized views are not yet ready to use.	2022-12-08 07:41:33 +01:00
Piotr Dulikowski	f864944dcb	test_secondary_index: add (currently failing) test for static index paging Currently, when executing queries accelerated by an index on a static column, paging is unable to break base table partitions across pages and is forced to return them in whole. This will cause problems if such a query must return a very large base table partition because it will have to be loaded into memory. Fixing this issue will require a more sophisticated approach than what was done in the PR. For the time being, an xfailing pytest is added which should start passing after paging is improved.	2022-12-08 07:41:33 +01:00
Piotr Dulikowski	4f836115fd	test_secondary_index: add more tests for secondary indexes on static columns Adds cql-pytests which test the secondary index on static columns feature.	2022-12-08 07:41:32 +01:00
Botond Dénes	897b501ba3	Merge 'doc: update the 5.1 upgrade guide with the mode-related information' from Anna Stuchlik This PR adds the link to the KB article about updating the mode after the upgrade to the 5.1 upgrade guide. In addition, I have: - updated the KB article to include the versions affected by that change. - fixed the broken link to the page about metric updates (it is not related to the KB article, but I fixed it in the same PR to limit the number of PRs that need to be backported). Related: https://github.com/scylladb/scylladb/pull/11122 Closes #12148 * github.com:scylladb/scylladb: doc: update the releases in the KB about updating the mode after upgrade doc: fix the broken link in the 5.1 upgrade guide doc: add the link to the 5.1-related KB article to the 5.1 upgrade guide	2022-12-08 07:32:10 +02:00
Tomasz Grabiec	992a73a861	row_cache: Destroy coroutine under region's allocator The reason is an alloc-dealloc mismatch of position_in_partition objects allocated by cursors inside the coroutine object stored in the update variable in row_cache::do_update(). It is allocated under the cache region, but in case of exception it will be destroyed under the standard allocator. If update is successful, it will be cleared under the region allocator, so there is no problem in the normal case. Fixes #12068 Closes #12233	2022-12-07 21:44:21 +02:00
Raphael S. Carvalho	9ae0d8ba28	replica: table: Fix indentation Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-07 15:53:22 -03:00
Raphael S. Carvalho	b9a33d5a91	replica: coroutinize table::discard_sstables() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-07 15:52:36 -03:00
Raphael S. Carvalho	192b64a5ac	replica: Coroutinize table::flush() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-07 15:52:27 -03:00
Benny Halevy	a076ceef97	view: row_lock: lock_ck: reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-07 19:27:30 +02:00
Avi Kivity	909fbfdd2f	repair: reindent repair_range	2022-12-07 18:17:21 +02:00
Avi Kivity	796ec5996f	repair: coroutinize repair_range	2022-12-07 18:13:10 +02:00
Benny Halevy	78c5961114	docs: operating-scylla: add-node-to-cluster: deleted instructions for unsupported releases 2.3 and 2018.1 ended their life and are long gone. No need to have instructions for them in the master version of this document. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-07 17:07:35 +02:00
Benny Halevy	adeb03e60f	docs: operating-scylla: add-node-to-cluster: cleanup: move tips to a note And be more verbose about why the tips are recommended and their ramifications. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-07 17:07:18 +02:00
Benny Halevy	6e324137bd	docs: operating-scylla: add-node-to-cluster: improve wording of cleanup instructions "use `nodetool cleanup` cleanup command" repeats words, change to "run the `nodetool cleanup` command". Also, improve the description of the cleanup action and how it relates to the bootstrapping process. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-07 17:07:08 +02:00
Benny Halevy	eeed330647	docs: operating-scylla: prerequisites: system_auth is a keyspace, not a table Fix the phrase referring to it as a table accordingly. Also, do some minor phrasing touch-ups in this area. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-07 17:06:54 +02:00
Benny Halevy	5d840d4232	docs: operating-scylla: prerequisites: no Authentication status is gathered Authentication status isn't gathered from scylla.yaml, only the authenticator, so change the caption accordingly. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-07 17:06:48 +02:00
Benny Halevy	9cb7056d3e	docs: operating-scylla: prerequisites: simplify grep commands Writing `cat X | grep Y` is both inefficient and somewhat unprofessional. The grep command works very well on a file argument so `grep Y X` will do the job perfectly without the need for a pipe. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-07 17:06:36 +02:00
Benny Halevy	71bc12eecc	docs: operating-scylla: add-node-to-cluster: prerequisites: number sub-sections To improve their readability. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-07 17:06:35 +02:00
Benny Halevy	16db7bea82	docs: operating-scylla: add-node-to-cluster: describe other nodes in plural Typically data will be streamed from multiple existing nodes to the new node, not from a single one. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-07 17:03:23 +02:00
Tomasz Grabiec	a46b2e4e4c	Merge 'Make node replace procedure work with Raft' from Kamil Braun We need to obtain the Raft ID of the replaced node during the shadow round and place it in the address map. It won't be placed by the regular gossiping route if we're replacing using the same IP, because we override the application state of the replaced node. Even if we replace a node with a different IP, it is not guaranteed that background gossiping manages to update the address map before we need it, especially in tests where we set ring_delay to 0 and disable wait_for_gossip_to_settle. The shadow round, on the other hand, performs a synchronous request (and if it fails during bootstrap, bootstrap will fail - because we also won't be able to obtain the tokens and Host ID of the replaced node). Fetch the Raft ID of the replaced node in `prepare_replacement_info`, which runs the shadow round. Return it in `replacement_info`. Then `join_token_ring` passes it to `setup_group0`, which stores it in the address map. It does that after `join_group0` so the entry is non-expiring (the replaced node is a member of group 0). Later in the replace procedure, we call `remove_from_group0` for the replaced node. `remove_from_group0` will be able to reverse-translate the IP of the replaced node to its Raft ID using the address map. Also remove an unconditional 60 seconds sleep from the replace code. Make it dependent on ring_delay. Enable the replace tests. Modify some code related to removing servers from group 0 which depended on storing IP addresses in the group 0 configuration. Closes #12172 * github.com:scylladb/scylladb: test/topology: enable replace tests service/raft: report an error when Raft ID can't be found in `raft_group0::remove_from_group0` service: handle replace correctly with Raft enabled gms/gossiper: fetch RAFT_SERVER_ID during shadow round service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace	2022-12-07 15:30:27 +01:00
Pavel Emelyanov	9bdea110a6	code: Reduce fanout of sstables(_manager)?.hh over headers This change removes sstables.hh from some other headers replacing it with version.hh and shared_sstable.hh. Also this drops sstables_manager.hh from some more headers, because this header propagates sstables.hh via self. That change is pretty straightforward, but has a ricochet in database.hh that needs disk-error-handler.hh. Without the patch, touching sstables/sstable.hh results in recompilation of 409 targets; with the patch, 299 targets. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12222	2022-12-07 14:34:19 +02:00
Botond Dénes	57a4971962	Merge 'dirty_memory_manager: tidy up' from Avi Kivity Tidy up namespaces, move code to the right file, and move the whole thing to the replica module where it belongs. Closes #12219 * github.com:scylladb/scylladb: dirty_memory_manager: move implementation from database.cc dirty_memory_manager: move to replica module test: dirty_memory_manager_test: disambiguate classes named 'test_region_group' dirty_memory_manager: stop using using namespace	2022-12-07 14:25:59 +02:00
Avi Kivity	f7f5700289	dirty_memory_manager: move implementation from database.cc A few leftover method implementations were left in database.cc when dirty_memory_manager.cc was created, move them to their correct place now.	2022-12-06 22:28:54 +02:00
Avi Kivity	444de2831e	dirty_memory_manager: move to replica module It's a replica-side thing, so move it there. The related flush_permit and sstable_write_permit are moved alongside.	2022-12-06 22:24:17 +02:00
Avi Kivity	a038a35ad6	test: dirty_memory_manager_test: disambiguate classes named 'test_region_group' There are two similarly named classes: ::test_region_group and dirty_memory_manager_logalloc::test_region_group. Rename the former to ::raii_region_group (that's what it's for) and the latter to ::test_region_group, to reduce confusion.	2022-12-06 22:20:38 +02:00
Avi Kivity	dfdae5ffa9	dirty_memory_manager: stop using using namespace `using namespace` is pretty bad, especially in a header, as it pollutes the namespace for everyone. Stop using it and qualify names instead.	2022-12-06 21:37:38 +02:00
Avi Kivity	47a8fad2a2	Merge 'scylla-types: add serialize action' from Botond Dénes Serializes the value that is an instance of a type. The opposite of `deserialize` (previously known as `print`). All other actions operate on serialized values, yet up to now we were missing a way to go from human readable values to serialized ones. This prevented for example using `scylla types tokenof $pk` if one only had the human readable key value. Example: ``` $ scylla types serialize -t Int32Type -- -1286905132 b34b62d4 $ scylla types serialize --prefix-compound -t TimeUUIDType -t Int32Type -- d0081989-6f6b-11ea-0000-0000001c571b 16 0010d00819896f6b11ea00000000001c571b000400000010 $ scylla types serialize --prefix-compound -t TimeUUIDType -t Int32Type -- d0081989-6f6b-11ea-0000-0000001c571b 0010d00819896f6b11ea00000000001c571b ``` Closes #12029 * github.com:scylladb/scylladb: docs: scylla-types.rst: add mention of per-operation --help tools/scylla-types: add serialize operation tools/scylla-types: prepare for action handlers with string arguments tools/scylla-types: s/print/deserialize/ operation docs: scylla-types.rst: document tokenof and shardof docs: scylla-types.rst: fix typo in compare operation description	2022-12-06 19:27:15 +02:00
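The example output above is consistent with a simple byte layout, which the following Python sketch reproduces. This is an inference from the hex strings shown in the commit message, not the tool's implementation: the helper names are hypothetical, and the layout assumed is a big-endian 4-byte integer for `Int32Type`, the raw 16 UUID bytes for `TimeUUIDType`, and a 2-byte big-endian length before each component of a prefix compound.

```python
import struct
import uuid


def serialize_int32(value: int) -> bytes:
    """Assumed Int32Type layout: 4 bytes, big-endian, two's complement."""
    return struct.pack(">i", value)


def serialize_timeuuid(text: str) -> bytes:
    """Assumed TimeUUIDType layout: the 16 raw bytes of the UUID."""
    return uuid.UUID(text).bytes


def prefix_compound(components) -> bytes:
    """Assumed prefix-compound layout: each component is preceded by its
    length as a 2-byte big-endian unsigned integer."""
    out = b""
    for component in components:
        out += struct.pack(">H", len(component)) + component
    return out


key = serialize_timeuuid("d0081989-6f6b-11ea-0000-0000001c571b")
print(serialize_int32(-1286905132).hex())
print(prefix_compound([key, serialize_int32(16)]).hex())
print(prefix_compound([key]).hex())
```

Under these assumptions the three printed values match the three hex strings in the example, which is the only evidence for the layout used here.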
Nadav Har'El	f275bfd57b	Update CODEOWNERS file Update the CODEOWNERS file with some people who joined different parts of the project, and one person that left. Note that despite its name, CODEOWNERS does not list "ownership" in any strict sense of the word - it is more about who is willing and/or knowledgeable enough to participate in reviewing changes to particular files or directories. Github uses this file to automatically suggest who should review a pull request. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12216	2022-12-06 19:26:03 +02:00
Benny Halevy	5007ded2c1	view: row_lock: lock_ck: serialize partition and row locking The problematic scenario this patch fixes might happen due to unfortunate serialization of locks/unlocks between lock_pk and lock_ck, as follows: 1. lock_pk acquires an exclusive lock on the partition. 2.a lock_ck attempts to acquire a shared lock on the partition and any lock on the row. Both cases currently use a fiber returning a future<rwlock::holder>. 2.b since the partition is locked, the lock_partition times out returning an exceptional future. lock_row has no such problem and succeeds, returning a future holding a rwlock::holder, pointing to the row lock. 3.a the lock_holder previously returned by lock_pk is destroyed, calling `row_locker::unlock` 3.b row_locker::unlock sees that the partition is not locked and erases it, including the row locks it contains. 4.a when_all_succeeds continuation in lock_ck runs. Since the lock_partition future failed, it destroys both futures. 4.b the lock_row future is destroyed with the rwlock::holder value. 4.c ~holder attempts to return the semaphore units to the row rwlock, but the latter was already destroyed in 3.b above. Acquiring the partition lock and row lock in parallel doesn't help anything, but it complicates error handling as seen above. This patch serializes acquiring the row lock in lock_ck after locking the partition to prevent the above race. This way, erasing the unlocked partition is never expected to happen while any of its row locks is held. Fixes #12168 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12208	2022-12-06 16:29:46 +02:00
Botond Dénes	f017e9f1c6	docs: document the reader concurrency semaphore diagnostics dump The diagnostics dumped by the reader concurrency semaphore are a pretty common sight in logs, as soon as a node becomes problematic. The reason is that the reader concurrency semaphore acts as the canary in the coal mine: it is the first that starts screaming when the node or workload is unhealthy. This patch adds documentation of the content of the diagnostics and how to diagnose common problems based on it. Fixes: #10471 Closes #11970	2022-12-06 16:24:44 +02:00
Botond Dénes	c35cee7e2b	docs: scylla-types.rst: add mention of per-operation --help	2022-12-06 14:47:28 +02:00
Botond Dénes	4f9799ce4f	tools/scylla-types: add serialize operation Takes human readable values and converts them to serialized hex encoded format. Only regular atomic types are supported for now, no collection/UDT/tuple support, not even in frozen form.	2022-12-06 14:46:53 +02:00
Botond Dénes	7c87655b4b	tools/scylla-types: prepare for action handlers with string arguments Currently all action handlers have bytes arguments, parsed from hexadecimal string representations. We plan on adding a serialize command which will require raw string arguments. Prepare the infrastructure for supporting both types of action handlers.	2022-12-06 14:45:30 +02:00
Botond Dénes	15452730fb	tools/scylla-types: s/print/deserialize/ operation Soon we will have a serialize operation. Rename the current print operation to deserialize in preparation to that. We want the two operations (serialize and deserialize) to reflect their relation in their names too.	2022-12-06 14:45:30 +02:00
Botond Dénes	f98e6552b4	docs: scylla-types.rst: document tokenof and shardof These new actions were added recently but without the accompanying documentation change. Make up for this now.	2022-12-06 14:45:30 +02:00
Botond Dénes	30c047cae6	docs: scylla-types.rst: fix typo in compare operation description	2022-12-06 14:45:23 +02:00
Piotr Dulikowski	680423ad9d	cassandra_tests: enable existing tests for static columns Removes the "xfail" marker from the now-passing tests related to secondary indexes on static columns.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	cc3af3190d	create_index_statement: lift restriction on secondary indexes on static rows Secondary indexes on static columns should work now. This commit lifts the existing restriction after the cluster is fully upgraded to a version which supports such indexes.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	86dad30b66	db/view: fetch and process static rows when building indexes This commit modifies the view builder and its consumer so that static rows are always fetched and properly processed during view build. Currently, the view builder will always fetch both static and clustering rows, regardless of the type of indexes being built. For indexes on static columns this is wasteful and could be improved so that only the types of rows relevant to indexes being built are fetched - however, doing this sounds a bit complicated and I would rather start with something simpler which has a better chance of working.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	25fec0acce	gms/feature_service: introduce SECONDARY_INDEXES_ON_STATIC_COLUMNS cluster feature The new feature will prevent secondary indexes on static columns from being created unless the whole cluster is ready to support them.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	9f14f0ac09	create_index_statement: disallow creation of local indexes with static columns Local indexes on static columns don't make sense because there is only one static row per partition. It's always better to just run SELECT DISTINCT on the base table. Allowing for such an index would only make such queries slower (due to double lookup), would take unnecessary space and could pose potential consistency problems, so this commit explicitly forbids them.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	8c4cdfc2db	select_statement: prepare paging for indexes on static columns When performing a query on a table which is accelerated by a secondary index, the paging state returned along with the query contains a partition key and a clustering key of the secondary index table. The logic wasn't prepared to handle the case of secondary indexes on static columns - notably, it tried to put base table's clustering key columns into the paging state which caused problems in other places. This commit fixes the paging logic so that the PK and CK of a secondary index table is calculated correctly. However, this solution has a major drawback: because it is impossible to encode clustering key of the base table in the paging state, partitions returned by queries accelerated by secondary indexes on static columns will _not_ be split by paging. This can be problematic in case there are large partitions in the base table. The main advantage of this fix is that it is simple. Moreover, the problem described above is not unique to static column indexes, but also happens e.g. in case of some indexes on clustering columns (see case 2 of scylladb/scylla#7432). Fixing this issue will require a more sophisticated solution and may affect more than only secondary indexes on static columns, so this is left for a followup.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	ba390072c5	select_statement: do not attempt to fetch clustering columns from secondary index's table The previous commit made sure that the index table for secondary indexes on static tables don't have columns corresponding to clustering rows in the base table - therefore, we must make sure that we don't try to fetch them when querying the index table.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	983b440a81	secondary_index_manager: don't add clustering key columns to index table of static column index The implementation of secondary indexes on static columns relies on the fact that the index table only includes partition key columns of the base table, but not clustering key columns. A static column's value determines a set of full partitions, so including the clustering key would only be redundant. It would also generate more work as a single static column update would require a large portion of the index to be updated. This commit makes sure that clustering columns are not included in the index table for indexes based on a static column.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	6ab41d76e6	replica/table: adjust the view read-before-write to return static rows when needed Adjusts the read-before-write query issued in `table::do_push_view_replica_updates` so that, when needed, requests static columns and makes sure that the static row is present.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	18be90b1e6	db/view: process static rows in view_update_builder::on_results The `view_update_builder::on_results()` function is changed to react to static rows when comparing read-before-write results with the base table mutation.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	2dd95d76f1	db/view: adjust existing view update generation path to use clustering_or_static_row The view update path is modified to use `clustering_or_static_row` instead of just `clustering_row`.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	b0a31bb7a7	column_computation: adjust to use clustering_or_static_row Adjusts the column_computation interface so that it is able to accept both clustering and static rows through the common db::view::clustering_or_static_row interface.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	986ab6034c	db/view: add clustering_or_static_row Adds a `clustering_or_static_row`, which is a common, immutable representation of either a static or clustering row. It will allow to handle view update generation based on static or clustering rows in a uniform way.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	05d4328f02	deletable_row: add column_kind parameter to is_live While deletable_row is used to hold regular columns of a clustering row, its name or implementation doesn't suggest that it is a requirement. In fact, some of its methods already take a column_kind parameter which is used to interpret the kind of columns held in the row. This commit removes the assumption about the column kind from the `deletable_row::is_live` method.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	27c81432cd	view_info: adjust view_column to accept column_kind The `view_info::view_column()` and `view_column` in view.cc allow to get a view's column definition which corresponds to given base table's column. They currently assume that the given column id corresponds to a regular column. In preparation for secondary indexes based on static columns, this commit adjusts those functions so that they accept other kinds of columns, including static columns.	2022-12-06 11:21:16 +01:00
Piotr Dulikowski	f7b7724eaf	db/view: base_dependent_view_info: split non-pk columns into regular and static Currently, `base_dependent_view_info::_base_non_pk_columns_in_view_pk` field keeps a list of non-primary-key columns from the base table which are a part of the view's primary key. Because the current code does not allow indexes on static columns yet, the columns kept in the aforementioned field are always assumed to be regular columns of the base table and are kept as `column_id`s which do not contain information about the column kind. This commit splits the `_base_non_pk_columns_in_view_pk` field into two, one for regular columns and the other for static columns, so that it is possible to keep both kinds of columns in `base_dependent_view_info` and the structure can be used for secondary indexes on static columns.	2022-12-06 11:21:16 +01:00
Botond Dénes	681bd62424	Update tools/java submodule * tools/java ecab7cf7d6...1c4e1e7a7d (2): > Merge "Cqlsh serverless v2" from Karol Baryla > Update Java Driver version to 3.11.2.4	2022-12-06 09:06:09 +02:00
Botond Dénes	6a1dbffaaa	Merge 'compaction_manager: coroutinize postponed_compactions_reevaluation' from Avi Kivity Three lambdas were removed, simplifying the code. Closes #12207 * github.com:scylladb/scylladb: compaction_manager: reindent postponed_compactions_reevaluation() compaction_manager: coroutinize postponed_compactions_reevaluation() compaction_manager: make postponed_compactions_reevaluation() return a future	2022-12-06 08:08:36 +02:00
Avi Kivity	2339a3fa06	database: remove continuation for updating statistics update_write_metrics() is a continuation added solely for updating statistics. Fold it into do_update to reduce an allocation in the write path. ```console $ ./artifacts/before --write --smp 1 2<&1 | grep insn 189930.77 tps ( 57.2 allocs/op, 13.2 tasks/op, 50994 insns/op, 0 errors) 189954.18 tps ( 57.2 allocs/op, 13.2 tasks/op, 51086 insns/op, 0 errors) 188623.86 tps ( 57.2 allocs/op, 13.2 tasks/op, 51083 insns/op, 0 errors) 190115.01 tps ( 57.2 allocs/op, 13.2 tasks/op, 51092 insns/op, 0 errors) 190173.71 tps ( 57.2 allocs/op, 13.2 tasks/op, 51083 insns/op, 0 errors) median 189954.18 tps ( 57.2 allocs/op, 13.2 tasks/op, 51086 insns/op, 0 errors) ``` vs ```console $ ./artifacts/after --write --smp 1 2<&1 | grep insn 190358.38 tps ( 56.2 allocs/op, 12.2 tasks/op, 50754 insns/op, 0 errors) 185222.78 tps ( 56.2 allocs/op, 12.2 tasks/op, 50789 insns/op, 0 errors) 184508.09 tps ( 56.2 allocs/op, 12.2 tasks/op, 50842 insns/op, 0 errors) 142099.47 tps ( 56.2 allocs/op, 12.2 tasks/op, 50825 insns/op, 0 errors) 190447.22 tps ( 56.2 allocs/op, 12.2 tasks/op, 50811 insns/op, 0 errors) ``` One allocation and ~300 cycles saved. update_write_metrics() is still called from other call sites, so it is not removed. Closes #12108	2022-12-06 07:04:17 +02:00
Botond Dénes	6daa1e973f	Merge 'alternator: fix hangs related to TTL scanning' from Nadav Har'El The first patch in this small series fixes a hang during shutdown when the expired-item scanning thread can hang in a retry loop instead of quitting. These hangs were seen in some test runs (issue #12145). The second patch is a failsafe against additional bugs like those solved by the first patch: If any bug causes the same page fetch to repeatedly time out, let's stop the attempts after 10 retries instead of retrying forever. When we stop the retries, a warning will be printed to the log, and Scylla will wait until the next scan period and start a new scan from scratch - from a random position in the database, instead of hanging potentially-forever waiting for the same page. Closes #12152 * github.com:scylladb/scylladb: alternator ttl: in scanning thread, don't retry the same page too many times alternator: fix hang during shutdown of expiration-scanning thread	2022-12-06 06:44:22 +02:00
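The failsafe described in the merge above (give up on a page after a bounded number of timeouts rather than retrying forever) can be illustrated with a minimal Python sketch. This is not the Alternator code: `scan_once`, `fetch_page`, `PageTimeout` and `MAX_PAGE_RETRIES` are hypothetical names, and counting 10 consecutive failed attempts on the same page is an assumed reading of "10 retries".

```python
class PageTimeout(Exception):
    """Hypothetical error raised when fetching one page times out."""


MAX_PAGE_RETRIES = 10  # assumed cap, taken from the "10 retries" in the message


def scan_once(fetch_page, start, log=print):
    """Scan pages starting at `start`.

    `fetch_page(position)` is a hypothetical callback returning the next
    position, or None when the scan is complete. Returns True if the scan
    finished, False if it was abandoned after too many timeouts on one page.
    """
    position = start
    failures = 0  # consecutive timeouts on the current page
    while position is not None:
        try:
            position = fetch_page(position)
        except PageTimeout:
            failures += 1
            if failures >= MAX_PAGE_RETRIES:
                log(f"giving up on page {position!r} after {failures} timeouts")
                return False  # caller waits for the next scan period
            continue  # retry the same page
        failures = 0  # progress was made, reset the counter
    return True
```

A caller would treat a `False` result as "wait until the next scan period and start a new scan", which matches the behaviour the message describes.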
Botond Dénes	c5da96e6f7	Merge 'cql3: batch_statement: coroutinize get_mutations()' from Avi Kivity As it has a do_with(), coroutinizing it is an automatic win. Closes #12195 * github.com:scylladb/scylladb: cql3: batch_statement: reindent get_mutations() cql3: batch_statement: coroutinize get_mutations()	2022-12-06 06:41:44 +02:00
Avi Kivity	d2b1d2f695	compaction_manager: reindent postponed_compactions_reevaluation()	2022-12-05 22:02:27 +02:00
Avi Kivity	1669025736	compaction_manager: coroutinize postponed_compactions_reevaluation() So much nicer.	2022-12-05 22:01:41 +02:00
Avi Kivity	d2c44cba77	compaction_manager: make postponed_compactions_reevaluation() return a future postponed_compactions_reevaluation() runs until compaction_manager is stopped, checking if it needs to launch new compactions. Make it return a future instead of stashing its completion somewhere. This makes it easier to convert it to a coroutine.	2022-12-05 21:58:48 +02:00
Avi Kivity	fe4d7fbdf2	Update abseil submodule * abseil 7f3c0d78...4e5ff155 (125): > Add a compilation test for recursive hash map types > Add AbslStringify support for enum types in Substitute. > Use a c++14-style constexpr initialization if c++14 constexpr is available. > Move the vtable into a function to delay instantiation until the function is called. When the variable is a global the compiler is allowed to instantiate it more aggresively and it might happen before the types involved are complete. When it is inside a function the compiler can't instantiate it until after the functions are called. > Cosmetic reformatting in a test. > Reorder base64 unescape methods to be below the escaping methods. > Fixes many compilation issues that come from having no external CI coverage of the accelerated CRC implementation and some differences bewteen the internal and external implementation. > Remove static initializer from mutex.h. > Import of CCTZ from GitHub. > Remove unused iostream include from crc32c.h > Fix MSVC builds that reject C-style arrays of size 0 > Remove deprecated use of absl::ToCrc32c() > CRC: Make crc32c_t as a class for explicit control of operators > Convert the full parser into constexpr now that Abseil requires C++14, and use this parser for the static checker. This fixes some outstanding bugs where the static checker differed from the dynamic one. Also, fix `%v` to be accepted with POSIX syntax. > Write (more) directly into the structured buffer from StringifySink, including for (size_t, char) overload. > Avoid using the non-portable type __m128i_u. > Reduce flat_hash_{set,map} generated code size. > Use ABSL_HAVE_BUILTIN to fix -Wundef __has_builtin warning > Add a TODO for the deprecation of absl::aligned_storage_t > TSAN: Remove report_atomic_races=0 from CI now that it has been fixed > absl: fix Mutex TSan annotations > CMake: Remove trailing commas in `AbseilDll.cmake` > Fix AMD cpu detection. 
> CRC: Get CPU detection and hardware acceleration working on MSVC x86(_64) > Removing trailing period that can confuse a url in str_format.h. > Refactor btree iterator generation code into a base class rather than using ifdefs inside btree_iterator. > container.h: fix incorrect comments about the location of <numeric> algorithms. > Zero encoded_remaining when a string field doesn't fit, so that we don't leave partial data in the buffer (all decoders should ignore it anyway) and to be sure that we don't try to put any subsequent operands in either (there shouldn't be enough space). > Improve error messages when comparing btree iterators when generations are enabled. > Document the WebSafe* and WithPadding variants more concisely, as deltas from Base64Encode. > Drop outdated comment about LogEntry copyability. > Release structured logging. > Minor formatting changes in preparation for structured logging... > Update absl::make_unique to reflect the C++14 minimum > Update Condition to allocate 24 bytes for MSVC platform pointers to methods. > Add missing include > Refactor "RAW: " prefix formatting into FormatLogPrefix. > Minor formatting changes due to internal refactoring > Fix typos > Add a new API for `extract_and_get_next()` in b-tree that returns both the extracted node and an iterator to the next element in the container. > Use AnyInvocable in internal thread_pool > Remove absl/time/internal/zoneinfo.inc. It was used to guarantee availability of a few timezones for "time_test" and "time_benchmark", but (file-based) zoneinfo is now secured via existing Bazel data/env attributes, or new CMake environment settings. > Updated documentation on use of %v Also updated documentation around FormatSink and PutPaddedString > Use the correct Bazel copts in crc targets > Run the //absl/time timezone tests with a data dependency on, and a matching ${TZDIR} setting for, //absl/time/internal/cctz:zoneinfo. > Stop unnecessary clearing of fields in ~raw_hash_set. 
> Fix throw_delegate_test when using libc++ with shared libraries > CRC: Ensure SupportsArmCRC32PMULL() is defined > Improve error messages when comparing btree iterators. > Refactor the throw_delegate test into separate test cases > Replace std::atomic_flag with std::atomic<bool> to avoid the C++20 deprecation of ATOMIC_FLAG_INIT. > Add support for enum types with AbslStringify > Release the CRC library > Improve error messages when comparing swisstable iterators. > Auto increase inlined capacity whenever it does not affect class' size. > drop an unused dep > Factor out the internal helper AppendTruncated, which is used and redefined in a couple places, plus several more that have yet to be released. > Fix some invalid iterator bugs in btree_test.cc for multi{set,map} emplace{_hint} tests. > Force a conservative allocation for pointers to methods in Condition objects. > Fix a few lint findings in flags' usage.cc > Narrow some _MSC_VER checks to not catch clang-cl. > Small cleanups in logging test helpers > Import of CCTZ from GitHub. > Merge pull request abseil/abseil-cpp#1287 from GOGOYAO:patch-1 > Merge pull request abseil/abseil-cpp#1307 from KindDragon:patch-1 > Stop disabling some test warnings that have been fixed > Support logging of user-defined types that implement `AbslStringify()` > Eliminate span_internal::Min in favor of std::min, since Min conflicts with a macro in a third-party library. > Fix -Wimplicit-int-conversion. > Improve error messages when dereferencing invalid swisstable iterators. > Cord: Avoid leaking a node if SetExpectedChecksum() is called on an empty cord twice in a row. > Add a warning about extract invalidating iterators (not just the iterator of the element being extracted). > CMake: installed artifacts reflect the compiled ABI > Import of CCTZ from GitHub. > Import of CCTZ from GitHub. > Support empty Cords with an expected checksum > Move internal details from one source file to another more appropriate source file. 
> Removes `PutPaddedString()` function > Return uint8_t from CappedDamerauLevenshteinDistance. > Remove the unknown CMAKE_SYSTEM_PROCESSOR warning when configuring ABSL_RANDOM_RANDEN_COPTS > Enforce Visual Studio 2017 (MSVC++ 15.0) minumum > `absl::InlinedVector::swap` supports non-assignable types. > Improve b-tree error messages when dereferencing invalid iterators. > Mutex: Fix stall on single-core systems > Document Base64Unescape() padding > Fix sign conversion warnings in memory_test.cc. > Fix a sign conversion warning. > Fix a truncation warning on Windows 64-bit. > Use btree iterator subtraction instead of std::distance in erase_range() and count(). > Eliminate use of internal interfaces and make the test portable and expose it to OSS. > Fix various warnings for _WIN32. > Disables StderrKnobsDefault due to order dependency > Implement btree_iterator::operator-, which is faster than std::distance for btree iterators. > Merge pull request abseil/abseil-cpp#1298 from rpjohnst:mingw-cmake-build > Implement function to calculate Damerau-Levenshtein distance between two strings. > Change per_thread_sem_test from size medium to size large. > Support stringification of user-defined types in AbslStringify in absl::Substitute. > Fix "unsafe narrowing" warnings in absl, 12/12. > Revert change to internal 'Rep', this causes issues for gdb > Reorganize InlineData into an inner Rep structure. > Remove internal `VLOG_xxx` macros > Import of CCTZ from GitHub. > `absl::InlinedVector` supports move assignment with non-assignable types. > Change Cord internal layout, which reduces store-load penalties on ARM > Detects accidental multiple invocations of AnyInvocable<R(...)&&>::operator()&& by producing an error in debug mode, and clarifies that the behavior is undefined in the general case. > Fix a bug in StrFormat. This issue would have been caught by any compile-time checking but can happen for incorrect formats parsed via ParsedFormat::New. 
Specifically, if a user were to add length modifiers with 'v', for example the incorrect format string "%hv", the ParsedFormat would incorrectly be allowed. > Adds documentation for stringification extension > CMake: Remove check_target calls which can be problematic in case of dependency cycle > Changes mutex unlock profiling > Add static_cast<void> to the sources for trivial relocations to avoid spurious -Wdynamic-class-memaccess errors in the presence of other compilation errors. > Configure ABSL_CACHE_ALIGNED for clang-like and MSVC toolchains. > Fix "unsafe narrowing" warnings in absl, 11/n. > Eliminate use of internal interfaces > Merge pull request abseil/abseil-cpp#1289 from keith:ks/fix-more-clang-deprecated-builtins > Merge pull request abseil/abseil-cpp#1285 from jun-sheaf:patch-1 > Delete LogEntry's copy ctor and assignment operator. > Make sinks provided to `AbslStringify()` usable with `absl::Format()`. > Cast unused variable to void > No changes in OSS. > No changes in OSS > Replace the kPower10ExponentTable array with a formula. > CMake: Mark absl::cord_test_helpers and absl::spy_hash_state PUBLIC > Use trivial relocation for transfers in swisstable and b-tree. > Merge pull request abseil/abseil-cpp#1284 from t0ny-peng:chore/remove-unused-class-in-variant > Removes the legacy spellings of the thread annotation macros/functions by default. Closes #12201	2022-12-05 21:07:16 +02:00
Eliran Sinvani	5a5514d052	cql server: Only parallelize relevant cql requests The cql server uses an execution stage to process and execute queries, however, processing stage is best utilized when having a recurrent flow that needs to be called repeatedly since it better utilizes the instruction cache. Up until now, every request was sent through the processing stage, but most requests are not meant to be executed repeatedly with high volume. This change processes and executes the data queries asynchronously, through an execution stage, and all of the rest are processed one by one, only continuing once the request has been done end to end. Tests: Unit tests in dev and debug. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Closes #12202	2022-12-05 21:06:58 +02:00
Takuya ASADA	b7851ab1ec	docker: fix locale on SSH shell `4ecc08c` broke locale settings on SSH shell, since we dropped "update-locale". To fix this without installing locales package, we need to manually specify LANG=C.UTF-8 in /etc/default/locale. see https://github.com/scylladb/scylla-cluster-tests/pull/5519 Closes #12197	2022-12-05 20:02:18 +02:00
Avi Kivity	6f2d060d12	Merge 'Make sstable_directory call sstable_manager for sstables' components' from Pavel Emelyanov This PR hits two goals for "object storage" effort 1. Sstables loader "knows" that sstables components are stored in a Linux directory and uses utils/lister to access it. This is not going to work with sstables over object storage, the loader should be abstracted from the underlying storage. 2. Currently class keyspace and class column_family carry "datadir" and "all_datadirs" on board which are paths on the local filesystem where sstable files are stored (those usually start with /var/lib/scylla/data). The paths include subdirs like "snapshots", "staging", etc. This is not going to look nice for object storage, the /var/lib/ prefix is excessive and meaningless in this case. Instead, ks and cf should know their "location" and some other component should know the directory in which the files are stored. Said that, this PR prepares distributed_loader and sstables_directory to stop using Linux paths explicitly by making both call sstables_manager to list and open sstables object. After it will be possible to teach manager to list sstables from object storage. Also this opens the way to removing paths from keyspace and column_family classes and replacing those with relative "location"s. Closes #12128 * github.com:scylladb/scylladb: sstable_directory: Get components lister from manager sstable_directory: Extract directory lister sstable_directory: Remove sstable creation callback sstable_directory: Call manager to make sstables sstable_directory: Keep error handler generator sstable_directory: Keep schema_ptr sstable_directory: Use directory semaphore from manager sstable_directory: Keep reference on manager tests: Use sstables creation helper in some cases sstables_manager: Keep directory semaphore reference sstables, code: Wrap directory semaphore with concurrency	2022-12-05 18:54:17 +02:00
Gleb Natapov	022a825b33	raft: introduce not_a_member error and return it when non member tries to do add/modify_config Currently if a node that is outside of the config tries to add an entry or modify config transient error is returned and this causes the node to retry. But the error is not transient. If a node tries to do one of the operations above it means it was part of the cluster at some point, but since a node with the same id should not be added back to a cluster if it is not in the cluster now it will never be. Return a new error not_a_member to a caller instead. Message-Id: <Y42mTOx8bNNrHqpd@scylladb.com>	2022-12-05 17:11:04 +01:00
Benny Halevy	c61083852c	storage_service: handle_state_normal: calculate candidates_for_removal when replacing tokens We currently try to detect a replaced node so as to insert it into endpoints_to_remove when it has no owned tokens left. However, for each token we first generate a multimap using get_endpoint_to_token_map_for_reading(). There are 2 problems with that: 1. unless the replaced node owns a single token, this map will not be empty after erasing one token out of it, since the token metadata has not changed yet (this is done later with update_normal_tokens(owned_tokens, endpoint)). 2. generating this map for each token is inefficient, making this algorithm's complexity quadratic in the number of tokens... This change copies the current token_to_endpoint map to a temporary map and erases replaced tokens from it, while maintaining a set of candidates_for_removal. After traversing all replaced tokens, we check again the `token_to_endpoint_map` erasing from `candidates_for_removal` any endpoint that still owns tokens. The leftover candidates are endpoints that own no tokens and so they are added to `hosts_to_remove`. Fixes #12082 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12141	2022-12-05 16:17:18 +01:00
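The two-pass approach described in the commit above can be sketched in a few lines of Python. This is only an illustration of the idea, not the storage_service code: the function name `find_hosts_to_remove` and the plain dict/set types are assumptions made for the example.

```python
def find_hosts_to_remove(token_to_endpoint, replaced_tokens, new_endpoint):
    """Return endpoints left without tokens once `new_endpoint` takes over
    `replaced_tokens`. All names here are hypothetical.

    token_to_endpoint: dict mapping token -> current owner
    replaced_tokens:   iterable of tokens now owned by `new_endpoint`
    """
    # Work on a temporary copy, built once rather than once per token.
    remaining = dict(token_to_endpoint)
    candidates_for_removal = set()

    # Pass 1: erase replaced tokens, remembering who used to own them.
    for token in replaced_tokens:
        previous_owner = remaining.pop(token, None)
        if previous_owner is not None and previous_owner != new_endpoint:
            candidates_for_removal.add(previous_owner)

    # Pass 2: any candidate that still owns a token is not removed.
    for owner in remaining.values():
        candidates_for_removal.discard(owner)

    return candidates_for_removal
```

Both passes are linear in the number of tokens, which is the point of the change: the per-token rebuild of the endpoint-to-token map is what made the earlier approach quadratic.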
Botond Dénes	3d620378d4	Merge 'view: coroutinize maybe_mark_view_as_built' from Avi Kivity Simplifying it a little. Closes #12171 * github.com:scylladb/scylladb: view: reindent maybe_mark_view_as_built view: coroutinize maybe_mark_view_as_built	2022-12-05 13:43:34 +02:00
Kamil Braun	3f8aaeeab9	test/topology: enable replace tests Also add some TODOs for enhancing existing tests.	2022-12-05 11:50:07 +01:00
Kamil Braun	ee19411783	service/raft: report an error when Raft ID can't be found in `raft_group0::remove_from_group0` Also simplify the code and improve logging in general. The previous code did this: search for the ID in the address map. If it couldn't be found, perform a read barrier and search again. If it again couldn't be found, return. This algorithm depended on the fact that IP addresses were stored in group 0 configuration. The read barrier was used to obtain the most recent configuration, and if the IP was not a part of address map after the read barrier, that meant it's simply not a member of group 0. This logic no longer applies so we can simplify the code. Furthermore, when I was fixing the replace operation with Raft enabled, at some point I had a "working" solution with all tests passing. But I was suspicious and checked if the replaced node got removed from group 0. It wasn't. So the replace finished "successfully", but we had an additional (voting!) member of group 0 which didn't correspond to a token ring member. The last version of my fixes ensures that the node gets removed by the replacing node. But the system is fragile and nothing prevents us from breaking this again. At least log an error for now. Regression tests will be added later.	2022-12-05 11:50:07 +01:00
Kamil Braun	4429885543	service: handle replace correctly with Raft enabled We must place the Raft ID obtained during the shadow round in the address map. It won't be placed by the regular gossiping route if we're replacing using the same IP, because we override the application state of the replaced node. Even if we replace a node with a different IP, it is not guaranteed that background gossiping manages to update the address map before we need it, especially in tests where we set ring_delay to 0 and disable wait_for_gossip_to_settle. The shadow round, on the other hand, performs a synchronous request (and if it fails during bootstrap, bootstrap will fail - because we also won't be able to obtain the tokens and Host ID of the replaced node). Fetch the Raft ID of the replaced node in `prepare_replacement_info`, which runs the shadow round. Return it in `replacement_info`. Then `join_token_ring` passes it to `setup_group0`, which stores it in the address map. It does that after `join_group0` so the entry is non-expiring (the replaced node is a member of group 0). Later in the replace procedure, we call `remove_from_group0` for the replaced node. `remove_from_group0` will be able to reverse-translate the IP of the replaced node to its Raft ID using the address map.	2022-12-05 11:50:07 +01:00
Kamil Braun	45bb5bfb52	gms/gossiper: fetch RAFT_SERVER_ID during shadow round During the replace operation we need the Raft ID of the replaced node. The shadow round is used for fetching all necessary information before the replace operation starts.	2022-12-05 11:50:07 +01:00
Kamil Braun	7222c2f9a1	service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace Most of the sleeps related to gossiping are based on `ring_delay`, which is configurable and can be set to lower value e.g. during tests. But for some reason there was one case where we slept for a hardcoded value, `service::load_broadcaster::BROADCAST_INTERVAL` - 60 seconds. Use `2 * get_ring_delay()` instead. With the default value of `ring_delay` (30 seconds) this will give the same behavior.	2022-12-05 11:50:07 +01:00
Pavel Emelyanov	b5ede873f2	sstable_directory: Get components lister from manager For now this is almost a no-op because manager just calls sstables_directory code back to create the lister. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 12:03:19 +03:00
Pavel Emelyanov	3f9b8c855d	sstable_directory: Extract directory lister Currently the utils/lister.cc code is in use to list regular files in a directory. This patch wraps the lister into more abstract components lister class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 12:03:19 +03:00
Pavel Emelyanov	abd3602b10	sstable_directory: Remove sstable creation callback It's no longer used. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 12:03:19 +03:00
Pavel Emelyanov	3d559391df	sstable_directory: Call manager to make sstables Now the directory code has everything it needs to create sstable object and can stop using the external lambda. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 12:03:19 +03:00
Pavel Emelyanov	db657a8d1c	sstable_directory: Keep error handler generator Yet another continuation to previous patch -- IO error handlers generator is also needed to create sstables. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 12:03:19 +03:00
Pavel Emelyanov	4281f4af42	sstable_directory: Keep schema_ptr Continuation of one-before-previous patch. In order to create sstable without external lambda the directory code needs schema. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 12:03:19 +03:00
Pavel Emelyanov	8df1bcb907	sstable_directory: Use directory semaphore from manager After the previous patch, the sstables_directory code no longer needs a semaphore argument, because it can get one from the manager. This makes the directory API shorter and simpler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 12:03:19 +03:00
Pavel Emelyanov	4da941e159	sstable_directory: Keep reference on manager The sstables_directory accesses /var/lib/scylla/data in two ways -- lists files in it and opens sstables. The latter is abstracted with the help of lambdas passed around, but the former (listing) is done by using directory listers from utils. Listing sstables components with a directory lister won't work for object storage; the directory code will need to call some abstraction layer instead. Opening sstables with the help of a lambda is a bit of overkill; having sstables manager at hand could make it much simpler. That said, this patch makes sstables_directory reference sstables_manager on start. This change will also simplify directory semaphore usage (next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 12:03:19 +03:00
Pavel Emelyanov	784d78810a	tests: Use sstables creation helper in some cases Several test cases push sstables creation lambda into with_sstables_directory helper. There's a ready to use helper class that does the same. Next patch will make additional use of that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 12:03:19 +03:00
Pavel Emelyanov	5e13ce2619	sstables_manager: Keep directory semaphore reference Preparatory patch. The semaphore will be used by sstables_directory in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 12:03:18 +03:00
Pavel Emelyanov	be8512d7cc	sstables, code: Wrap directory semaphore with concurrency Currently this is a sharded<semaphore> started/stopped in main and referenced by database in order to be fed into sstables code. This semaphore always comes with the "concurrency" parameter that limits the parallel_for_each parallelism. This patch wraps both together into directory_semaphore class. This makes its usage simpler and will allow extending it in the future. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-05 11:59:30 +03:00
Asias He	c6087cf3a0	repair: Reduce repair reader eviction with diff shard count When repair master and followers have different shard count, the repair followers need to create multi-shard readers. Each multi-shard reader will create one local reader on each shard, N (smp::count) local readers in total. There is a hard limit on the number of readers that can work in parallel. When there are more readers than this limit, the readers will start to evict each other, causing buffers already read from disk to be dropped and readers to be recreated, which is not very efficient. To optimize and reduce reader eviction overhead, a global reader permit is introduced which considers the multi-shard reader bloats. With this patch, at any point in time, the number of readers created by repair will not exceed the reader limit. Test Results: 1) with stream sem 10, repair global sem 10, 5 ranges in parallel, n1=2 shards, n2=8 shards, memory wanted =1 1.1) [asias@hjpc2 mycluster]$ time nodetool -p 7200 repair ks2 (repair on n2) [2022-11-23 17:45:24,770] Starting repair command #1, repairing 1 ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true) [2022-11-23 17:45:53,869] Repair session 1 [2022-11-23 17:45:53,869] Repair session 1 finished real 0m30.212s user 0m1.680s sys 0m0.222s 1.2) [asias@hjpc2 mycluster]$ time nodetool repair ks2 (repair on n1) [2022-11-23 17:46:07,507] Starting repair command #1, repairing 1 ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true) [2022-11-23 17:46:30,608] Repair session 1 [2022-11-23 17:46:30,608] Repair session 1 finished real 0m24.241s user 0m1.731s sys 0m0.213s 2) with stream sem 10, repair global sem no_limit, 5 ranges in parallel, n1=2 shards, n2=8 shards, memory wanted =1 2.1) [asias@hjpc2 mycluster]$ time nodetool -p 7200 repair ks2 (repair on n2) [2022-11-23 17:49:49,301] Starting repair command #1, repairing 1 ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true) [2022-11-23 17:52:01,414] Repair session 1 [2022-11-23 17:52:01,415] Repair session 1 finished real 2m13.227s user 0m1.752s sys 0m0.218s 2.2) [asias@hjpc2 mycluster]$ time nodetool repair ks2 (repair on n1) [2022-11-23 17:52:19,280] Starting repair command #1, repairing 1 ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true) [2022-11-23 17:52:42,387] Repair session 1 [2022-11-23 17:52:42,387] Repair session 1 finished real 0m24.196s user 0m1.689s sys 0m0.184s Comparing 1.1) and 2.1), it shows the eviction played a major role here. The patch gives 73s / 30s = 2.5X speed up in this setup. Comparing 1.1 and 1.2, it shows even if we limit the readers, starting on the lower shard is faster 30s / 24s = 1.25X (the total number of multishard readers is lower) Fixes #12157 Closes #12158	2022-12-05 10:47:36 +02:00
Botond Dénes	1e20095547	Update tools/java submodule * tools/java 1c06006447...ecab7cf7d6 (1): > Add VSCode files to gitignore	2022-12-05 09:54:51 +02:00
Botond Dénes	c4d72c8dd0	Merge 'cql3: select_statement: split and coroutinize process_results()' from Avi Kivity Split the simple (and common) case from the complex case, and coroutinize the latter. Hopefully this generates better code for the simple case, and it makes the complex case a little nicer. Closes #12194 * github.com:scylladb/scylladb: cql3: select_statement: reindent process_results_complex() cql3: select_statement: coroutinize process_results_complex() cql3: select_statement: split process_results() into fast path and complex path	2022-12-05 08:16:22 +02:00
Avi Kivity	a0a4711b74	snapshot: protect list operations against the lambda coroutine fiasco run_snapshot_list_operation() takes a continuation, so passing it a lambda coroutine without protection is dangerous. Protect the coroutine with coroutine::lambda so it doesn't lose its contents. Fixes #12192. Closes #12193	2022-12-05 08:14:39 +02:00
guy9	cb842b2729	Replacing the Docs top bar message from the LIVE event to the community forum announcement Closes #12189	2022-12-05 08:05:04 +02:00
Avi Kivity	6326be5796	cql3: batch_statement: reindent get_mutations()	2022-12-04 21:47:22 +02:00
Avi Kivity	2d74360de3	cql3: batch_statement: coroutinize get_mutations() It has a do_with(), so an automatic win.	2022-12-04 21:45:10 +02:00
Avi Kivity	0834bb0365	cql3: select_statement: reindent process_results_complex()	2022-12-04 21:36:17 +02:00
Avi Kivity	a63f98e3fc	cql3: select_statement: coroutinize process_results_complex() Not a huge gain, since it's just a do_with, but still a little better. Note the inner lambda is not a coroutine, so isn't susceptible to the lambda coroutine fiasco.	2022-12-04 21:34:51 +02:00
Avi Kivity	7f29efa0ad	cql3: select_statement: split process_results() into fast path and complex path This will allow us to coroutinize the complex path without adding an allocation to the fast path.	2022-12-04 21:30:45 +02:00
Avi Kivity	02b66bb31a	Merge 'Mark sstable::<directory accessing methods> private' from Pavel Emelyanov One of the prerequisites to make sstables reside on object-storage is not to let the rest of the code "know" the filesystem path they are located on (because sometimes they will not be on any filesystem path). This patch makes the methods that can reveal this path back private so that later they can be abstracted out. Closes #12182 * github.com:scylladb/scylladb: sstable: Mark some methods private test: Don't get sstable dir when known test: Use move_to_quarantine() helper test: Use sstable::filename() overload without dir name sstables: Reimplement batch directory sync after move table, tests: Make use of move_to_new_dir() default arg sstables: Remove fsync_directory() helper table: Simplify take_snapshot()'s collecting sstables names	2022-12-04 17:45:37 +02:00
Kamil Braun	b551cd254c	test: test_raft_upgrade: fix test_recover_stuck_raft_upgrade flakiness The test enables an error injection inside the Raft upgrade procedure on one of the nodes which will cause the node to throw an exception before entering `synchronize` state. Then it restarts other nodes with Raft enabled, waits until they enter `synchronize` state, puts them in RECOVERY mode, removes the error-injected node and creates a new Raft group 0. As soon as the other nodes enter `synchronize`, the test disabled the error injection (the rest of the test was outside the `async with inject_error(...)` block). There was a small chance that we disabled the error injection before the node reached it. In that case the node also entered `synchronize` and the cluster managed to finish the upgrade procedure. We encountered this during next promotion. Eliminate this possibility by extending the scope of the `async with inject_error(...)` block, so that the RECOVERY mode steps on the other nodes are performed within that block. Closes #12162	2022-12-02 21:26:44 +01:00
Avi Kivity	94f18b5580	test: sstable_conforms_to_mutation_source: use do_with_async() where needed The test clearly needs a thread (it converts a reader to a mutation without waiting), so give it one. Closes #12178	2022-12-02 20:48:37 +01:00
Pavel Emelyanov	084522d9eb	sstable: Mark some methods private There are several class sstable methods that reveal internal directory path to caller. It's not object-storage-friendly. Fortunately, all the callers of those methods had been patched not to work with full paths, so these can be marked private. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:15:02 +03:00
Pavel Emelyanov	fb63850f2c	test: Don't get sstable dir when known The sstable_move_test creates sstables in its own temp directories and then requests these dirs' paths back from sstables. The test can use the paths it has at hand, no need to call sstables for it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:13:58 +03:00
Pavel Emelyanov	4c742a658d	test: Use move_to_quarantine() helper Two places in tests move sstable to quarantine subdir by hand. There's the class sstable method that does the same, so use it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:13:19 +03:00
Pavel Emelyanov	d6244b7408	test: Use sstable::filename() overload without dir name The dir this place currently uses is the directory where the sstable was created, so dropping this argument would just render the same path. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:12:21 +03:00
Pavel Emelyanov	a702affd4d	sstables: Reimplement batch directory sync after move There's a table::move_sstables_from_staging() method that gets a bunch of sstables and moves them from staging subdir into table's root datadir. Not to flush the root dir for every sstable move, it asks the sstable::move_to_new_dir() not to flush, but collects staging dir names and flushes them and the root dir at the end altogether. In order to make it more friendly to object-storage and to remove one more caller of sstable::get_dir() the delayed_commit_changes struct is introduced. It collects _all_ the affected dir names in unordered_set, then allows flushing them. By default the move_to_new_dir() doesn't receive this object and flushes the directories instantly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:08:47 +03:00
Pavel Emelyanov	1b42d5fce3	table, tests: Make use of move_to_new_dir() default arg The method in question accepts boolean bit whether or not it should sync directories at the end. It's always true but in one case, so there's the default value for it. Make use of it. Anticipating the suggestion to replace bool with bool_class -- next patch will replace it with something else. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:07:16 +03:00
Pavel Emelyanov	339feb4205	sstables: Remove fsync_directory() helper The one effectively wraps existing seastar sync_directory() helper into two io_check-s. It's simpler just to call the latter directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:05:43 +03:00
Pavel Emelyanov	80f5d7393f	table: Simplify take_snapshot()'s collecting sstables names The method in question "snapshots" all sstables it can find, then writes their Datafile names into the manifest file. To get the list of file names it iterates over sstables list again and does silly conversion of full file path to file name with the help of the directory path length. This all can be made much simpler if just collecting component names directly at the time sstable is hardlinked. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-02 21:02:37 +03:00
Raphael S. Carvalho	d61b4f9dfb	compaction_manager: Delete compaction_state's move constructor compaction_state shouldn't be moved once emplaced. moving it could theoretically cause task's gate holder to have a dangling pointer to compaction_state's gate, but turns out gate's move ctor will actually fail under this assertion: assert(!_count && "gate reassigned with outstanding requests"); Cannot happen today, but let's make it more future proof. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12167	2022-12-02 20:56:57 +03:00
Tomasz Grabiec	1a6bf2e9ca	Merge 'service/raft: specialized verb for failure detector pinger' from Kamil Braun We used GOSSIP_ECHO verb to perform failure detection. Now we use a special verb DIRECT_FD_PING introduced for this purpose. There are multiple reasons to do so. One minor reason: we want to use the same connection as other Raft verbs: if we can't deliver Raft append_entries or vote messages somewhere, that endpoint should be marked dead; if we can, the endpoint should be marked alive. So putting pings on the same connection as the other Raft verbs is important when dealing with weird situations where some connections are available but others are not. Observe that in `do_get_rpc_client_idx`, we put the new verb in the right place. Another minor reason: we remove the awkward gossiper `echo_pinger` abstraction which required storing and updating gossiper generation numbers. This also removes one dependency from Raft service code to gossiper. Major reason 1: the gossip echo handler has a weird mechanism where a replacing node returns errors during the replace operation to some of the nodes. In Raft however, we want to mark servers as alive when they are alive, including a server running on a node that's replacing another node. Major reason 2, related to the previous one: when server B is replacing server A with the same IP, the failure detector will try to ping both servers. Both servers are mapped to the same IP by the address map, so pings to both servers will reach server B. We want server B to respond to the pings destined for server B, but not to pings destined for server A, so the sender can mark B alive but keep A marked dead. To do this, we include the destination's Raft ID in our RPCs. The destination compares the received ID with its own. If it's different, it returns a `wrong_destination` response, and the failure detector knows that the ping did not reach the destination (it reached someone else). 
Yet another reason: removes "Not ready to respond gossip echo message" log spam during replace. Closes #12107 * github.com:scylladb/scylladb: service/raft: specialized verb for failure detector pinger db: system_keyspace: de-staticize `{get,set}_raft_server_id` service/raft: make this node's Raft ID available early in group registry	2022-12-02 13:54:02 +01:00
Pavel Emelyanov	71179ff5ab	distributed_loader: Use coroutine::lambda in sleeping coroutine According to seastar/doc/lambda-coroutine-fiasco.md, a lambda that co_awaits once loses its capture frame. In distributed_loader code there's at least one of that kind. fixes: #12175 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12170	2022-12-02 13:06:33 +02:00
Pavel Emelyanov	1d91914166	sstables: Drop set_generation() method The method became unused since `70e5252a` (table: no longer accept online loading of SSTable files in the main directory) and the whole concept of reshuffling sstables was dropped later by `7351db7c` (Reshape upload files and reshard+reshape at boot). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12165	2022-12-01 22:17:10 +02:00
Avi Kivity	2978052113	view: reindent maybe_mark_view_as_built Several indentation levels were harmed during the preparation of this patch.	2022-12-01 22:09:21 +02:00
Avi Kivity	ac2e2f8883	view: coroutinize maybe_mark_view_as_built Somewhat simplifies complicated logic.	2022-12-01 22:04:51 +02:00
Kamil Braun	cbdcc944b5	service/raft: specialized verb for failure detector pinger We used GOSSIP_ECHO verb to perform failure detection. Now we use a special verb DIRECT_FD_PING introduced for this purpose. There are multiple reasons to do so. One minor reason: we want to use the same connection as other Raft verbs: if we can't deliver Raft append_entries or vote messages somewhere, that endpoint should be marked dead; if we can, the endpoint should be marked alive. So putting pings on the same connection as the other Raft verbs is important when dealing with weird situations where some connections are available but others are not. Observe that in `do_get_rpc_client_idx`, we put the new verb in the right place. Another minor reason: we remove the awkward gossiper `echo_pinger` abstraction which required storing and updating gossiper generation numbers. This also removes one dependency from Raft service code to gossiper. Major reason 1: the gossip echo handler has a weird mechanism where a replacing node returns errors during the replace operation to some of the nodes. In Raft however, we want to mark servers as alive when they are alive, including a server running on a node that's replacing another node. Major reason 2, related to the previous one: when server B is replacing server A with the same IP, the failure detector will try to ping both servers. Both servers are mapped to the same IP by the address map, so pings to both servers will reach server B. We want server B to respond to the pings destined for server B, but not to pings destined for server A, so the sender can mark B alive but keep A marked dead. To do this, we include the destination's Raft ID in our RPCs. The destination compares the received ID with its own. If it's different, it returns a `wrong_destination` response, and the failure detector knows that the ping did not reach the destination (it reached someone else). 
Yet another reason: removes "Not ready to respond gossip echo message" log spam during replace.	2022-12-01 20:54:18 +01:00
Kamil Braun	02c64becdc	db: system_keyspace: de-staticize `{get,set}_raft_server_id` Part of the anti-globals war.	2022-12-01 20:54:18 +01:00
Kamil Braun	99fe580068	service/raft: make this node's Raft ID available early in group registry Raft ID was loaded or created late in the boot procedure, in `storage_service::join_token_ring`. Create it earlier, as soon as it's possible (when `system_keyspace` is started), pass it to `raft_group_registry::start` and store it inside `raft_group_registry`. We will use this Raft ID stored in group registry in following patches. Also this reduces the number of disk accesses for this node's Raft ID. It's now loaded from disk once, stored in `raft_group_registry`, then obtained from there when needed. This moves `raft_group_registry::start` a bit later in the startup procedure - after `system_keyspace` is started - but it doesn't make a difference.	2022-12-01 20:54:18 +01:00
Nadav Har'El	6fcb5302a6	alternator-test: xfail a flaky test exposing a known bug In a recent commit `757d2a4`, we removed the "xfail" mark from the test test_manual_requests.py::test_too_large_request_content_length because it started to pass on more modern versions of Python, with a urllib3 bug fixed. Unfortunately, the celebration was premature: It turns out that although the test now usually passes, it sometimes fails. This is caused by a Seastar bug scylladb/seastar#1325, which I opened #12166 to track in this project. So unfortunately we need to add the "xfail" mark back to this test. Note that although the test will now be marked "xfail", it will actually pass most of the time, so will appear as "xpass" to people who run it. I put a note in the xfail reason string as a reminder why this is happening. Fixes #12143 Refs #12166 Refs scylladb/seastar#1325 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12169	2022-12-01 20:00:46 +02:00
Kamil Braun	3cd035d1b9	test/pylib: scylla_cluster: remove `ScyllaCluster.decommissioned` field The field was not used for anything. We can keep decommissioned server in `stopped` field. In fact it caused us a problem: since recently, we're using `ScyllaCluster.uninstall` to clean-up servers after test suite finishes (previously we were using `ScyllaServer.uninstall` directly). But `ScyllaCluster.uninstall` didn't look into the `decommissioned` field, so if a server got decommissioned, we wouldn't uninstall it, and it left us some unnecessary artifacts even for successful tests. This is now fixed. Closes #12163	2022-12-01 19:07:26 +02:00
Avi Kivity	a4b77a5691	Merge 'Cleanup sstables::test_env's manager usage' from Pavel Emelyanov Mainly this PR removes global db::config and feature service that are used by sstables::test_env as dependencies for embedded sstables_manager. Other than that -- drop unused methods, remove nested test_env-s and relax few cases that use two temp dirs at a time for no gain. Closes #12155 * github.com:scylladb/scylladb: test, utils: Use only one tempdir sstable_compaction_test: Dont create nested envs mutation_reader_test: Remove unused create_sstable() helper tests, lib: Move globals onto sstables::test_env tests: Use sstables::test_env.db_config() to access config features: Mark feature_config_from_db_config const sstable_3_x_test: Use env method to create sst sstable_3_x_test: Indentation fix after previous patch sstable_3_x_test: Use sstable::test_env test: Add config to sstable::test_env creation config: Add constexpr value for default murmur ignore bits	2022-12-01 17:47:25 +02:00
Pavel Emelyanov	4c6bfc078d	code: Use http::re(quest\|ply) instead of httpd:: ones Recent seastar update deprecated those from httpd namespace. fixes: #12142 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12161	2022-12-01 17:33:35 +02:00
Pavel Emelyanov	adc6ee7ea8	test, utils: Use only one tempdir There's a do_with_cloned_tmp_directory that makes two temp dirs to toss sstables between them. Make it go with just one, all the more so it would resemble existing manipulations around staging/ subdir. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:57 +03:00
Pavel Emelyanov	15a7b9cafa	sstable_compaction_test: Dont create nested envs The "compact" test case runs in sstables::test_env and additionally wraps it with another instance provided by do_with_tmp_directory helper. It's simpler to create the temp dir by hand and use the outer env. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:56 +03:00
Pavel Emelyanov	69fe5fd054	mutation_reader_test: Remove unused create_sstable() helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:54 +03:00
Pavel Emelyanov	400bc2c11d	tests, lib: Move globals onto sstables::test_env There's a bunch of objects that are used by test_env as sstables_manager dependencies. Now when no other code needs those globals they better sit on the test_env next to the manager Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:36 +03:00
Pavel Emelyanov	6a294b9ad6	tests: Use sstables::test_env.db_config() to access config Currently some places use global test config, but it's going to be removed soon, so switch to using config from environment Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:30 +03:00
Pavel Emelyanov	b4e31ad359	features: Mark feature_config_from_db_config const It's in fact such. Other than that, next patch will call it with const config at hand and fail to compile without this fix Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:27 +03:00
Pavel Emelyanov	8178845ef3	sstable_3_x_test: Use env method to create sst Just to make it shorter and conform to other sst env tests Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:19 +03:00
Pavel Emelyanov	8d5d05012e	sstable_3_x_test: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:39:09 +03:00
Pavel Emelyanov	6628d801f2	sstable_3_x_test: Use sstable::test_env There are several cases there that construct sstables_manager by hand with the help of a bunch of global dependencies. It's nicer to use existing wrapper. (indentation left broken until next patch) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:38:46 +03:00
Pavel Emelyanov	1d8c76164f	test: Add config to sstable::test_env creation To make callers (tests) construct it with different options. In particular, one test will soon want to construct it with custom large data handler of its own. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:38:18 +03:00
Pavel Emelyanov	6d0c8fb6e2	config: Add constexpr value for default murmur ignore bits ... and use in some places of sstable_compaction_test. This will allow getting rid of global test_db_config thing later Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-01 13:38:15 +03:00
Botond Dénes	dbd00fd3e9	Merge 'Task manager shard repair tasks' from Aleksandra Martyniuk The PR introduces shard_repair_task_impl which represents a repair task that spans over a single shard repair. repair_info is replaced with shard_repair_task_impl, since both serve similar purpose. Closes #12066 * github.com:scylladb/scylladb: repair: reindent repair: replace repair_info with shard_repair_task_impl repair: move repair_info methods to shard_repair_task_impl repair: rename methods of repair_module repair: change type of repair_module::_repairs repair: keep a reference to shard_repair_task_impl in row_level_repair repair: move repair_range method to shard_repair_task_impl repair: make do_repair_ranges a method of shard_repair_task_impl repair: copy repair_info methods to shard_repair_task_impl repair: corutinize shard task creation repair: define run for shard_repair_task_impl repair: add shard_repair_task_impl	2022-12-01 10:04:31 +02:00
Nadav Har'El	5eda8ce4fd	alternator ttl: in scanning thread, don't retry the same page too many times Since fixing issue #11737, when the expiration scanner times out reading a page of data, it retries asking for the same page instead of giving up on the scan and starting anew later. This retry was infinite - which can cause problems if we have a bug in the code or several nodes down, which can lead to getting hung in the same place in the scan for a very long (potentially infinite) time without making any progress. An example of such a bug was issue #12145, where we forgot to handle shutdowns, so on shutdown of the cluster we just hung forever repeating the same request that will never succeed. It's better in this case to just give up on the current scan, and start it anew (from a random position) later. Refs #12145 (that issue was already fixed, by a different patch which stops the iteration when shutting down - not waiting for an infinite number of iterations and not even one more). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-11-30 18:42:37 +02:00
Nadav Har'El	d08eef5a30	alternator: fix hang during shutdown of expiration-scanning thread The expiration-scanning thread is a long-running thread which can scan data for hours, but checks for its abort-source before fetching each page to allow for timely shutdown. Recently, we added the ability to retry the page fetching in case of timeout, but forgot to check the abort source in this new retry loop - which led to an infinitely-long shutdown in some tests while the retry loop retries forever. In this patch we fix this bug by using sleep_abortable() instead of sleep(). sleep_abortable() will throw an exception if the abort source was triggered before or during the sleep - and this exception will stop the scan immediately. Fixes #12145 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-11-30 18:38:17 +02:00
Jan Ciolek	05ea0c1d60	dev/docs: add additional git pull to backport docs Botond noted that an additional git pull might be needed here: https://github.com/scylladb/scylladb/pull/12138#discussion_r1035857007 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-30 16:14:02 +01:00
Jan Ciolek	e74873408b	docs/dev: add a note about cherry-picking individual commits Some people prefer to cherry-pick individual commits so that they have fewer conflicts to resolve at once. Add a comment about this possibility. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-30 16:06:39 +01:00
Kamil Braun	0f9d0dd86e	Merge 'raft: support IP address change' from Konstantin Osipov This is the core of dynamic IP address support in Raft, moving out the IP address sourcing from Raft Group 0 configuration to gossip. At start of Raft, the raft id <> IP address translation map is tuned into the gossiper notifications and learns IP addresses of Raft hosts from them. The series intentionally doesn't contain the part which speeds up the initial cluster assembly by persisting the translation cache and using more sources besides gossip (discovery, RPC) to show correctness of the approach. Closes #12035 * github.com:scylladb/scylladb: raft: (rpc) do not throw in case of a missing IP address in RPC raft: (address map) actively maintain ip <-> raft server id map	2022-11-30 15:40:18 +01:00
Aleksandra Martyniuk	78a6193c01	repair: reindent	2022-11-30 13:53:52 +01:00
Aleksandra Martyniuk	b4ad914fe1	repair: replace repair_info with shard_repair_task_impl repair_info is deleted and all its attributes are moved to shard_repair_task_impl.	2022-11-30 13:53:52 +01:00
Aleksandra Martyniuk	f6ec2cec92	repair: move repair_info methods to shard_repair_task_impl	2022-11-30 13:53:18 +01:00
Jan Ciolek	32663e6adb	docs/dev: use 'is merged into' instead of 'becomes' The backport instructions said that after passing the tests next `becomes` master, but it's more exact to say that next `is merged into` master. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-30 13:25:10 +01:00
Jan Ciolek	28cf8a18de	docs/dev: mention that new backport instructions are for the contributor Previously the section was called: "How to backport a patch", which could be interpreted as instructions for the maintainer. The new title clearly states that these instructions are for the contributor in case the maintainer couldn't backport the patch by themselves. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-30 13:23:15 +01:00
Takuya ASADA	4ecc08c4fe	docker: switch default locale to C.UTF-8 Since we switched scylla-machine-image locale to C.UTF-8 because ubuntu-minimal image does not have en_US.UTF-8 by default, we should do the same on our docker image to reduce image size. Verified #9570 does not occur on the new image, since it is still a UTF-8 locale. Closes #12122	2022-11-30 13:58:43 +02:00
Anna Stuchlik	15cc3ecf64	doc: update the releases in the KB about updating the mode after upgrade	2022-11-30 12:53:13 +01:00
Anna Stuchlik	242a3916f0	doc: fix the broken link in the 5.1 upgrade guide	2022-11-30 12:49:20 +01:00
Alejo Sanchez	f7aa08ef25	test.py: don't stop cluster's site if not started The site member is created in ScyllaCluster.start(), for startup failure this might not be initialized, so check it's present before stop()ing it. And delete it as it's not running and proper initialization should call ScyllaCluster.start(). Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11939	2022-11-30 13:47:18 +02:00
Anna Stuchlik	1575d96856	doc: add the link to the 5.1-related KB article to the 5.1 upgrade guide	2022-11-30 12:40:49 +01:00
Nadav Har'El	ce347f4b67	test/cql-pytest: add test for meaning of fetch_size with filtering A question was raised on what fetch_size (the requested page size in a paged scan) counts when there is a filter: does it count the rows before filtering (as scanned from disk) or after filter (as will be returned to the client)? This patch adds a test which demonstrates that Cassandra and Scylla behave differently in this respect: Cassandra counts post-filtering - so fetch_size results are actually returned, while Scylla currently counts pre-filtering. It is arguable which behavior is the "correct" one - we discuss this in issue #12102. But we have already had several users (such as #11340) who complained about Scylla's behavior and expected Cassandra's behavior, so if we decide to keep Scylla's behavior we should at least explain and justify this decision in our documentation. Until then, let's have this test which reminds us of this incompatibility. This test currently passes on Cassandra and fails (xfail) on Scylla. Refs #11340 Refs #12102 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12103	2022-11-30 12:27:06 +02:00
Nadav Har'El	8bd8ef3d03	test/cql-pytest: add regression test for old issue This patch adds a regression test for the old issue #65 which is about a multi-column (tuple) clustering-column relation in a SELECT when one of these columns has reversed order. It turns out that we didn't notice, but this issue was already solved - but we didn't have a regression test for it. So this patch adds just a regression test. The test confirms that Scylla now behaves as was desired when that issue was opened. The test also passes on Cassandra, confirming that Scylla and Cassandra behave the same for such requests. Fixes #65 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12130	2022-11-30 12:22:21 +02:00
Michał Jadwiszczak	8e64e18b80	forward_service: add debug logs Adds a few debug logs to see what is happening in https://github.com/scylladb/scylladb/issues/11684 Wrapped `forward_result::printer` into `seastar::value_of` to lazy evaluate the printer Closes #12113	2022-11-30 12:15:26 +02:00
Yaniv Kaul	b66ca3407a	doc: Typo - then -> than Fix a typo. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes #12140	2022-11-30 12:03:56 +02:00
Botond Dénes	50aea9884b	Merge 'Improve the Raft upgrade procedure' from Kamil Braun Better logging, less code, a minor fix. Closes #12135 * github.com:scylladb/scylladb: service/raft: raft_group0: less repetitive logging calls service/raft: raft_group0: fix sleep_with_exponential_backoff	2022-11-30 11:24:20 +02:00
Avi Kivity	6a5d9ff261	treewide: use non-experimental std::source_location Now that we use libstdc++ 12, we can use the standardized source_location. Closes #12137	2022-11-30 11:06:43 +02:00
Jan Ciolek	56a802c979	docs/dev: Add backport instructions for contributors Add instructions on how to backport a feature to an older version of Scylla. It contains a detailed step-by-step instruction so that people unfamiliar with intricacies of Scylla's repository organization can easily get the hang of it. This is the guide I wish I had when I had to do my first backport. I put it in backport.md because that looks like the file responsible for this sort of information. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-29 22:10:27 +01:00
Konstantin Osipov	fbe7886cc0	raft: (rpc) do not throw in case of a missing IP address in RPC Remove raft_address_map::get_inet_address() While at it, coroutinize some rpc methods. To propagate up the event of missing IP address, use coroutine::exception() with a proper type (raft::transport_error) and a proper error message. This is a building block for removing raft_address_map::get_inet_address() which is too generic, and shifting the responsibility of handling missing addresses to the address map clients. E.g. one-way RPC shouldn't throw if an address is missing, but just drop the message. PS An attempt to use a single template function turned out to be too complex: - some functions require a gate, some don't - some return void, some future<> and some future<raft::data_type>	2022-11-29 19:55:48 +03:00
Konstantin Osipov	73e5298273	raft: (address map) actively maintain ip <-> raft server id map 1) make address map API flexible Before this patch: - having a mapping without an actual IP address was an internal error - not having a mapping for an IP address was an internal error - re-mapping to a new IP address wasn't allowed After this patch: - the address map may contain a mapping without an actual IP address, and the caller must be prepared for it: find() will return a nullopt. This happens when we first add an entry to Raft configuration and only later learn its IP address, e.g. via gossip. - it is allowed to re-map an existing entry to a new address; 2) subscribe to gossip notifications Learning IP addresses from gossip allows us to adjust the address map whenever a node IP address changes. Gossiper is also the only valid source of re-mapping, other sources (RPC) should not re-map, since otherwise a packet from a removed server can remap the id to a wrong address and impact liveness of a Raft cluster. 3) prompt address map state with app state Initialize the raft address map with initial gossip application state, specifically IPs of members of the cluster. With this, we no longer need to store these IPs in Raft configuration (and update them when they change). The obvious drawback of this approach is that a node may join Raft config before it propagates its IP address to the cluster via gossip - so the boot process has to wait until it happens. Gossip also doesn't tell us which IPs are members of Raft configuration, so we subscribe to Group0 configuration changes to mark the members of Raft config "non-expiring" in the address translation map. Thanks to the changes above, Raft configuration no longer stores IP addresses. We still keep the 'server_info' column in the raft_config system table, in case we change our mind or decide to store something else in there.	2022-11-29 19:55:43 +03:00
Kamil Braun	3dbcff435f	service/raft: raft_group0: less repetitive logging calls Some log messages in retry loops in the Raft upgrade procedure included a sentence like "sleeping before retrying..."; but not all of them. With the recently added `sleep_with_exponential_backoff` abstraction we can put this "sleeping..." message in a single place, and it's also easy to say how long we're going to sleep. I also enjoy using this `source_location` thing.	2022-11-29 17:42:43 +01:00
Nadav Har'El	c5121cf273	cql: fix column-name aliases in SELECT JSON The SELECT JSON statement, just like SELECT, allows the user to rename selected columns using an "AS" specification. E.g., "SELECT JSON v AS foo". This specification was not honored: We simply forgot to look at the alias in SELECT JSON's implementation (we did it correctly in regular SELECT). So this patch fixes this bug. We had two tests in cassandra_tests/validation/entities/json_test.py that reproduced this bug. The checks in those tests now pass, but these two tests still continue to fail after this patch because of two other unrelated bugs that were discovered by the same tests. So in this patch I also add a new test just for this specific issue - to serve as a regression test. Fixes #8078 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12123	2022-11-29 18:16:19 +02:00
Avi Kivity	faf11587fa	Update seastar submodule * seastar 4f4cc00660...3a5db04197 (16): > tls: add missing include <map> > Merge 'util/process: use then_unpack to help automatically unpack tuple.' from Jianyong Chen > HTTP: define formatter for status_type to fix build. > fsnotifier: move it into namespace experimental and add docs. > Move fsnotify.hh to the 'include' directory for public use. > Merge 'reactor: define make_pipe() and use make_pipe() in reactor::spawn()' from Kefu Chai > Merge 'Fix: error when compiling http_client_demo' from Amossss > util/process: using `data_sink_impl::put` > Merge 'dns: serialize UDP sends.' from Calle Wilund > build: use correct version when finding liburing > Merge 'Add simple http client' from Pavel Emelyanov > future: use invoke_result instead of nested requirements > Merge 'reactor: use separate calls in reactor and reactor_backend for read/write/sendmsg/recvmsg' from Kefu Chai > util, core: add spawn_process() helper > parallel utils: add note about shard-local parallelism > shared_mutex: return typed exceptional future in with_* error handlers Closes #12131	2022-11-29 18:10:06 +02:00
Kamil Braun	580bdec875	service/raft: raft_group0: fix sleep_with_exponential_backoff It was immediately jumping to _max_retry_period.	2022-11-29 16:27:59 +01:00
Nadav Har'El	6bc3075bbd	test/alternator: increase timeout on TTL tests Some of the tests in test/alternator/test_ttl.py need an expiration scan pass to complete and expire items. In development builds on developer machines, this usually takes less than a second (our scanning period is set to half a second). However, in debug builds on Jenkins each scan often takes up to 100 (!) seconds (this is the record we've seen so far). This is why we set the tests' timeout to 120. But recently we saw another test run failing. I think the problem is that in some cases, we need not one, but two scanning passes to complete before the timeout: It is possible that the test writes an item right after the current scan passed it, so it doesn't get expired, and then we start a second scan at a random position, possibly making that item we mention one of the last items to be considered - so in total we need to wait for two scanning periods, not one, for the item to expire. So this patch increases the timeout from 120 seconds to 240 seconds - more than twice the highest scanning time we ever saw (100 seconds). Note that this timeout is just a timeout, it's not the typical test run time: The test can finish much more quickly, as little as one second, if items expire quickly on a fast build and machine. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12106	2022-11-29 16:37:54 +03:00
Nadav Har'El	1f8adda4b2	Merge 'treewide: improve compatibility with gcc 12' from Avi Kivity Fix some issues found with gcc 12. Note we can't fully compile with gcc yet, due to [1]. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056 Closes #12121 * github.com:scylladb/scylladb: utils: observer: qualify seastar::noncopyable_function sstables: generation_type: forgo constexpr on hash of generation_type logalloc: disambiguate types and non-type members task_manager: disambiguate types and non-type members direct_failure_detector: don't change meaning of endpoint_liveness schema: abort on illegal per column computation kind database: abort on illegal per partition rate limit operation mutation_fragment: abort on illegal fragment type per_partition_rate_limit_options: abort on illegal operation type schema: drop unused lambda mutation_partition: drop unused lambda cql3: create_index_statement: remove unused lambda transport: prevent signed and unsigned comparison database: don't compare signed and unsigned types raft: don't compare signed and unsigned types compaction: don't compare signed and unsigned compaction counts bytes_ostream: don't take reference to packed variable	2022-11-29 13:57:24 +02:00
Avi Kivity	ea99750de7	test: give tests less-unique identifiers Test identifiers are very unique, but this makes them less useful in Jenkins Test Result Analyzer view. For example, counter_test can be counter_test.432 in one run and counter_test.442 in another. Jenkins considers them different and so we don't see a trend. Limit the id uniqueness within a test case, so that we'll have counter_test.{1, 2, 3} consistently. Those test will be grouped together so we can see pass/fail trends. Closes #11946	2022-11-29 13:14:14 +02:00
Yaniv Kaul	fef8e43163	doc: cluster management: Replace a misplaced period with a bulleted list of items Signed-Off-By: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes #12125	2022-11-29 12:42:24 +02:00
Botond Dénes	e9fec761a2	Merge 'doc: document the procedure for updating the mode after upgrade' from Anna Stuchlik Fix https://github.com/scylladb/scylla-docs/issues/4126 Closes #11122 * github.com:scylladb/scylladb: doc: add info about the time-consuming step due to resharding doc: add the new KB to the toctree doc: doc: add a KB about updating the mode in perftune.yaml after upgrade	2022-11-29 12:41:46 +02:00
Avi Kivity	ea901fdb9d	cql3: expr: fold `null` into untyped_constant/constant Our `null` expression, after the prepare stage, is redundant with a `constant` expression containing the value NULL. Remove it. Its role in the unprepared stage is taken over by untyped_constant, which gains a new type_class enumeration to represent it. Some subtleties: - Usually, handling of null and untyped_constant, or null and constant was the same, so they are just folded into each other - LWT "like" operator now has to discriminate between a literal string and a literal NULL - prepare and test_assignment were folded into the corresponding untyped_constant functions. Some care had to be taken to preserve error messages. Closes #12118	2022-11-29 11:02:18 +02:00
Aleksandra Martyniuk	8bc0af9e34	repair: fix double start of data sync repair task Currently, each data sync repair task is started (and hence run) twice. Thus, when two running operations happen within a time frame long enough, the following situation may occur: - the first run finishes - after some time (ttl) the task is unregistered from the task manager - the second run finishes and attempts to finish the task which does not exist anymore - memory access causes a segfault. The second call to start is deleted. A check is added to the start method to ensure that each task is started at most once. Fixes: #12089 Closes #12090	2022-11-29 00:00:10 +02:00
Avi Kivity	9765b2e3bc	cql3: expr: drop remnants of `bool` component from expression In `ad3d2ee47d`, we replaced `bool` as an expression element (representing a boolean constant) with `constant`. But a comment and a concept continue to mention it. Remove the comment and the concept fragment. Closes #12119	2022-11-28 23:18:26 +02:00
Pavel Emelyanov	ae79669fd2	topology: Be less restrictive about missing endpoints Recent changes in topology restricted the get_dc/get_rack calls. Older code was trying to locate the endpoint in gossiper, then in system keyspace cache and if the endpoint was not found in both -- returned "default" location. New code generates internal error in this case. This approach already helped to spot several BUGs in code that had been eventually fixed, but echoes of that change still pop up. This patch relaxes the "missing endpoint" case by printing a warning in logs and returning back the "default" location like old code did. tests: update_cluster_layout_tests.py::* hintedhandoff_additional_test.py::TestHintedHandoff::test_hintedhandoff_rebalance bootstrap_test.py::TestBootstrap::test_decommissioned_wiped_node_can_join bootstrap_test.py::TestBootstrap::test_failed_bootstap_wiped_node_can_join materialized_views_test.py::TestMaterializedViews::test_decommission_node_during_mv_insert_4_nodes refs: #11900 refs: #12054 fixes: #11870 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12067	2022-11-28 22:01:09 +02:00
Avi Kivity	3a6eafa8c6	utils: observer: qualify seastar::noncopyable_function gcc checks name resolution eagerly, and can't find noncopyable_function as this header doesn't include "seastarx.hh". Qualify the name so it finds it.	2022-11-28 21:58:30 +02:00
Avi Kivity	5ae98ab3de	sstables: generation_type: forgo constexpr on hash of generation_type std::hash isn't constexpr, so gcc refuses to make hash of generation_type constexpr. It's pointless anyway since we never have a compile-time sstable generation.	2022-11-28 21:58:30 +02:00
Avi Kivity	a2d43bb851	logalloc: disambiguate types and non-type members logalloc::tracker has some members with the same names as types from namespace scope. gcc (rightfully) complains that this changes the meaning of the name. Qualify the types to disambiguate.	2022-11-28 21:58:30 +02:00
Avi Kivity	ed5da87930	task_manager: disambiguate types and non-type members task_manager has some members with the same names as types from namespace scope. gcc (rightfully) complains that this changes the meaning of the name. Qualify the types to disambiguate.	2022-11-28 21:58:30 +02:00
Avi Kivity	27be1670d1	direct_failure_detector: don't change meaning of endpoint_liveness It's used both as a type and as a member. Qualify the type so they have different names.	2022-11-28 21:58:30 +02:00
Avi Kivity	735c46cb63	schema: abort on illegal per column computation kind Without memory corruption it's not possible for the switch to fall through, and the compiler will error if we forget to add a case. The compiler however is obliged to consider that we might store some other value in the variable.	2022-11-28 21:58:30 +02:00
Avi Kivity	f73a51250c	database: abort on illegal per partition rate limit operation Without memory corruption it's not possible for the switch to fall through, and the compiler will error if we forget to add a case. The compiler however is obliged to consider that we might store some other value in the variable.	2022-11-28 21:58:30 +02:00
Avi Kivity	f469885b41	mutation_fragment: abort on illegal fragment type Without memory corruption it's not possible for the switch to fall through, and the compiler will error if we forget to add a case. The compiler however is obliged to consider that we might store some other value in the variable.	2022-11-28 21:58:30 +02:00
Avi Kivity	a3c89cedbd	per_partition_rate_limit_options: abort on illegal operation type Without memory corruption it's not possible for the switch to fall through, and the compiler will error if we forget to add a case. The compiler however is obliged to consider that we might store some other value in the variable.	2022-11-28 21:58:30 +02:00
Avi Kivity	7ec28a81bf	schema: drop unused lambda get_cell is defined but not used.	2022-11-28 21:58:30 +02:00
Avi Kivity	c493a2379a	mutation_partition: drop unused lambda should_purge_row_tombstone is defined but not used.	2022-11-28 21:58:30 +02:00
Avi Kivity	e25bf62871	cql3: create_index_statement: remove unused lambda throw_exception is defined but not used.	2022-11-28 21:58:30 +02:00
Avi Kivity	5dedf85288	transport: prevent signed and unsigned comparison This can lead to undefined behavior. Cast to unsigned, after we've verified the value is indeed positive.	2022-11-28 21:58:30 +02:00
Avi Kivity	77be69b600	database: don't compare signed and unsigned types gcc warns it can lead to undefined behavior, though 2G entries in a list of mutations are unlikely. Use the correct type for iteration.	2022-11-28 21:58:30 +02:00
Avi Kivity	fb6804e7a4	raft: don't compare signed and unsigned types gcc warns it can lead to undefined behavior, though 2G entries in a list of mutations are unlikely. Use the correct type for iteration.	2022-11-28 21:58:30 +02:00
Avi Kivity	f565db75ce	compaction: don't compare signed and unsigned compaction counts gcc warns as this can lead to incorrect results. Cast the threshold to an unsigned type (we know it's positive at this point) to avoid the warning.	2022-11-28 21:41:56 +02:00
Avi Kivity	23b94ac391	bytes_ostream: don't take reference to packed variable bytes_ostream is packed, so its _begin member is packed as well. gcc (correctly) disallows taking a reference to an unaligned variable in an aligned reference, and complains. Make it happy by open-coding the exchange operation.	2022-11-28 21:40:18 +02:00
Nadav Har'El	5480211061	Merge 'test.py: support node replace operation' from Kamil Braun The `add_server` function now takes an optional `ReplaceConfig` struct (implemented using `NamedTuple`), which specifies the ID of the replaced server and whether to reuse the IP address. If we want to reuse the IP address, we don't allocate one using the host registry. This required certain refactors: moving the code responsible for allocation of IPs outside `ScyllaServer`, into `ScyllaCluster`. Add two tests, but they are now skipped: one of them is failing (inability of the new node to join group 0) and both suffer from a hardcoded 60-second sleep in Scylla. Closes #12032 * github.com:scylladb/scylladb: test/topology: simple node replace tests (currently disabled) test/pylib: scylla_cluster: support node replace operation test/pylib: scylla_cluster: move members initialization to constructor test/pylib: scylla_cluster: (re)lease IP addr outside ScyllaServer test/pylib: scylla_cluster: refactor create_server parameters to a struct test.py: stop/uninstall clusters instead of servers when cleaning up test/pylib: artifact_registry: replace `Awaitable` type with `Coroutine` test.py: prepare for adding extra config from test when creating servers test/pylib: manager_client: convert `add_server` to use `put_json` test/pylib: rest_client: allow returning JSON data from `put_json` test/pylib: scylla_cluster: don't import from manager_client	2022-11-28 16:06:39 +02:00
Takuya ASADA	4d8fb569a1	install.sh: drop locale workaround from python3 thunk Since #7408 does not occur on current python3 version (3.11.0), let's drop the workaround. Closes #12097	2022-11-28 13:07:03 +02:00
Anna Stuchlik	452915cef6	doc: set the documentation version 5.1 as default (latest) Closes #12105	2022-11-28 12:02:13 +01:00
Avi Kivity	380da0586c	Update tools/python3 submodule (drop locale workaround) * tools/python3 773070e...548e860 (1): > install.sh: drop locale workaround from python3 thunk	2022-11-28 12:24:13 +02:00
Avi Kivity	0da66371a5	storage_proxy: coroutinize inner continuation of create_hint_sync_point() It is part of a coroutine::parallel_for_each(), which is safe for lambda coroutines. Closes #12057	2022-11-28 11:30:00 +02:00
Avi Kivity	d12d42d1a6	Revert "configure: temporarily disable wasm support for aarch64" This reverts commit `e2fe8559ca`. I ran all the release mode tests on aarch64 with it reverted, and it passes. So it looks like whatever problems we had with it were fixed. Closes #12072	2022-11-28 11:30:00 +02:00
Nadav Har'El	99a72a9676	Merge 'cql3: expr: make it possible to evaluate expr::binary_operator' from Jan Ciołek As a part of CQL rewrite we want to be able to perform filtering by calling `evaluate()` on an expression and checking if it evaluates to `true`. Currently trying to do that for a binary operator would result in an error. Right now checking if a binary operation like `col1 = 123` is true is done using `is_satisfied_by`, which is able to check if a binary operation evaluates to true for a small set of predefined cases. Eventually once the grammar is relaxed we will be able to write expressions like: `(col1 < col2) = (1 > ?)`, which doesn't fit with what `is_satisfied_by` is supposed to do. Additionally expressions like `1 = NULL` should evaluate to `NULL`, not `true` or `false`. `is_satisfied_by` is not able to express that properly. The proper way to go is implementing `evaluate(binary_operator)`, which takes a binary operation and returns what the result of it would be. Implementing `prepare_expression` for `binary_operator` requires us to be able to evaluate it first. In the next PR I will add support for `prepare_expression`. 
Closes #12052 * github.com:scylladb/scylladb: cql-pytest: enable two unset value tests that pass now cql-pytest: reduce unset value error message cql3: expr: change unset value error messages to lowercase cql_pytest: ensure that where clauses like token(p) = 0 AND p = 0 are rejected cql3: expr: remove needless braces around switch cases cql3: move evaluation IS_NOT NULL to a separate function expr_test: test evaluating LIKE binary_operator expr_test: test evaluating IS_NOT binary_operator expr_test: test evaluating CONTAINS_KEY binary_operator expr_test: test evaluating CONTAINS binary_operator expr_test: test evaluating IN binary_operator expr_test: test evaluating GTE binary_operator expr_test: test evaluating GT binary_operator expr_test: test evaluating LTE binary_operator expr_test: test evaluating LT binary_operator expr_test: test evaluating NEQ binary_operator expr_test: test evaluating EQ binary_operator cql3: expr properly handle null in is_one_of() cql3: expr properly handle null in like() cql3: expr properly handle null in contains_key() cql3: expr properly handle null in contains() cql3: expr: properly handle null in limits() cql3: expr: remove unneeded overload of limits() cql3: expr: properly handle null in equality operators cql3: expr: remove unneeded overload of equal() cql3: expr: use evaluate(binary_operator) in is_satisfied_by cql3: expr: handle IS NOT NULL when evaluating binary_operator cql3: expr: make it possible to evaluate binary_operator cql3: expr: accept expression as lhs argument to like() cql3: expr: accept expression as lhs in contains_key cql3: expr: accept expression as lhs argument to contains()	2022-11-28 11:30:00 +02:00
Nadav Har'El	1e59c3f9ef	alternator: if TTL scan times out, continue immediately The Alternator TTL expiration scanner scans an entire table using many small pages. If any of those pages time out for some reason (e.g., an overload situation), we currently consider the entire scan to have failed and wait for the next scan period (which by default is 24 hours) when we start the scan from scratch (at a random position). There is a risk that if these timeouts are common enough to occur once or more per scan, the result is that we double or more the effective expiration lag. A better solution, done in this patch, is to retry from the same position if a single page timed out - immediately (or almost immediately, we add a one-second sleep). Fixes #11737 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12092	2022-11-28 11:30:00 +02:00
Avi Kivity	45a57bf22d	Update tools/java submodule (revert scylla-driver) scylla-driver causes dtests to fail randomly (likely due to incorrect handling of the USE statement). Revert it. * tools/java 73422ee114...1c06006447 (2): > Revert "Add Scylla Cloud serverless support" > Revert "Switch cqlsh to use scylla-driver"	2022-11-28 11:29:08 +02:00
Benny Halevy	8f584a9a80	storage_service: handle_state_normal: always update_topology before update_normal_tokens update_normal_tokens checks that the endpoint is in topology. Currently we call update_topology on this path only if it's not a normal_token_owner, but there are paths when the endpoint could be a normal token owner but still be pending in topology so always update it, just in case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-28 11:25:36 +02:00
Benny Halevy	6b13fd108a	storage_service: handle_state_normal: delete outdated comment regarding update pending ranges race asias@scylladb.com said: > This comments was moved up to the wrong place when tmptr->update_topology was added. > There is no race now since we use the copy-update-replace method to update token_metadada. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-28 11:25:36 +02:00
Kefu Chai	af011aaba1	utils/variant_element: simplify is_variant_element with right fold for better readability than the recursive approach. Signed-off-by: Kefu Chai <tchaikov@gmail.com> Closes #12091	2022-11-27 16:34:34 +02:00
Avi Kivity	78222ea171	Update tools/java submodule (cqlsh system_distributed_everywhere is a system keyspace) * tools/java 874e2d529b...73422ee114 (1): > Mark "system_distributed_everywhere" as system ks	2022-11-27 15:37:57 +02:00
Aleksandra Martyniuk	9a3d114349	tasks: move methods from task_manager to source file Methods from tasks::task_manager and nested classes are moved to source file. Closes #12064	2022-11-27 15:09:28 +02:00
Piotr Dulikowski	22fbf2567c	utils/abi: don't use the deprecated std::unexpected_handler Recently, clang started complaining about std::unexpected_handler being deprecated: ``` In file included from utils/exceptions.cc:18: ./utils/abi/eh_ia64.hh:26:10: warning: 'unexpected_handler' is deprecated [-Wdeprecated-declarations] std::unexpected_handler unexpectedHandler; ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/exception:84:18: note: 'unexpected_handler' has been explicitly marked deprecated here typedef void (*_GLIBCXX11_DEPRECATED unexpected_handler) (); ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/x86_64-redhat-linux/bits/c++config.h:2343:32: note: expanded from macro '_GLIBCXX11_DEPRECATED' ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/x86_64-redhat-linux/bits/c++config.h:2334:46: note: expanded from macro '_GLIBCXX_DEPRECATED' ^ 1 warning generated. ``` According to cppreference.com, it was deprecated in C++11 and removed in C++17 (!). This commit gets rid of the warning by inlining the std::unexpected_handler typedef, which is defined as a pointer to a function with 0 arguments, returning void. Fixes: #12022 Closes #12074	2022-11-27 12:25:20 +02:00
Alejo Sanchez	5ff4b8b5f8	pytest: catch rare exception for random tables test On rare occasions a SELECT on a DROPped table throws cassandra.ReadFailure instead of cassandra.InvalidRequest. This could not be reproduced locally. Catch both exceptions as the table is not present anyway and it's correctly marked as a failure. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #12027	2022-11-27 10:26:55 +02:00
Michał Chojnowski	a75e4e1b23	db: config: disable global index page caching by default Global index page caching, as introduced in 4.6 (`078a6e422b` and `9f957f1cf9`) has proven to be misdesigned, because it poses a risk of catastrophic performance regressions in common workloads by flooding the cache with useless index entries. Because of that risk, it should be disabled by default. Refs #11202 Fixes #11889 Closes #11890	2022-11-26 14:27:26 +02:00
Aleksandra Martyniuk	c2ea3f49e6	repair: rename methods of repair_module Methods of repair_module connected with repair_module::_repairs are renamed to match repair_module::_repairs type.	2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk	13dbd75ba8	repair: change type of repair_module::_repairs As a preparation to replacing repair_info with shard_repair_task_impl, type of _repairs in repair module is changed from std::unordered_map<int, lw_shared_ptr<repair_info>> to std::unordered_map<int, tasks::task_id>.	2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk	55c01a1beb	repair: keep a reference to shard_repair_task_impl in row_level_repair As a part of replacing repair_info with shard_repair_task_impl, instead of a reference to repair_info, row_level_repair keeps a reference to shard_repair_task_impl.	2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk	9b664570f0	repair: move repair_range method to shard_repair_task_impl	2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk	3ac5ba7b28	repair: make do_repair_ranges a method of shard_repair_task_impl Function do_repair_ranges is directly connected to shard repair tasks. Turning it into shard_repair_task_impl method enables an access to tasks' members with no additional intermediate layers.	2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk	a09dfcdacd	repair: copy repair_info methods to shard_repair_task_impl Methods of repair_info are copied to shard_repair_task_impl. They are not used yet, it's a preparation for replacing repair_info with shard_repair_task_impl.	2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk	a4b1bdb56c	repair: coroutinize shard task creation	2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk	996c0f3476	repair: define run for shard_repair_task_impl Operations performed as a part of shard repair are moved to shard_repair_task_impl run method.	2022-11-25 16:41:02 +01:00
Aleksandra Martyniuk	ba9770ea02	repair: add shard_repair_task_impl Create a task spanning over a repair performed on a given shard.	2022-11-25 16:40:49 +01:00
Anna Stuchlik	d5f676106e	doc: remove the LWT page from the index of Enterprise features Closes #12076	2022-11-24 21:59:05 +02:00
Aleksandra Martyniuk	dcc17037c7	repair: fix bad cast in tasks::task_id parsing In system_keyspace::get_repair_history value of repair_uuid is got from row as tasks::task_id. tasks::task_id is represented by an abstract_type specific for utils::UUID. Thus, since their typeids differ, bad_cast is thrown. repair_uuid is got from row as utils::UUID and then cast. Since no longer needed, data_type_for<tasks::task_id> is deleted. Fixes: #11966 Closes #12062	2022-11-24 19:37:44 +02:00
Jan Ciolek	77c7d8b8f6	cql-pytest: enable two unset value tests that pass now While implementing evaluate(binary_operator) missing checks for unset value were added for comparisons in filtering code. Because of that some tests for unset value started passing. There are still other tests for unset value that are failing because Scylla doesn't have all the checks that it should. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-24 17:07:17 +01:00
Jan Ciolek	5bc0bc6531	cql-pytest: reduce unset value error message When unset value appears in an invalid place both Cassandra and Scylla throw an error. The tests were written with Cassandra and thus the expected error messages were exactly the same as produced by Cassandra. Scylla produces different error messages, but both databases return messages with the text 'unset value'. Reduce the expected message text from the whole message to something that contains 'unset value'. It would be hard to mimic Cassandra's error messages in Scylla. There is no point in spending time on that. Instead it's better to modify the tests so that they are able to work with both Cassandra and Scylla. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-24 17:04:07 +01:00
Jan Ciolek	08f40a116d	cql3: expr: change unset value error messages to lowercase The messages used to contain UNSET_VALUE in capital letters, but the tests expect messages with 'unset value'. Change the message so that it can match the expected error text in tests. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-24 17:02:44 +01:00
Kamil Braun	fda6403b29	test/topology: simple node replace tests (currently disabled) Add two node replace tests using the freshly added infrastructure. One test replaces a node while using a different IP. It is disabled because the replace operation has an unconditional 60-seconds sleep (it doesn't depend on the ring_delay setting for some reason). The sleep needs to be fixed before we can enable this test. The other test replaces while reusing the replaced node's IP. Additionally to the sleep, the test fails because the node cannot join group 0; it's stuck in an infinite loop of trying to join: ``` INFO 2022-11-18 15:56:19,933 [shard 0] raft_group0 - server 8de951fd-a528-4a82-ac54-592ea269537f found no local group 0. Discovering... INFO 2022-11-18 15:56:19,933 [shard 0] raft_group0 - server 8de951fd-a528-4a82-ac54-592ea269537f found group 0 with group id 25d2b050-6751-11ed-b534-c3c40c275dd3, leader b7047f7e-03e6-4797-a723-24054201f91d INFO 2022-11-18 15:56:19,934 [shard 0] raft_group0 - Server 8de951fd-a528-4a82-ac54-592ea269537f is starting group 0 with id 25d2b050-6751-11ed-b534-c3c40c275dd3 WARN 2022-11-18 15:56:20,935 [shard 0] raft_group0 - failed to modify config at peer b7047f7e-03e6-4797-a723-24054201f91d: seastar::rpc::timeout_error (rpc call timed out). Retrying. INFO 2022-11-18 15:56:21,937 [shard 0] raft_group0 - server 8de951fd-a528-4a82-ac54-592ea269537f found group 0 with group id 25d2b050-6751-11ed-b534-c3c40c275dd3, leader ee0175ea-6159-4d4c-9d7c-95c934f8a408 WARN 2022-11-18 15:56:22,937 [shard 0] raft_group0 - failed to modify config at peer ee0175ea-6159-4d4c-9d7c-95c934f8a408: seastar::rpc::timeout_error (rpc call timed out). Retrying. 
INFO 2022-11-18 15:56:23,938 [shard 0] raft_group0 - server 8de951fd-a528-4a82-ac54-592ea269537f found group 0 with group id 25d2b050-6751-11ed-b534-c3c40c275dd3, leader ee0175ea-6159-4d4c-9d7c-95c934f8a408 WARN 2022-11-18 15:56:24,939 [shard 0] raft_group0 - failed to modify config at peer ee0175ea-6159-4d4c-9d7c-95c934f8a408: seastar::rpc::timeout_error (rpc call timed out). Retrying. ``` and so on.	2022-11-24 16:26:23 +01:00
Kamil Braun	2f60550ff3	test/pylib: scylla_cluster: support node replace operation The `add_server` function now takes an optional `ReplaceConfig` struct (implemented using `NamedTuple`), which specifies the ID of the replaced server and whether to reuse the IP address. If we want to reuse the IP address, we don't allocate one using the host registry. Since now multiple servers can have the same IP, introduce a `leased_ips` set to `ScyllaCluster` which is used when `uninstall`ing the cluster - to make sure we don't `release_host` the same host twice.	2022-11-24 16:26:23 +01:00
Kamil Braun	d80247f912	test/pylib: scylla_cluster: move members initialization to constructor Previously some members had to be initialized in `install` because that's when we first knew the IP address. Now we know the IP address during construction, which allows us to make the code a bit shorter and simpler, and establish invariants: some members (such as `self.config`) are now valid for the entire lifetime of the server object. `install()` is reduced to performing only side effects (creating directories, writing config files), all calculation is done inside the constructor.	2022-11-24 16:26:23 +01:00
Kamil Braun	3934eefd20	test/pylib: scylla_cluster: (re)lease IP addr outside ScyllaServer `ScyllaServer`s were constructed without IP addresses. They leased an IP address from `HostRegistry` and released them in `uninstall`. This responsibility was now moved into `ScyllaCluster`, which leases an IP address for a server before constructing it, and passes it to the constructor. It releases the addresses of its servers when uninstalling itself. This will allow the cluster to reuse the IP address of an existing server in that cluster when adding a new server which wants to replace the existing one. Instead of leasing a new address, it will pass the existing IP address to the new server's constructor. The refactor is also nice in that it establishes an invariant for `ScyllaServer`, simplifying reasoning about the class: now it has an `ip_addr` field at all times. `host_registry` was moved from `ScyllaServer` to `ScyllaCluster`.	2022-11-24 16:26:23 +01:00
Kamil Braun	9d5e1191da	test/pylib: scylla_cluster: refactor create_server parameters to a struct `ScyllaCluster` constructor takes a function `create_server` which itself takes 3 parameters now. Soon it will take a 4th. The list of parameters is repeated at the constructor definition and the call site of the constructor, with many parameters it begins being tiresome. Refactor the list of parameters to a `NamedTuple`.	2022-11-24 16:26:23 +01:00
Kamil Braun	d582666293	test.py: stop/uninstall clusters instead of servers when cleaning up `self.artifacts` was calling `ScyllaServer.stop` and `ScyllaServer.uninstall`. Now it calls `ScyllaCluster.stop` and `ScyllaCluster.uninstall`, which underneath stops/uninstalls servers in this cluster. We must be a bit more careful now in case installing/starting a server inside a cluster fails: there are no server cleanup artifacts, and a server is added to cluster's `running` map only after `install_and_start` finishes (until that happens, `ScyllaCluster.stop/uninstall` won't catch this server). So handle failures explicitly in `install_and_start`. This commit does not logically change how the tests are running - every started server belongs to some cluster, so it will be cleaned up - but it's an important refactor. It will allow us to move IP address (de)allocation code outside `ScyllaServer`, into `ScyllaCluster`, which in turn will allow us to implement node replace operation for the case where we want to reuse the replaced node's IP. Also, `ScyllaCluster.uninstall` was unused before this change, now it's used.	2022-11-24 16:26:17 +01:00
Avi Kivity	29a4b662f8	Merge 'doc: document the Alternator TTL feature as GA' from Anna Stuchlik Currently, TTL is listed as one of the experimental features: https://docs.scylladb.com/stable/alternator/compatibility.html#experimental-api-features This PR moves the feature description from the Experimental Features section to a separate section. I've also added some links and improved the formatting. @tzach I've relied on your release notes for RC1. Refs: https://github.com/scylladb/scylladb/issues/5060 Closes #11997 * github.com:scylladb/scylladb: Update docs/alternator/compatibility.md doc: update the link to Enabling Experimental Features doc: remove the note referring to the previous ScyllaDB versions and add the relevant limitation to the paragraph doc: update the links to the Enabling Experimental Features section doc: add the link to the Enabling Experimental Features section doc: move the TTL Alternator feature from the Experimental Features section to the production-ready section	2022-11-24 17:22:05 +02:00
Nadav Har'El	2dedb5ea75	alternator: make Alternator TTL feature no longer "experimental" Until now, the Alternator TTL feature was considered "experimental", and had to be manually enabled on all nodes of the cluster to be usable. This patch removes this requirement and in essence GAs this feature. Even after this patch, Alternator TTL is still a "cluster feature", i.e., for this feature to be usable every node in the cluster needs to support it. If any of the nodes is old and does not yet support this feature, the UpdateTimeToLive request will not be accepted, so although the expiration-scanning threads may exist on the newer nodes, they will not do anything because none of the tables can be marked as having expiration enabled. This patch does not contain documentation fixes - the documentation still suggests that the Alternator TTL feature is experimental. The documentation patch will come separately. Fixes #12037 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12049	2022-11-24 17:21:39 +02:00
Tzach Livyatan	e96d31d654	docs: Add Authentication and Authorization as a prerequisite for Auditing. Closes #12058	2022-11-24 17:21:23 +02:00
Kamil Braun	df731a5b0c	test/pylib: artifact_registry: replace `Awaitable` type with `Coroutine` The `cleanup_before_exit` method of `ArtifactRegistry` calls `close()` on artifacts. mypy complains that `Awaitable` has no such method. In fact, the `artifact` objects that we pass to `ArtifactRegistry` (obtained by calling `async def` functions) do have a `close()` method, and they are a particular case of `Awaitable`s, but in general not all `Awaitable`s have `close()`. Replace `Awaitable` with one of its subtypes: `Coroutine`. `Coroutine`s have a `close()` method, and `async def` functions return objects of this type. mypy no longer complains.	2022-11-24 16:17:05 +01:00
Nadav Har'El	c6bb64ab0e	Merge 'Fix LWT insert crash if clustering key is null' from Gusev Petr [PR](https://github.com/scylladb/scylladb/pull/9314) fixed a similar issue with regular insert statements but missed the LWT code path. It's expected behaviour of `modification_statement::create_clustering_ranges` to return an empty range in this case, since `possible_lhs_values` it uses explicitly returns `empty_value_set` if it evaluates `rhs` to null, and it has a comment about it (All NULL comparisons fail; no column values match.) On the other hand, all components of the primary key are required to be set, this is checked at the prepare phase, in `modification_statement::process_where_clause`. So the only problem was `modification_statement::execute_with_condition` was not expecting an empty `clustering_range` in case of a null clustering key. Also this patch contains a fix for the problem with wrong column name in Scylla error messages. If `INSERT` or `DELETE` statement is missing a non-last element of the primary key, the error message generated contains an invalid column name. The problem occurs if the query contains a column with the list type, otherwise `statement_restrictions::process_clustering_columns_restrictions` checks that all the components of the key are specified. Closes #12047 * github.com:scylladb/scylladb: cql: refactor, inline modification_statement::validate_primary_key_restrictions cql: DELETE with null value for IN parameter should be forbidden cql: add column name to the error message in case of null primary key component cql: batch statement, inserting a row with a null key column should be forbidden cql: wrong column name in error messages modification_statement: fix LWT insert crash if clustering key is null	2022-11-24 16:15:27 +02:00
Nadav Har'El	6e9f739f19	Merge 'doc: add the links to the per-partition rate limit extension ' from Anna Stuchlik Release 5.1 introduced a new CQL extension that applies to the CREATE TABLE and ALTER TABLE statements. The ScyllaDB-specific extensions are described on a separate page, so the CREATE TABLE and ALTER TABLE should include links to that page and section. Note: CQL extensions are described with Markdown, while the Data Definition page is RST. Currently, there's no way to link from an RST page to an MD subsection (using a section heading or anchor), so a URL is used as a temporary solution. Related: https://github.com/scylladb/scylladb/pull/9810 Closes #12070 * github.com:scylladb/scylladb: doc: move the info about per-partition rate limit for the ALTER TABLE statement from the paragraph to the list doc: add the links to the per-partition rate limit extension to the CREATE TABLE and ALTER TABLE sections	2022-11-24 16:03:30 +02:00
Anna Stuchlik	8049670772	doc: move the info about per-partition rate limit for the ALTER TABLE statement from the paragraph to the list	2022-11-24 14:42:11 +01:00
Anna Stuchlik	57a58b17a8	doc: enable publishing the documentation for version 5.1 Closes #12059	2022-11-24 13:55:25 +02:00
Kamil Braun	2f99f27c14	docs/dev: building.md: mention node-exporter packages	2022-11-24 12:49:34 +01:00
Kamil Braun	b12f331fe6	docs/dev: building.md: replace `dev` with `<mode>` in list of debs	2022-11-24 12:47:09 +01:00
Benny Halevy	243dc2efce	hints: host_filter: check topology::has_endpoint if enabled_selectively Don't call get_datacenter(ep) without checking first has_endpoint(ep) since the former may abort on internal error if the endpoint is not listed in topology. Refs #11870 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12054	2022-11-24 14:33:06 +03:00
Anna Stuchlik	f158d31e24	doc: add the links to the per-partition rate limit extension to the CREATE TABLE and ALTER TABLE sections	2022-11-24 11:26:33 +01:00
Petr Gusev	b95305ae2b	cql: refactor, inline modification_statement::validate_primary_key_restrictions The function didn't add much value, just forwarded to _restrictions. Removed it and called _restrictions->validate_primary_key directly.	2022-11-23 21:56:12 +04:00
Petr Gusev	f9936bb0cb	cql: DELETE with null value for IN parameter should be forbidden If a DELETE statement contains an IN operator and the parameter value for it is NULL, this should also trigger an error. This is in line with how Cassandra behaves in this case.	2022-11-23 21:39:23 +04:00
Petr Gusev	c123f94110	cql: add column name to the error message in case of null primary key component It's more user-friendly and the error message corresponds to what Cassandra provides in this case.	2022-11-23 21:39:23 +04:00
Petr Gusev	7730c4718e	cql: batch statement, inserting a row with a null key column should be forbidden Regular INSERT statements with null values for primary key components are rejected by Scylla since #9286 and #9314. Batch statements missed a similar check, this patch fixes it. Fixes: #12060	2022-11-23 21:39:23 +04:00
Petr Gusev	89a5397d7c	cql: wrong column name in error messages If INSERT or DELETE statement is missing a non-last element of the primary key, the error message generated contains an invalid column name. The problem occurs if the query contains a column with the list type, otherwise statement_restrictions::process_clustering_columns_restrictions checks that all the components of the key are specified. Fixes: #12046	2022-11-23 21:39:16 +04:00
Benny Halevy	996eac9569	topology: add get_datacenters Returns an unordered set of datacenter names to be used by network_topology_replication_strategy and for ks_prop_defs. The set is kept in sync with _dc_endpoints. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12023	2022-11-23 18:39:36 +02:00
Takuya ASADA	9acdd3af23	dist: drop deprecated AMI parameters on setup scripts Since we moved all IaaS code to scylla-machine-image, we no longer need the AMI variable in the sysconfig file or the --ami parameter on setup scripts, and we also never used /etc/scylla/ami_disabled. So let's drop all of them from Scylla core. Related with scylladb/scylla-machine-image#61 Closes #12043	2022-11-23 17:56:13 +02:00
Avi Kivity	7c66fdcad1	Merge 'Simplify sstable_directory configuration' from Pavel Emelyanov When started the sstable_directory is constructed with a bunch of booleans that control the way its process_sstable_dir method works. It's shorter and simpler to pass these booleans into method directly, all the more so there's another flag that's already passed like this. Closes #12005 * github.com:scylladb/scylladb: sstable_directory: Move all RAII booleans onto flags sstable_directory: Convert sort-sstables argument to flags struct sstable_directory: Drop default filter	2022-11-23 16:16:04 +02:00
Avi Kivity	70bfa708f5	storage_proxy: coroutinize change_hints_host_filter() Trivial straight-line code, no performance implications. Closes #12056	2022-11-23 15:34:24 +02:00
Jan Ciolek	84501851eb	cql_pytest: ensure that where clauses like token(p) = 0 AND p = 0 are rejected Scylla doesn't support combining restrictions on token with other restrictions on partition key columns. Some pieces of code depend on the assumption that such combinations are allowed. In case they were allowed in the future these functions would silently start returning wrong results, and we would return invalid rows. Add a test that will start failing once this restriction is removed. It will warn the developer to change the functions that used to depend on the assumption. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 13:09:22 +01:00
Botond Dénes	602dfdaf98	Merge 'Task manager top level repair tasks' from Aleksandra Martyniuk The PR introduces top level repair tasks representing repair and node operations performed with repair. The actions performed as a part of these operations are moved to corresponding tasks' run methods. Also a small change to repair module is added. Closes #11869 * github.com:scylladb/scylladb: repair: define run for data_sync_repair_task_impl repair: add data_sync_repair_task_impl tasks: repair: add noexcept to task impl constructor repair: define run for user_requested_repair_task_impl repair: add user_requested_repair_task_impl repair: allow direct access to max_repair_memory_per_range	2022-11-23 14:02:30 +02:00
Jan Ciolek	338af848a8	cql3: expr: remove needless braces around switch cases Originally put braces around the cases because there were local variables that I didn't want to be shadowed. Now there are no variables so the braces can be removed without any problems. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:30 +01:00
Jan Ciolek	e8a46d34c2	cql3: move evaluation IS_NOT NULL to a separate function When evaluating a binary operation with operations like EQUAL, LESS_THAN, IN the logic of the operation is put in a separate function to keep things clean. IS_NOT NULL is the only exception, it has its evaluate implementation right in the evaluate(binary_operator) function. It would be cleaner to have it in a separate dedicated function, so it's moved to one. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:30 +01:00
Jan Ciolek	b6cf6e6777	expr_test: test evaluating LIKE binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:29 +01:00
Jan Ciolek	6774272fd6	expr_test: test evaluating IS_NOT binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:29 +01:00
Jan Ciolek	e6c78bb6c2	expr_test: test evaluating CONTAINS_KEY binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:29 +01:00
Jan Ciolek	4f250609ab	expr_test: test evaluating CONTAINS binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:29 +01:00
Jan Ciolek	3ca04cfcc2	expr_test: test evaluating IN binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:28 +01:00
Jan Ciolek	41f452b73f	expr_test: test evaluating GTE binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:28 +01:00
Jan Ciolek	1fe9a9ce2a	expr_test: test evaluating GT binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:28 +01:00
Jan Ciolek	ef2a77a3e0	expr_test: test evaluating LTE binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:28 +01:00
Jan Ciolek	3cbb2d44e8	expr_test: test evaluating LT binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:27 +01:00
Jan Ciolek	9feee70710	expr_test: test evaluating NEQ binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:27 +01:00
Jan Ciolek	e77dba0b0b	expr_test: test evaluating EQ binary_operator Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:27 +01:00
Jan Ciolek	63a89776a1	cql3: expr properly handle null in is_one_of() Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:27 +01:00
Jan Ciolek	214dab9c77	cql3: expr properly handle null in like() Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:26 +01:00
Jan Ciolek	2ce9c95a9d	cql3: expr properly handle null in contains_key() Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:26 +01:00
Jan Ciolek	336ad61aa3	cql3: expr properly handle null in contains() Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:26 +01:00
Jan Ciolek	e2223be1ec	cql3: expr: properly handle null in limits() Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:26 +01:00
Jan Ciolek	d1abf2e168	cql3: expr: remove unneeded overload of limits() There is a more general version of limits() which takes expressions as both the lhs and rhs arguments. There is no need for a specialized overload. This specialized overload takes a tuple_constructor as lhs, but we call evaluate() on both sides of a binary operator before checking equality, so this won't be useful at all. Having multiple functions increases the risk that one of them has a bug, while giving dubious benefit. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:25 +01:00
Jan Ciolek	0609a425e6	cql3: expr: properly handle null in equality operators Expressions like: 123 = NULL NULL = 123 NULL = NULL NULL != 123 should be tolerated, but evaluate to NULL. The current code assumes that a binary operator can only evaluate to a boolean - true or false. Now a binary operator can also evaluate to NULL. This should happen in cases when one of the operator's sides is NULL. A special class is introduced to represent a value that can be one of three things: true, false or null. It's better than using std::optional<bool>, because optional has implicit conversions to bool that could cause confusion and bugs. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-23 12:44:22 +01:00
Aleksandra Martyniuk	a3016e652f	repair: define run for data_sync_repair_task_impl Operations performed as a part of data sync repair are moved to data_sync_repair_task_impl run method.	2022-11-23 10:44:19 +01:00
Aleksandra Martyniuk	42239c8fed	repair: add data_sync_repair_task_impl Create a task spanning over whole node operation. Tasks of that type are stored on shard 0.	2022-11-23 10:19:53 +01:00
Aleksandra Martyniuk	9e108a2490	tasks: repair: add noexcept to task impl constructor Add noexcept to constructor of tasks::task_manager::task::impl and inheriting classes.	2022-11-23 10:19:53 +01:00
Aleksandra Martyniuk	4a4e9c12df	repair: define run for user_requested_repair_task_impl Operations performed as a part of user requested repair are moved to user_requested_repair_task_impl run method.	2022-11-23 10:19:51 +01:00
Aleksandra Martyniuk	3800b771fc	repair: add user_requested_repair_task_impl Create a task spanning over whole user requested repair. Tasks of that type are stored on shard 0.	2022-11-23 10:11:09 +01:00
Aleksandra Martyniuk	0256ede089	repair: allow direct access to max_repair_memory_per_range Access specifier of constexpr value max_repair_memory_per_range in repair_module is changed to public and its getter is deleted.	2022-11-23 10:11:09 +01:00
Anna Stuchlik	16e2b9acd4	Update docs/alternator/compatibility.md Co-authored-by: Daniel Lohse <info@asapdesign.de>	2022-11-23 09:51:04 +01:00
Avi Kivity	d7310fd083	gdb: messaging: print tls servers too Many systems have most traffic on tls servers, so print them. Closes #12053	2022-11-23 07:59:02 +02:00
Avi Kivity	aec9faddb1	Merge 'storage_proxy: use erm topology' from Benny Halevy When processing a query, we keep a pointer to an effective_replication_map. In a couple places we used the latest topology instead of the one held by the effective_replication_map that the query uses and that might lead to inconsistencies if, for example, a node is removed from topology after decommission that happens concurrently to the query. This change gets the topology& from the e_r_m in those cases. Fixes #12050 Closes #12051 * github.com:scylladb/scylladb: storage_proxy: pass topology& to sort_endpoints_by_proximity storage_proxy: pass topology& to is_worth_merging_for_range_query	2022-11-22 20:04:41 +02:00
Botond Dénes	49ec7caf27	mutation_fragment_stream_validator: avoid allocation when stream is correct Currently the ctor of said class always allocates as it copies the provided name string and it creates a new name via format(). We want to avoid this, now that the validator is used on the read path. So defer creating the formatted name to when we actually want to log something, which is either when log level is debug or when an error is found. We don't care about performance in either case, but we do care about it on the happy path. Further to the above, provide a constructor for string literal names and when this is used, don't copy the name string, just save a view to it. Refs: #11174 Closes #12042	2022-11-22 19:19:18 +02:00
Nadav Har'El	ce7c1a6c52	Merge 'alternator: fix wrong 'where' condition for GSI range key' from Marcin Maliszkiewicz Contains fixes requested in the issue (and some tiny extras), together with analysis why they don't affect the users (see commit messages). Fixes [#11800](https://github.com/scylladb/scylladb/issues/11800) Closes #11926 * github.com:scylladb/scylladb: alternator: add maybe_quote to secondary indexes 'where' condition test/alternator: correct xfail reason for test_gsi_backfill_empty_string test/alternator: correct indentation in test_lsi_describe alternator: fix wrong 'where' condition for GSI range key	2022-11-22 17:46:52 +02:00
Pavel Emelyanov	22133a3949	sstable_directory: Move all RAII booleans onto flags There's a bunch of booleans that control the behavior of sstable directory scanning. Currently they are described as verbose bool_class<>-es and are put into sstable_directory construction time. However, these are not used outside of .process_sstable_dir() method and moving them onto recently added flags struct makes the code much shorter (29 insertions(+), 121 deletions(-)) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-22 18:30:00 +03:00
Pavel Emelyanov	7ca5e143d7	sstable_directory: Convert sort-sstables argument to flags struct The sstable_directory::process_sstable_dir() accepts a boolean to control its behavior when collecting sstables. Turn this boolean into a structure of flags. The intention is to extend this flags set in the future (next patch). This boolean is true all the time, but one place sets it to true in a "verbose" manner, like this: bool sort_sstables_according_to_owner = false; process_sstable_dir(directory, sort_sstables_according_to_owner).get(); the local variable is not used anymore. Using designated initializers solves the verbosity in a nicer manner. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-22 18:19:23 +03:00
Pavel Emelyanov	7c7017d726	sstable_directory: Drop default filter It's used as default argument for .reshape() method, but callers specify it explicitly. At the same time the filter is simple enough and is only used in one place so that the caller can just use explicit lambda. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-22 18:19:23 +03:00
Jan Ciolek	6be142e3a0	cql3: expr: remove unneeded overload of equal() There is a more general version of equal() which takes expressions as both the lhs and rhs arguments. There is no need for a specialized overload. This specialized overload takes a tuple_constructor as lhs, but we call evaluate() on both sides of a binary operator before checking equality, so this won't be useful at all. Having multiple functions increases the risk that one of them has a bug, while giving dubious benefit. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-22 14:28:10 +01:00
Benny Halevy	731a74c71f	storage_proxy: pass topology& to sort_endpoints_by_proximity It mustn't use the latest topology that may differ from the one used by the query as it may be missing nodes (e.g. after concurrent decommission). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-22 15:02:40 +02:00
Benny Halevy	ab3fc1e069	storage_proxy: pass topology& to is_worth_merging_for_range_query It mustn't use the latest topology that may differ from the one used by the query as it may be missing nodes (e.g. after concurrent decommission). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-22 15:01:58 +02:00
Petr Gusev	0d443dfd16	modification_statement: fix LWT insert crash if clustering key is null PR #9314 fixed a similar issue with regular insert statements but missed the LWT code path. It's expected behaviour of modification_statement::create_clustering_ranges to return an empty range in this case, since possible_lhs_values it uses explicitly returns empty_value_set if it evaluates rhs to null, and it has a comment about it (All NULL comparisons fail; no column values match.) On the other hand, all components of the primary key are required to be set, this is checked at the prepare phase, in modification_statement::process_where_clause. So the only problem was modification_statement::execute_with_condition was not expecting an empty clustering_range in case of a null clustering key. Fixes: #11954	2022-11-22 16:45:16 +04:00
Marcin Maliszkiewicz	2bf2ffd3ed	alternator: add maybe_quote to secondary indexes 'where' condition This bug doesn't affect anything, the reason is described in the commit: 'alternator: fix wrong 'where' condition for GSI range key'. But it's theoretically correct to escape those key names and the difference can be observed via CQL's describe table. Before the patch 'where' condition is missing one double quote in variable name making it mismatched with corresponding column name.	2022-11-22 11:08:23 +01:00
Marcin Maliszkiewicz	4389baf0d9	test/alternator: correct xfail reason for test_gsi_backfill_empty_string Previously cited issue is closed already.	2022-11-22 11:08:23 +01:00
Marcin Maliszkiewicz	59eca20af1	test/alternator: correct indentation in test_lsi_describe Otherwise I think assert is not executed in a loop. And I am not sure why lsi variable can be bound to anything. As I tested it was pointing to the last element in lsis...	2022-11-22 11:08:23 +01:00
Marcin Maliszkiewicz	d6d20134de	alternator: fix wrong 'where' condition for GSI range key This bug doesn't manifest in a visible way to the user. Adding the index to an existing table via GlobalSecondaryIndexUpdates is not supported so we don't need to consider what could happen for empty values of index range key. After the index is added the only interesting value user can set is omitting the value (null or empty are not allowed, see test_gsi_empty_value and test_gsi_null_value). In practice, regardless of the 'where' condition, the underlying materialized view code skips row updates with missing keys as per this comment: 'If one of the key columns is missing, set has_new_row = false meaning that after the update there will be no view row'. That's why the added test passes both before and after the patch. But it's still useful to include it to exercise those code paths. Fixes #11800	2022-11-22 11:08:23 +01:00
Nadav Har'El	ff617c6950	cql-pytest: translate a few small Cassandra tests This patch includes a translation of several additional small test files from Cassandra's CQL unit test directory cql3/validation/operations. All tests included here pass on both Cassandra and Scylla, so they did not discover any new Scylla bugs, but can be useful in the future as regression tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12045	2022-11-22 07:54:13 +02:00
Botond Dénes	f3eecb47f6	Merge 'Optimize cleanup compaction get ranges for invalidation' from Benny Halevy Take advantage of the facts that both the owned ranges and the initial non_owned_ranges (derived from the set of sstables) are deoverlapped and sorted by start token to turn the calculation of the final non_owned_ranges from quadratic to linear. Fixes #11922 Closes #11903 * github.com:scylladb/scylladb: dht: optimize subtract_ranges compaction: refactor dht::subtract_ranges out of get_ranges_for_invalidation compaction_manager: needs_cleanup: get first/last tokens from sstable decorated keys	2022-11-22 06:45:01 +02:00
Jan Ciolek	a1407ef576	cql3: expr: use evaluate(binary_operator) in is_satisfied_by is_satisfied_by has to check if a binary_operator is satisfied by some values. It used to be impossible to evaluate a binary_operator, so is_satisfied had code to check if it's satisfied for a limited number of cases occurring when filtering queries. Now evaluate(binary_operator) has been implemented and is_satisfied_by can use it to check if a binary_operator evaluates to true. This is cleaner and reduces code duplication. Additionally cql tests will test the new evaluate() implementation. There is one special case with token(). When is_satisfied_by sees a restriction on token it assumes that it's satisfied because it's sure that these token restrictions were used to generate partition ranges. I had to leave this special case in because it's impossible to evaluate(token). Once this is implemented I will remove the special case because it's risky and prone to cause bugs. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-21 20:40:06 +01:00
Jan Ciolek	9c4889ecc3	cql3: expr: handle IS NOT NULL when evaluating binary_operator The code to evaluate binary operators was copied from is_satisfied_by. is_satisfied_by wasn't able to evaluate IS NOT NULL restrictions, so when such restriction is encountered it throws an exception. Implement proper handling for IS NOT NULL binary operators. The switch ensures that all variants of oper_t are handled, otherwise there would be a compilation error. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-21 20:40:00 +01:00
Avi Kivity	bf2e54ff85	Merge 'Move deletion log code to sstable_directory.cc' from Pavel Emelyanov In order to support different storage kinds for sstable files (e.g. -- s3) it's needed to localize all the places that manipulate files on a POSIX filesystem so that custom storage could implement them in its own way. This set moves the deletion log manipulations to the sstable_directory.cc, which already "knows" that it works over a directory. Closes #12020 * github.com:scylladb/scylladb: sstables: Delete log file in replay_pending_delete_log() sstables: Move deletion log manipulations to sstable_directory.cc sstables: Open-code delete_sstables() call sstables: Use fs::path in replay_pending_delete_log() sstables: Indentation fix after previous patch sstables: Coroutinize replay_pending_delete_log sstables: Read pending delete log with one line helper sstables: Dont write pending log with file_writer	2022-11-21 21:22:59 +02:00
Jan Ciolek	b4cc92216b	cql3: expr: make it possible to evaluate binary_operator evaluate() takes an expression and evaluates it to a constant value. It wasn't possible to evaluate binary operators before, so it's added. The code is based on is_satisfied_by, which is currently used to check whether a binary operator evaluates to true or false. It looks like is_satisfied_by and evaluate() do pretty much the same thing, one could be implemented using the other. In the future they might get merged into a single function. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-21 17:48:23 +01:00
Jan Ciolek	8d81eaa68f	cql3: expr: accept expression as lhs argument to like() like() used to only accept column_value as the lhs to evaluate. Changed it to accept any generic expression. This will allow to evaluate a more diverse set of binary operators. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-21 16:33:18 +01:00
Jan Ciolek	b1a12686dc	cql3: expr: accept expression as lhs in contains_key contains_key() used to only accept column_value as the lhs to evaluate. Changed it to accept any generic expression. This will allow to evaluate a more diverse set of binary operators. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-21 16:33:02 +01:00
Jan Ciolek	79cd9cd956	cql3: expr: accept expression as lhs argument to contains() contains() used to only accept column_value as the lhs to evaluate. Changed it to accept any generic expression. This will allow to evaluate a more diverse set of binary operators. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-21 16:32:44 +01:00
Benny Halevy	57ff3f240f	dht: optimize subtract_ranges Take advantage of the fact that both ranges and ranges_to_subtract are deoverlapped and sorted by start token to reduce the calculation complexity from quadratic to linear. Fixes #11922 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-21 15:48:28 +02:00
Benny Halevy	8b81635d95	compaction: refactor dht::subtract_ranges out of get_ranges_for_invalidation The algorithm is generic and can be used elsewhere. Add a unit test for the function before it gets optimized in the following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-21 15:48:26 +02:00
Benny Halevy	7c6f60ae72	compaction_manager: needs_cleanup: get first/last tokens from sstable decorated keys Currently, the function is inefficient in two ways: 1. unnecessary copy of first/last keys to automatic variables 2. redecorating the partition keys with the schema passed to needs_cleanup. We can just use the tokens from the sstable first/last decorated keys. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-21 15:44:32 +02:00
Pavel Emelyanov	2f9b7931af	sstables: Delete log file in replay_pending_delete_log() It's natural that the replayer cleans up after itself Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-21 13:16:22 +03:00
Pavel Emelyanov	bdc47b7717	sstables: Move deletion log manipulations to sstable_directory.cc The deletion log concept uses the fact that files are on a POSIX filesystem. Support for another storage type will have to reimplement this place, so keep the FS-specific code in _directory.cc file. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-21 13:16:21 +03:00
Pavel Emelyanov	865c51c6cf	sstables: Open-code delete_sstables() call It's not used by any other code, but to be used it requires the caller to transform TOC file names by prepending sstable directory to them. Things get shorter and simpler if merging the helper code into the caller. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-21 13:15:25 +03:00
Pavel Emelyanov	a61c96a627	sstables: Use fs::path in replay_pending_delete_log() It's called by a code that has fs::path at hand and internally uses helpers that need fs::path too, so no need to convert it back and forth. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-21 13:15:25 +03:00
Pavel Emelyanov	f5684bcaf0	sstables: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-21 13:15:25 +03:00
Pavel Emelyanov	85a73ca9c6	sstables: Coroutinize replay_pending_delete_log Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-21 13:15:25 +03:00
Pavel Emelyanov	6f3fd94162	sstables: Read pending delete log with one line helper There's one in seastar since recently Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-21 13:15:25 +03:00
Pavel Emelyanov	2dedf4d03a	sstables: Dont write pending log with file_writer It's a wrapper over output_stream with offset tracking and the tracking is not needed to generate a log file. As a bonus of switching back we get a stream.write(sstring) sugar. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-21 13:15:24 +03:00
Botond Dénes	2d4439a739	Merge 'doc: add a troubleshooting article about the missing configuration files' from Anna Stuchlik Fix https://github.com/scylladb/scylladb/issues/11598 This PR adds the troubleshooting article submitted by @syuu1228 in the deprecated _scylla-docs_ repo, with https://github.com/scylladb/scylla-docs/pull/4152. I copied and reorganized the content and rewrote it a little according to the RST guidelines so that the page renders correctly. @syuu1228 Could you review this PR to make sure that my changes didn't distort the original meaning? Closes #11626 * github.com:scylladb/scylladb: doc: apply the feedback to improve clarity doc: add the link to the new Troubleshooting section and replace Scylla with ScyllaDB doc: add the new page to the toctree doc: add a troubleshooting article about the missing configuration files	2022-11-21 12:02:31 +02:00
Kamil Braun	135eb4a041	test.py: prepare for adding extra config from test when creating servers We will use this for replace operations to pass the IP of replaced node.	2022-11-21 10:57:03 +01:00
Kamil Braun	ac91e9d8be	test/pylib: manager_client: convert `add_server` to use `put_json` We shall soon pass some JSON data into these requests.	2022-11-21 10:57:03 +01:00
Kamil Braun	82eb9af80d	test/pylib: rest_client: allow returning JSON data from `put_json` We'll use `put_json` for requests which want to pass JSON data into the call and also return JSON.	2022-11-21 10:57:03 +01:00
Kamil Braun	4fef2d099b	test/pylib: scylla_cluster: don't import from manager_client There's a logical dependency from `manager_client` to `scylla_cluster` (`ManagerClient` defined in `manager_client` talks to `ScyllaClusterManager` defined in `scylla_cluster` over RPC). There is no such dependency in the other way. Do not introduce it accidentally. We can import these types from the `internal_types` module.	2022-11-21 10:57:03 +01:00
Nadav Har'El	757d2a4c02	test/alternator: un-xfail a test which passes on modern Python We had an xfailing test that reproduced a case where Alternator tried to report an error when the request was too long, but the boto library didn't see this error and threw a "Broken Pipe" error instead. It turns out that this wasn't a Scylla bug but rather a bug in urllib3, which overzealously reported a "Broken Pipe" instead of trying to read the server's response. It turns out this issue was already fixed in https://github.com/urllib3/urllib3/pull/1524 and now, on modern installations, the test that used to fail now passes and reports "XPASS". So in this patch we remove the "xfail" tag, and skip the test if running an old version of urllib3. Fixes #8195 Closes #12038	2022-11-21 08:10:10 +02:00
Botond Dénes	ffc3697f2f	Merge 'storage_service api: handle dropped tables' from Benny Halevy Gracefully skip tables that were removed in the background. Fixes #12007 Closes #12013 * github.com:scylladb/scylladb: api: storage_service: fixup indentation api: storage_service: add run_on_existing_tables api: storage_service: add parse_table_infos api: storage_service: log errors from compaction related handlers api: storage_service: coroutinize compaction related handlers	2022-11-21 07:56:27 +02:00
Avi Kivity	994603171b	Merge 'Add validator to the mutation compactor' from Botond Dénes Fragment reordering and fragment dropping bugs have been plaguing us since forever. To fight them we added a validator to the sstable write path to prevent really messed up sstables from being written. This series adds validation to the mutation compactor. This will cover reads and compaction among others, hopefully ridding us of such bugs on the read path too. This series fixes some benign looking issues found by unit tests after the validator was added -- although how benign a producer emitting two partition-ends is depends entirely on how the consumer reacts to it, so no such bug is actually benign. Fixes: https://github.com/scylladb/scylladb/issues/11174 Closes #11532 * github.com:scylladb/scylladb: mutation_compactor: add validator mutation_fragment_stream_validator: add a 'none' validation level test/boost/mutation_query_test: test_partition_limit: sort input data querier: consume_page(): use partition_start as the sentinel value treewide: use ::for_partition_end() instead of ::end_of_partition_tag_t{} treewide: use ::for_partition_start() instead of ::partition_start_tag_t{} position_in_partition: add for_partition_{start,end}()	2022-11-20 20:33:26 +02:00
Avi Kivity	779b01106d	Merge 'cql3: expr: add unit tests for prepare_expression' from Jan Ciołek Adds unit tests for the function `expr::prepare_expression`. Three minor bugs were found by these tests, all fixed in this PR. 1. When preparing a map, the type for tuple constructor was taken from an unprepared tuple, which has `nullptr` as its type. 2. Preparing an empty nonfrozen list or set resulted in `null`, but preparing a map didn't. Fixed this inconsistency. 3. Preparing a `bind_variable` with `nullptr` receiver was allowed. The `bind_variable` ended up with a `nullptr` type, which is incorrect. Changed it to throw an exception. Closes #11941 * github.com:scylladb/scylladb: test preparing expr::usertype_constructor expr_test: test that prepare_expression checks style_type of collection_constructor expr_test: test preparing expr::collection_constructor for map prepare_expr: make preparing nonfrozen empty maps return null prepare_expr: fix a bug in map_prepare_expression expr_test: test preparing expr::collection_constructor for set expr_test: test preparing expr::collection_constructor for list expr_test: test preparing expr::tuple_constructor expr_test: test preparing expr::untyped_constant expr_test_utils: add make_bigint_raw/const expr_test_utils: add make_tinyint_raw/const expr_test: test preparing expr::bind_variable cql3: prepare_expr: forbid preparing bind_variable without a receiver expr_test: test preparing expr::null expr_test: test preparing expr::cast expr_test_utils: add make_receiver expr_test_utils: add make_smallint_raw/const expr_test: test preparing expr::token expr_test: test preparing expr::subscript expr_test: test preparing expr::column_value expr_test: test preparing expr::unresolved_identifier expr_test_utils: mock data_dictionary::database	2022-11-20 20:03:54 +02:00
Nadav Har'El	2ba8b8d625	test/cql-pytest: remove "xfail" from passing test testIndexOnFrozenCollectionOfUDT We had a test that used to fail because of issue #8745. But this issue was already fixed, and we forgot to remove the "xfail" marker. The test now passes, so let's remove the xfail marker. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12039	2022-11-20 19:54:59 +02:00
Avi Kivity	40f61db120	Merge 'docs: describe the Raft upgrade and recovery procedures' from Kamil Braun Add new guide for upgrading 5.1 to 5.2. In this new upgrade doc, include additional steps for enabling Raft using the `consistent_cluster_management` flag. Note that we don't have this flag yet but it's planned to replace the experimental flag in 5.2. In the "Raft in ScyllaDB" document, add sections about: - enabling Raft in existing clusters in Scylla 5.2, - verifying that the internal Raft upgrade procedure finishes successfully, - recovering from a stuck Raft upgrade procedure or from a majority loss situation. Fix some problems in the documentation, e.g. it is not possible to enable Raft in an existing cluster in 5.0, but the documentation claimed that it is. Follow-up items: - if we decide for a different name for `consistent_cluster_management`, use that name in the docs instead - update the warnings in Scylla to link to the Raft doc - mention Enterprise versions once we know the numbers - update the appropriate upgrade docs for Enterprise versions once they exist Closes #11910 * github.com:scylladb/scylladb: docs: describe the Raft upgrade and recovery procedures docs: add upgrade guide 5.1 -> 5.2	2022-11-20 19:00:23 +02:00
Avi Kivity	15ee8cfc05	Merge 'reader_concurrency_semaphore: fix waiter/inactive race' from Botond Dénes We recently (in `7fbad8de87`) made sure all admission paths can trigger the eviction of inactive reads. As reader eviction happens in the background, a mechanism was added to make sure only a single eviction fiber was running at any given time. This mechanism however had a preemption point between stopping the fiber and releasing the evict lock. This gave an opportunity for either new waiters or inactive readers to be added, without the fiber acting on it. Since it still held onto the lock, it also prevented other eviction fibers from starting. This could create a situation where the semaphore could admit new reads by evicting inactive ones, but it still has waiters. Since an empty waitlist is also an admission criterion, once one waiter is wrongly added, many more can accumulate. This series fixes this by ensuring the lock is released in the instant the fiber decides there is no more work to do. It also fixes the assert failure on recursive eviction and adds a detection to the inactive/waiter contradiction. Fixes: #11923 Refs: #11770 Closes #12026 * github.com:scylladb/scylladb: reader_concurrency_semaphore: do_wait_admission(): detect admission-waiter anomaly reader_concurrency_semaphore: evict_readers_in_the_background(): eliminate blind spot reader_concurrency_semaphore: do_detach_inactive_read(): do a complete detach	2022-11-20 18:51:34 +02:00
Avi Kivity	895d721d5e	Merge 'scylla-sstable: data-dump improvements' from Botond Dénes This series contains a mixed bag of improvements to `scylla sstable dump-data`. These improvements are mostly aimed at making the json output clearer, getting rid of any ambiguities. Closes #12030 * github.com:scylladb/scylladb: tools/scylla-sstable: traverse sstables in argument order tools/scylla-sstable: dump-data docs: s/clustering_fragments/clustering_elements tools/scylla-sstable: dump-data/json: use Null instead of "<unknown>" tools/scylla-sstable: dump-data/json: use more uniform format for collections tools/scylla-sstable: dump-data/json: make cells easier to parse	2022-11-20 17:02:27 +02:00
Avi Kivity	2f9c53fbe4	Merge 'test/pylib: scylla_cluster: use server ID to name workdir and log file, not IP address' from Kamil Braun Since recently the framework uses a separate set of unique IDs to identify servers, but the log file and workdir is still named using the last part of the IP address. This is confusing: the test logs sometimes don't provide the IP addr (only the ID), and even if they do, the reader of the test log may not know that they need to look at the last part of the IP to find the node's log/workdir. Also using ID will be necessary if we want to reuse IP addresses (e.g. during node replace, or simply not to run out of IP addresses during testing). So use the ID instead to name the workdir and log file. Also, when starting a test case, print the used cluster. This will make it easier to map server IDs to their IP addresses when browsing through the test logs. Closes #12018 * github.com:scylladb/scylladb: test/pylib: manager_client: print used cluster when starting test case test/pylib: scylla_cluster: use server ID to name workdir and log file, not IP address	2022-11-20 16:56:19 +02:00
Avi Kivity	14218d82d6	Update tools/java submodule (serverless) * tools/java caf754f243...874e2d529b (2): > Add Scylla Cloud serverless support > Switch cqlsh to use scylla-driver	2022-11-20 16:41:36 +02:00
Tomasz Grabiec	c8e983b4aa	test: flat_mutation_reader_assertions: Use fatal BOOST_REQUIRE_EQUAL instead of BOOST_CHECK_EQUAL BOOST_CHECK_EQUAL is a weaker form of assertion, it reports an error and will cause the test case to fail but continues. This makes the test harder to debug because there's no obvious way to catch the failure in GDB and the test output is also flooded with things which happen after the failed assertion. Message-Id: <20221119171855.2240225-1-tgrabiec@scylladb.com>	2022-11-20 16:14:26 +02:00
Nadav Har'El	2d2034ea28	Merge 'cql3: don't ignore other restrictions when a multi column restriction is present during filtering' from Jan Ciołek When filtering with multi column restriction present all other restrictions were ignored. So a query like: `SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;` would ignore the restriction `regular_col = 0`. This was caused by a bug in the filtering code: `2779a171fc/cql3/selection/selection.cc (L433-L449)` When multi column restrictions were detected, the code checked if they are satisfied and returned immediately. This is fixed by returning only when these restrictions are not satisfied. When they are satisfied the other restrictions are checked as well to ensure all of them are satisfied. This code was introduced back in 2019, when fixing #3574. Perhaps back then it was impossible to mix multi column and regular columns and this approach was correct. Fixes: #6200 Fixes: #12014 Closes #12031 * github.com:scylladb/scylladb: cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works boost/restrictions-test: uncomment part of the test that passes now cql-pytest: enable test for filtering combined multi column and regular column restrictions cql3: don't ignore other restrictions when a multi column restriction is present during filtering	2022-11-20 11:50:38 +02:00
Benny Halevy	ec5707a4a8	api: storage_service: fixup indentation	2022-11-20 09:14:45 +02:00
Benny Halevy	cc63719782	api: storage_service: add run_on_existing_tables Gracefully skip tables that were removed in the background. Fixes #12007 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-20 09:14:29 +02:00
Benny Halevy	9ef9b9d1d9	api: storage_service: add parse_table_infos The table UUIDs are the same on all shards so we might as well get them on shard 0 (as we already do) and reuse them on other shards. It is more efficient and accurate to lookup the table eventually on the shard using its uuid rather than its name. If the table was dropped and recreated using the same name in the background, the new table will have a new uuid and so the api function does not apply to it anymore. A following change will handle the no_such_column_family cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-20 09:14:21 +02:00
Benny Halevy	9b4a9b2772	api: storage_service: log errors from compaction related handlers Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-20 09:03:25 +02:00
Benny Halevy	a47f96bc05	api: storage_service: coroutinize compaction related handlers Before we improve parsing tables lists and handling of no_such_column_family errors. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-20 09:03:25 +02:00
Jan Ciolek	286f182a8c	cql-pytest: add a reproducer for #12014 , verify that filtering multi column and regular restrictions works In issue #12014 a user has encountered an instance of #6200. When filtering a WHERE clause which contained both multi-column and regular restrictions, the regular restrictions were ignored. Add a test which reproduces the issue using a reproducer provided by the user. This problem is tested in another similar test, but this one reproduces the issue in the exact way it was found by the user. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-18 15:27:42 +01:00
Jan Ciolek	63fb2612c3	boost/restrictions-test: uncomment part of the test that passes now A part of the test was commented out due to #6200. Now #6200 has been fixed and it can be uncommented. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-18 15:14:32 +01:00
Jan Ciolek	99e1032e34	cql-pytest: enable test for filtering combined multi column and regular column restrictions The test test_multi_column_restrictions_and_filtering was marked as xfail, because issue #6200 wasn't fixed. Now that filtering multi column and other restrictions together has been fixed the test passes. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-18 15:14:32 +01:00
Jan Ciolek	b974d4adfb	cql3: don't ignore other restrictions when a multi column restriction is present during filtering When filtering with multi column restriction present all other restrictions were ignored. So a query like: `SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;` would ignore the restriction `regular_col = 0`. This was caused by a bug in the filtering code: `2779a171fc/cql3/selection/selection.cc (L433-L449)` When multi column restrictions were detected, the code checked if they are satisfied and returned immediately. This is fixed by returning only when these restrictions are not satisfied. When they are satisfied the other restrictions are checked as well to ensure all of them are satisfied. This code was introduced back in 2019, when fixing #3574. Perhaps back then it was impossible to mix multi column and regular columns and this approach was correct. Fixes: #6200 Fixes: #12014 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-18 15:14:16 +01:00
Botond Dénes	30597f17ed	tools/scylla-sstable: traverse sstables in argument order In the order the user passed them on the command-line.	2022-11-18 15:58:37 +02:00
Botond Dénes	e337b25aa9	tools/scylla-sstable: dump-data docs: s/clustering_fragments/clustering_elements The usage of clustering_fragments is a typo, the output contains clustering_elements.	2022-11-18 15:58:36 +02:00
Botond Dénes	c39408b394	tools/scylla-sstable: dump-data/json: use Null instead of "<unknown>" The currently used "<unknown>" marker for invalid values/types is indistinguishable from a normal value in some cases. Use the much more distinct and unique json Null instead.	2022-11-18 15:58:36 +02:00
Botond Dénes	1dfceb5716	tools/scylla-sstable: dump-data/json: use more uniform format for collections Instead of trying to be clever and switching the output on the type of collection, use the same format always: a list of objects, where the object has a key and value attribute, containing the respective collection item key and value. This makes processing much easier for machines (and humans too since the previous system wasn't working well).	2022-11-18 15:58:36 +02:00
Botond Dénes	f89acc8df7	tools/scylla-sstable: dump-data/json: make cells easier to parse There are several slightly different cell types in scylla: regular cells, collection cells (frozen and non-frozen) and counter cells (update and shards). In C++ code the type of the cell is always available for code wishing to make out exactly what kind of cell a cell is. In the JSON output of the dump-data this is currently really hard to do as there is not enough information to disambiguate all the different cell types. We wish to make the JSON output self-sufficient so in this patch we introduce a "type" field which contains one of: * regular * counter-update * counter-shards * frozen-collection * collection Furthermore, we bring the different types closer by also printing the counter shards under the 'value' key, not under the 'shards' key as before. The separate 'shards' is no longer needed to disambiguate. The documentation and the write operation is also updated to reflect the changes.	2022-11-18 15:58:36 +02:00
Petr Gusev	41629e97de	test.py: handle --markers parameter Some tests may take longer than a few seconds to run. We want to mark such tests in some way, so that we can run them selectively. This patch proposes to use pytest markers for this. The markers from the test.py command line are passed to pytest as is via the -m parameter. By default, the marker filter is not applied and all tests will be run without exception. To exclude e.g. slow tests you can write --markers 'not slow'. The --markers parameter is currently only supported by Python tests, other tests ignore it. We intend to support this parameter for other types of tests in the future. Another possible improvement is not to run suites for which all tests have been filtered out by markers. The markers are currently handled by pytest, which means that the logic in test.py (e.g., running a scylla test cluster) will be run for such suites. Closes #11713	2022-11-18 12:36:20 +01:00
Avi Kivity	7da12c64bc	Revert "Revert "Merge 'cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()' from Avi Kivity"" This reverts commit `22f13e7ca3`, and reinstates commit `df8e1da8b2` ("Merge 'cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()' from Avi Kivity"). The original commit was reverted due to failures in debug mode on aarch64, but after commit `224a2877b9` ("build: disable -Og in debug mode to avoid coroutine asan breakage"), it works again. Closes #12021	2022-11-18 12:44:00 +02:00
Kamil Braun	d7649a86c4	Merge 'Build up to support of dynamic IP address changes in Raft' from Konstantin Osipov We plan to stop storing IP addresses in Raft configuration, and instead use the information disseminated through gossip to locate Raft peers. Implement patches that are building up to that: * improve Raft API of configuration change notifications * disseminate raft host id in Gossip * avoid using Raft addresses from Raft configuration, and instead consistently use the translation layer between raft server id <-> IP address Closes #11953 * github.com:scylladb/scylladb: raft: persist the initial raft address map raft: (upgrade) do not use IP addresses from Raft config raft: (and gossip) begin gossiping raft server ids raft: change the API of conf change notifications	2022-11-18 11:38:19 +01:00
Botond Dénes	437fcdeeda	Merge 'Make use of enum_set in directory lister' from Pavel Emelyanov The lister accepts sort of a filter -- what kind of entries to list, regular, directories or both. It currently uses unordered_set, but enum_set is shorter and better describes the intent. Closes #12017 * github.com:scylladb/scylladb: lister: Make lister::dir_entry_types an enum_set database: Avoid useless local variable	2022-11-18 12:15:26 +02:00
Botond Dénes	b39ca29b3c	reader_concurrency_semaphore: do_wait_admission(): detect admission-waiter anomaly The semaphore should admit readers as soon as it can. So at any point in time there should be either no waiters, or the semaphore shouldn't be able to admit new reads. Otherwise something went wrong. Detect this when queuing up reads and dump the diagnostics if detected. Even though tests should ensure this should never happen, recently we've seen a race between eviction and enqueuing producing such situations. This is very hard to write tests for, so add built-in detection and protection instead. Detecting this is very cheap anyway.	2022-11-18 11:35:47 +02:00
Botond Dénes	ca7014ddb8	reader_concurrency_semaphore: evict_readers_in_the_background(): eliminate blind spot Said method has a protection against concurrent (recursive more like) calls to itself, by setting a flag `_evicting` and returning early if this flag is set. The evicting loop however has at least one preemption point between deciding there is nothing more to evict and resetting said flag. This window provides opportunity for new inactive reads or waiters to be queued without this loop noticing, while denying any other concurrent invocations at that time from reacting too. Eliminate this by using repeat() instead of do_until() and setting `_evicting = false` the moment the loop's run condition becomes false.	2022-11-18 11:35:47 +02:00
Botond Dénes	892f52c683	reader_concurrency_semaphore: do_detach_inactive_read(): do a complete detach Currently this method detaches the inactive read from the handle and notifies the permit, calls the notify handler if any and does some stat bookkeeping. Extend it to do a complete detach: unlink the entry from the inactive reads list and also cancel the ttl timer. After this, all that is left to the caller is to destroy the entry. This will prevent any recursive eviction from causing assertion failure. Although recursive eviction shouldn't happen, it shouldn't trigger an assert.	2022-11-18 11:35:43 +02:00
Pavel Emelyanov	a44ca06906	Merge 'token_metadata: Do not use topology info for is_member check' from Asias He Since commit `a980f94` (token_metadata: impl: keep the set of normal token owners as a member), we have a set, _normal_token_owners, which contains all the nodes in the ring. We can use _normal_token_owners to check if a node is part of the ring directly instead of going through the _toplogy indirectly. Fixes #11935 Closes #11936 * github.com:scylladb/scylladb: token_metadata: Rename is_member to is_normal_token_owner token_metadata: Add docs for is_member token_metadata: Do not use topology info for is_member check token_metadata: Check node is part of the topology instead of the ring	2022-11-18 11:54:07 +03:00
Asias He	4571fcf9e7	token_metadata: Rename is_member to is_normal_token_owner The name is_normal_token_owner is more clear than is_member. The is_normal_token_owner reflects what it really checks.	2022-11-18 09:29:20 +08:00
Asias He	965097cde5	token_metadata: Add docs for is_member Make it clear, is_member checks if a node is part of the token ring and checks nothing else.	2022-11-18 09:28:56 +08:00
Asias He	a495b71858	token_metadata: Do not use topology info for is_member check Since commit `a980f94` (token_metadata: impl: keep the set of normal token owners as a member), we have a set, _normal_token_owners, which contains all the nodes in the ring. We can use _normal_token_owners to check if a node is part of the ring directly instead of going through the _topology indirectly. Fixes #11935	2022-11-18 09:28:56 +08:00
Asias He	f2ca790883	token_metadata: Check node is part of the topology instead of the ring update_normal_tokens is the way to add a new node into the ring. We should not require a new node to already be in the ring to be able to add it to the ring. The current code works accidentally because is_member is checking if a node is in the topology. We should use _topology.has_endpoint to check if a node is part of the topology explicitly.	2022-11-18 09:28:56 +08:00
Jan Ciolek	77d68153f1	test preparing expr::usertype_constructor Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:41:10 +01:00
Jan Ciolek	eb92fb4289	expr_test: test that prepare_expression checks style_type of collection_constructor Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:41:10 +01:00
Jan Ciolek	77c63a6b92	expr_test: test preparing expr::collection_constructor for map Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:41:09 +01:00
Jan Ciolek	db67ade778	prepare_expr: make preparing nonfrozen empty maps return null In Scylla and Cassandra inserting an empty collection that is not frozen, is interpreted as inserting a null value. list_prepare_expression and set_prepare_expression have an if which handles this behavior, but there wasn't one in map_prepare_expression. As a result preparing empty list or set would result in null, but preparing an empty map wouldn't. This is inconsistent, it's better to return null in all cases of empty nonfrozen collections. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:41:09 +01:00
Jan Ciolek	da71f9b50b	prepare_expr: fix a bug in map_prepare_expression map_prepare_expression takes a collection_constructor of unprepared items and prepares it. Elements of a map collection_constructor are tuples (key and value). map_prepare_expression creates a prepared collection_constructor by preparing each tuple and adding it to the result. During this preparation it needs to set the type of the tuple. There was a bug here - it took the type from unprepared tuple_constructor and assigned it to the prepared one. An unprepared tuple_constructor doesn't have a type so it ended up assigning nullptr. Instead of that it should create a tuple_type_impl instance by looking at the types of map key and values, and use this tuple_type_impl as the type of the prepared tuples. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:35:04 +01:00
Jan Ciolek	a656fdfe9a	expr_test: test preparing expr::collection_constructor for set Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:22:37 +01:00
Jan Ciolek	76f587cfe7	expr_test: test preparing expr::collection_constructor for list Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:22:37 +01:00
Jan Ciolek	44b55e6caf	expr_test: test preparing expr::tuple_constructor Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:22:37 +01:00
Jan Ciolek	265100a638	expr_test: test preparing expr::untyped_constant Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:22:37 +01:00
Jan Ciolek	f6b9100cd2	expr_test_utils: add make_bigint_raw/const Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:22:37 +01:00
Jan Ciolek	f9ff131f86	expr_test_utils: add make_tinyint_raw/const Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:22:36 +01:00
Jan Ciolek	76b6161386	expr_test: test preparing expr::bind_variable Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:22:36 +01:00
Jan Ciolek	4882724066	cql3: prepare_expr: forbid preparing bind_variable without a receiver prepare_expression treats receiver as an optional argument, it can be set to nullptr and the preparation should still succeed when it's possible to infer the type of an expression. preparing a bind_variable requires the receiver to be present, because it doesn't contain any information about the type of the bound value. Added a check that the receiver is present. Allowing to prepare a bind_variable without the receiver present was a bug. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 20:22:36 +01:00
Avi Kivity	2779a171fc	Merge 'Do not run aborted tasks' from Aleksandra Martyniuk task_manager::task::impl contains an abort source which can be used to check whether it is aborted and an abort method which aborts the task (request_abort on abort_source) and all its descendants recursively. When the start method is called after the task was aborted, then its state is set to failed and the task does not run. Fixes: #11995 Closes #11996 * github.com:scylladb/scylladb: tasks: do not run tasks that are aborted tasks: delete unused variable tasks: add abort_source to task_manager::task::impl	2022-11-17 19:42:46 +02:00
Pavel Emelyanov	a396c27efc	Merge 'message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client' from Kamil Braun `get_rpc_client` calculates a `topology_ignored` field when creating a client which says whether the client's endpoint had topology information when this client was created. This is later used to check if that client needs to be dropped and replaced with a new client which uses the correct topology information. The `topology_ignored` field was incorrectly calculated as `true` for pending endpoints even though we had topology information for them. This would lead to unnecessary drops of RPC clients later. Fix this. Remove the default parameter for `with_pending` from `topology::has_endpoint` to avoid similar bugs in the future. Apparently this fixes #11780. The verbs used by decommission operation use RPC client index 1 (see `do_get_rpc_client_idx` in message/messaging_service.cc). From local testing with additional logging I found that by the time this client is created (i.e. the first verb in this group is used), we already know the topology. The node is pending at that point - hence the bug would cause us to assume we don't know the topology, leading us to dropping the RPC client later, possibly in the middle of a decommission operation. Fixes: #11780 Closes #11942 * github.com:scylladb/scylladb: message: messaging_service: check for known topology before calling is_same_dc/rack test: reenable test_topology::test_decommission_node_add_column test/pylib: util: configurable period in wait_for message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client message: messaging_service: topology independent connection settings for GOSSIP verbs	2022-11-17 20:14:32 +03:00
Jan Ciolek	42e01cc67f	expr_test: test preparing expr::null Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 17:30:05 +01:00
Jan Ciolek	45b3fca71c	expr_test: test preparing expr::cast Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 17:30:05 +01:00
Jan Ciolek	498c9bfa0d	expr_test_utils: add make_receiver Add a convenience function which creates receivers. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 17:30:04 +01:00
Jan Ciolek	6873a21fbd	expr_test_utils: add make_smallint_raw/const Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 17:30:04 +01:00
Jan Ciolek	488056acb7	expr_test: test preparing expr::token Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 17:30:04 +01:00
Jan Ciolek	7958f77a40	expr_test: test preparing expr::subscript Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 17:30:04 +01:00
Jan Ciolek	569bd61c6c	expr_test: test preparing expr::column_value Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 17:30:04 +01:00
Jan Ciolek	26174e29c6	expr_test: test preparing expr::unresolved_identifier It's interesting that prepare_expression for column identifiers doesn't require a receiver. I hope this won't break validation in the future. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 17:30:04 +01:00
Jan Ciolek	c719a923bb	expr_test_utils: mock data_dictionary::database Add a function which creates a mock instance of data_dictionary::database. prepare_expression requires a data_dictionary::database as an argument, so unit tests for it need something to pass there. make_data_dictionary_database can be used to create an instance that is sufficient for tests. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-11-17 17:30:00 +01:00
Kamil Braun	8e8c32befe	test/pylib: manager_client: print used cluster when starting test case It will be easier to map server IDs to their IP addresses when browsing through the test logs.	2022-11-17 17:14:23 +01:00
Pavel Emelyanov	bc62ca46d4	lister: Make lister::dir_entry_types an enum_set This type is currently an unordered_set, but only consists of at most two elements. Making it an enum_set renders it into a size_t variable and better describes the intention. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-17 19:01:45 +03:00
Pavel Emelyanov	c6021b57a1	database: Avoid useless local variable It's used to run lister::scan_dir() with directory_entry_type::directory only, but for that is copied around on lambda captures. It's simpler just to use the value directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-17 19:00:49 +03:00
Kamil Braun	b83234d8aa	test/pylib: scylla_cluster: use server ID to name workdir and log file, not IP address Since recently the framework uses a separate set of unique IDs to identify servers, but the log file and workdir is still named using the last part of the IP address. This is confusing: the test logs sometimes don't provide the IP addr (only the ID), and even if they do, the reader of the test log may not know that they need to look at the last part of the IP to find the node's log/workdir. Also using ID will be necessary if we want to reuse IP addresses (e.g. during node replace, or simply not to run out of IP addresses during testing).	2022-11-17 16:55:12 +01:00
Anna Stuchlik	f7f03e38ee	doc: update the link to Enabling Experimental Features	2022-11-17 15:44:46 +01:00
Anna Stuchlik	02cea98f55	doc: remove the note referring to the previous ScyllaDB versions and add the relevant limitation to the paragraph	2022-11-17 15:05:00 +01:00
Anna Stuchlik	ce88c61785	doc: update the links to the Enabling Experimental Features section	2022-11-17 14:59:34 +01:00
Avi Kivity	76be6402ed	Merge 'repair: harden effective replication map' from Benny Halevy As described in #11993 per-shard repair_info instances get the effective_replication_map on their own with no centralized synchronization. This series ensures that the effective replication maps used by repair (and other associated structures like the token metadata and topology) are all in sync with the one used to initiate the repair operation. While at it, the series includes other cleanups in this area in repair and view that are not fixes as the calls happen in synchronous functions that do not yield. Fixes #11993 Closes #11994 * github.com:scylladb/scylladb: repair: pass erm down to get_hosts_participating_in_repair and get_neighbors repair: pass effective_replication_map down to repair_info repair: coroutinize sync_data_using_repair repair: futurize do_repair_start effective_replication_map: add global_effective_replication_map shared_token_metadata: get_lock is const repair: sync_data_using_repair: require to run on shard 0 repair: require all node operations to be called on shard 0 repair: repair_info: keep effective_replication_map repair: do_repair_start: use keyspace erm to get keyspace local ranges repair: do_repair_start: use keyspace erm for get_primary_ranges repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc repair: do_repair_start: check_in_shutdown first repair: get_db().local() where needed repair: get topology from erm/token_metdata_ptr view: get_view_natural_endpoint: get topology from erm	2022-11-17 13:29:02 +02:00
Konstantin Osipov	262566216b	raft: persist the initial raft address map	2022-11-17 14:26:36 +03:00
Konstantin Osipov	b35af73fdf	raft: (upgrade) do not use IP addresses from Raft config Always use raft address map to obtain the IP addresses of upgrade peers. Right now the map is populated from Raft configuration, so it's an equivalent transformation, but in the future raft address map will be populated from other sources: discovery and gossip, hence the logic of upgrade will change as well. Do not proceed with the upgrade if an address is missing from the map, since it means we failed to contact a raft member.	2022-11-17 14:26:31 +03:00
Pavel Emelyanov	2add9ba292	Merge 'Refactor topology out of token_metadata' from Benny Halevy This series moves the topology code from locator/token_metadata.{cc,hh} out to locator/topology.{cc,hh} and introduces a shared header file: locator/types.hh contains shared, low level definitions, in anticipation of https://github.com/scylladb/scylladb/pull/11987 While at it, the token_metadata functions are turned into coroutines and topology copy constructor is deleted. The copy functionality is moved into an async `clone_gently` function that allows yielding while copying the topology. Closes #12001 * github.com:scylladb/scylladb: locator: refactor topology out of token_metadata locator: add types.hh topology: delete copy constructor token_metadata: coroutinize clone functions	2022-11-17 13:55:34 +03:00
Aleksandra Martyniuk	7ead1a7857	compaction: request abort only once in compaction_data::stop compaction_manager::task (and thus compaction_data) can be stopped because of many different reasons. Thus, abort can be requested more than once on compaction_data abort source causing a crash. To prevent this before each request_abort() we check whether an abort was requested before. Closes #12004	2022-11-17 12:44:59 +02:00
Benny Halevy	1e2741d2fe	abstract_replication_strategy: recognized_options: return unordered_set An unordered_set is more efficient and there is no need to return an ordered set for this purpose. This change facilitates a follow-up change of adding topology::get_datacenters(), returning an unordered_set of datacenter names. Refs #11987 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12003	2022-11-17 11:27:05 +02:00
Botond Dénes	e925c41f02	utils/gs/barrett.hh: aarch64: s/brarett/barrett/ Fix a typo introduced by the recent patch fixing the spelling of Barrett. The patch introduced a typo in the aarch64 version of the code, which wasn't found by promotion, as that only builds on X86_64. Closes #12006	2022-11-17 11:09:59 +02:00
Konstantin Osipov	051dceeaff	raft: (and gossip) begin gossiping raft server ids We plan to use gossip data to educate Raft RPC about IP addresses of raft peers. Add raft server ids to application state, so that when we get a notification about a gossip peer we can identify which raft server id this notification is for, specifically, we can find what IP address stands for this server id, and, whenever the IP address changes, we can update Raft address map with the new address. By the same token, at boot time, we now have to start Gossip before Raft, since Raft won't be able to send any messages without gossip data about IP addresses.	2022-11-17 12:07:31 +03:00
Konstantin Osipov	990c7a209f	raft: change the API of conf change notifications Pass a change diff into the notification callback, rather than add or remove servers one by one, so that if we need to persist the state, we can do it once per configuration change, not for every added or removed server. For now still pass added and removed entries in two separate calls per a single configuration change. This is done mainly to fulfill the library contract that it never sends messages to servers outside the current configuration. The group0 RPC implementation doesn't need the two calls, since it simply marks the removed servers as expired: they are not removed immediately anyway, and messages can still be delivered to them. However, there may be test/mock implementations of RPC which could benefit from this contract, so we decided to keep it.	2022-11-17 12:07:31 +03:00
Benny Halevy	53fdf75cf9	repair: pass erm down to get_hosts_participating_in_repair and get_neighbors Now that it is available in repair_info. Fixes #11993 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:30 +02:00
Benny Halevy	b69be61f41	repair: pass effective_replication_map down to repair_info And make sure the token_metadata ring version is same as the reference one (from the erm on shard 0), when starting the repair on each shard. Refs #11993 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:29 +02:00
Benny Halevy	c47d36b53d	repair: coroutinize sync_data_using_repair Prepare for the next patch that will co_await make_global_effective_replication_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:04 +02:00
Benny Halevy	58b1c17f5d	repair: futurize do_repair_start Turn it into a coroutine to prepare for the next patch that will co_await make_global_effective_replication_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:04 +02:00
Benny Halevy	4b9269b7e2	effective_replication_map: add global_effective_replication_map Class to hold a coherent view of a keyspace effective replication map on all shards. To be used in a following patch to pass the sharded keyspace e_r_m:s to repair. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 08:07:01 +02:00
Avi Kivity	b8b78959fb	build: switch to packaged libdeflate rather than a submodule Now that our toolchain is based on Fedora 37, we can rely on its libdeflate rather than have to carry our own in a submodule. Frozen toolchain is regenerated. As a side effect clang is updated from 15.0.0 to 15.0.4. Closes #12000	2022-11-17 08:01:00 +02:00
Benny Halevy	2c677e294b	shared_token_metadata: get_lock is const The lock is acquired using a function that doesn't modify the shared_token_metadata object. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	d6b2124903	repair: sync_data_using_repair: require to run on shard 0 And with that do_sync_data_using_repair can be folded into sync_data_using_repair. This will simplify using the effective_replication_map throughout the operation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	0c56c75cf8	repair: require all node operations to be called on shard 0 To simplify use of the effective_replication_map / token_metadata_ptr throughout the operation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	64b0756adc	repair: repair_info: keep effective_replication_map Sampled when repair info is constructed. To be used throughout the repair process. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	c7d753cd44	repair: do_repair_start: use keyspace erm to get keyspace local ranges Rather than calling db.get_keyspace_local_ranges that looks up the keyspace and its erm again. We want all the information derived from the erm to be based on the same source. The function is synchronous so this change doesn't fix anything, just cleans up the code. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	aaf74776c2	repair: do_repair_start: use keyspace erm for get_primary_ranges Ensure that the primary ranges are in sync with the keyspace erm. The function is synchronous so this change doesn't fix anything, it just cleans up the code. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	9200e6b005	repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc Ensure the erm and topology are in sync. The function is synchronous so this change doesn't fix anything, just cleans up the code. Fix mistake in comment while at it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:57:56 +02:00
Benny Halevy	59dc2567fd	repair: do_repair_start: check_in_shutdown first Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	881eb0df83	repair: get_db().local() where needed In several places we get the sharded database using get_db() and then we only use db.local(). Simplify the code by keeping reference only to the local database upfront. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	c22c4c8527	repair: get topology from erm/token_metdata_ptr We want the topology to be synchronized with the respective effective_replication_map / token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Benny Halevy	94f2e95a2f	view: get_view_natural_endpoint: get topology from erm Get the topology for the effective replication map rather than from the storage_proxy to ensure it's synchronized with the natural endpoints. Since there's no preemption between the two calls currently there is no issue, so this is merely a clean up of the code and not supposed to fix anything. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:56:34 +02:00
Nadav Har'El	e393639114	test/cql-pytest: reproducer for crash in LWT with null key This patch adds a reproducer for issue #11954: Attempting an "IF NOT EXISTS" (LWT) write with a null key crashes Scylla, instead of producing a simple error message (like happens without the "IF NOT EXISTS" after #7852 was fixed). The test passed on Cassandra, but crashes Scylla. Because of this crash, we can't just mark the test "xfail" and it's temporarily marked "skip" instead. Refs #11954. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11982	2022-11-17 07:31:13 +02:00
Benny Halevy	d0bd305d16	locator: refactor topology out of token_metadata Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 21:55:54 +02:00
Benny Halevy	297a4de4e4	locator: add types.hh To export low-level types that are used by other modules for the locator interfaces. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 21:53:05 +02:00
Kamil Braun	0c9cb5c5bf	Merge 'raft: wait for the next tick before retrying' from Gusev Petr When `modify_config` or `add_entry` is forwarded to the leader, it may reach the node at "inappropriate" time and result in an exception. There are two reasons for it - the leader is changing and, in case of `modify_config`, other `modify_config` is currently in progress. In both cases the command is retried, but before this patch there was no delay before retrying, which could lead to a tight loop. The patch adds a new exception type `transient_error`. When the client receives it, it is obliged to retry the request after some delay. Previously leader-side exceptions were converted to `not_a_leader`, which is strange, especially for `conf_change_in_progress`. Fixes: #11564 Closes #11769 * github.com:scylladb/scylladb: raft: rafactor: remove duplicate code on retries delays raft: use wait_for_next_tick in read_barrier raft: wait for the next tick before retrying	2022-11-16 18:20:54 +01:00
Aleksandra Martyniuk	4250bd9458	tasks: do not run tasks that are aborted Currently in start() method a task is run even if it was already aborted. When start() is called on an aborted task, its state is set to task_manager::task_state::failed and it doesn't run.	2022-11-16 18:09:41 +01:00
Aleksandra Martyniuk	ebffca7ea5	tasks: delete unused variable	2022-11-16 18:07:57 +01:00
Aleksandra Martyniuk	752edc2205	tasks: add abort_source to task_manager::task::impl task_manager::task can be aborted with impl's abort_source. By default abort request is propagated to all task's descendants.	2022-11-16 18:07:11 +01:00
Avi Kivity	c4f069c6fc	Update seastar submodule * seastar 153223a188...4f4cc00660 (10): > Merge 'Avoid using namespace internal' from Pavel Emelyanov > Merge 'De-futurize IO class update calls' from Pavel Emelyanov > abort_source: subscribe(): remove noexcept qualifier > Merge 'Add Prometheus filtering capabilities by label' from Amnon Heiman > fsqual: stop causing memory leak error on LeakSanitizer > metrics.cc: Do not merge empty histogram > Update tutorial.md > README-DPDK.md: document --cflags option > build: install liburing.pc using stow > core/polymorphic_temporary_buffer: include <seastar/core/memory.hh> Closes #11991	2022-11-16 17:59:33 +02:00
Avi Kivity	3497891cf9	utils: spell "barrett" correctly As P. T. Barnoom famously said, "write what you like but spell my name correctly". Following that, we correct the spelling of Barrett's name in the source tree. Closes #11989	2022-11-16 16:30:38 +02:00
Benny Halevy	0c94ffcc85	topology: delete copy constructor Topology is copied only from token_metadata_impl::clone_only_token_map which copies the token_metadata_impl with yielding to prevent reactor stalls. This should apply to topology as well, so add a clone_gently function for cloning the topology from token_metadata_impl::clone_only_token_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 15:27:28 +02:00
Benny Halevy	4f4fc7fe22	token_metadata: coroutinize clone functions Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 15:27:28 +02:00
Kamil Braun	a83789160d	message: messaging_service: check for known topology before calling is_same_dc/rack `is_same_dc` and `is_same_rack` assume that the peer's topology is known. If it's unknown, `on_internal_error` will be called inside topology. When these functions are used in `get_rpc_client`, they are already protected by an earlier check for knowing the peer's topology (the `has_topology()` lambda). Another use is in `do_start_listen()`, where we create a filter for RPC module to check if it should accept incoming connections. If cross-dc or cross-rack encryption is enabled, we will reject connection attempts to the regular (non-ssl) port from other dcs/rack using `is_same_dc/rack`. However, it might happen that something (other Scylla node or otherwise) tries to contact us on the regular port and we don't know that thing's topology, which would result in `on_internal_error`. But this is not a fatal error; we simply want to reject that connection. So protect these calls as well. Finally, there's `get_preferred_ip` with an unprotected `is_same_dc` call which, for a given peer, may return a different IP from preferred IP cache if the endpoint resides in the same DC. If there is no entry in the preferred IP cache, we return the original (external) IP of the peer. We can do the same if we don't know the peer's topology. It's interesting that we didn't see this particular place blowing up. Perhaps the preferred IP cache is always populated after we know the topology.	2022-11-16 14:01:50 +01:00
Kamil Braun	9b2449d3ea	test: reenable test_topology::test_decommission_node_add_column Also improve the test to increase the probability of reproducing #11780 by injecting sleeps in appropriate places. Without the fix for #11780 from the earlier commit, the test reproduces the issue in roughly half of all runs in dev build on my laptop.	2022-11-16 14:01:50 +01:00
Kamil Braun	0f49813312	test/pylib: util: configurable period in wait_for	2022-11-16 14:01:50 +01:00
Kamil Braun	1bd2471c19	message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client `get_rpc_client` calculates a `topology_ignored` field when creating a client which says whether the client's endpoint had topology information when this client was created. This is later used to check if that client needs to be dropped and replaced with a new client which uses the correct topology information. The `topology_ignored` field was incorrectly calculated as `true` for pending endpoints even though we had topology information for them. This would lead to unnecessary drops of RPC clients later. Fix this. Remove the default parameter for `with_pending` from `topology::has_endpoint` to avoid similar bugs in the future. Apparently this fixes #11780. The verbs used by decommission operation use RPC client index 1 (see `do_get_rpc_client_idx` in message/messaging_service.cc). From local testing with additional logging I found that by the time this client is created (i.e. the first verb in this group is used), we already know the topology. The node is pending at that point - hence the bug would cause us to assume we don't know the topology, leading us to dropping the RPC client later, possibly in the middle of a decommission operation. Fixes: #11780	2022-11-16 14:01:50 +01:00
Kamil Braun	840be34b5f	message: messaging_service: topology independent connection settings for GOSSIP verbs The gossip verbs are used to learn about topology of other nodes. If inter-dc/rack encryption is enabled, the knowledge of topology is necessary to decide whether it's safe to send unencrypted messages to nodes (i.e., whether the destination lies in the same dc/rack). The logic in `messaging_service::get_rpc_client`, which decided whether a connection must be encrypted, was this (given that encryption is enabled): if the topology of the peer is known, and the peer is in the same dc/rack, don't encrypt. Otherwise encrypt. However, it may happen that node A knows node B's topology, but B doesn't know A's topology. A deduces that B is in the same DC and rack and tries sending B an unencrypted message. As the code currently stands, this would cause B to call `on_internal_error`. This is what I encountered when attempting to fix #11780. To guarantee that it's always possible to deliver gossiper verbs (even if one or both sides don't know each other's topology), and to simplify reasoning about the system in general, choose connection settings that are independent of the topology - for the connection used by gossiper verbs (other connections are still topology-dependent and use complex logic to handle the situation of unknown-and-later-known topology). This connection only contains 'rare' and 'cheap' verbs, so it's not a performance problem to always encrypt it (given that encryption is configured). And this is what already was happening in the past; it was at some point removed during topology knowledge management refactors. We just bring this logic back. Fixes #11992. Inspired by xemul/scylla@45d48f3d02.	2022-11-16 13:58:07 +01:00
Anna Stuchlik	01c9846bb6	doc: add the link to the Enabling Experimental Features section	2022-11-16 13:24:45 +01:00
Anna Stuchlik	f1b2f44aad	doc: move the TTL Alternator feature from the Experimental Features section to the production-ready section	2022-11-16 13:23:07 +01:00
Nadav Har'El	2f2f01b045	materialized views: fix view writes after base table schema change When we write to a materialized view, we need to know some information defined in the base table such as the columns in its schema. We have a "view_info" object that tracks each view and its base. This view_info object has a couple of mutable attributes which are used to lazily-calculate and cache the SELECT statement needed to read from the base table. If the base-table schema ever changes - and the code calls set_base_info() at that point - we need to forget this cached statement. If we don't (as before this patch), the SELECT will use the wrong schema and writes will no longer work. This patch also includes a reproducing test that failed before this patch, and passes afterwards. The test creates a base table with a view that has a non-trivial SELECT (it has a filter on one of the base-regular columns), makes a benign modification to the base table (just a silly addition of a comment), and then tries to write to the view - and before this patch it fails. Fixes #10026 Fixes #11542	2022-11-16 13:58:21 +02:00
Nadav Har'El	7cbb0b98bb	Merge 'doc: document user defined functions (UDFs)' from Anna Stuchlik This PR is V2 of the [PR created by @psarna](https://github.com/scylladb/scylladb/pull/11560). I have: - copied the content. - applied the suggestions left by @nyh. - made minor improvements, such as replacing "Scylla" with "ScyllaDB", fixing punctuation, and fixing the RST syntax. Fixes https://github.com/scylladb/scylladb/issues/11378 Closes #11984 * github.com:scylladb/scylladb: doc: label user-defined functions as Experimental doc: restore the note for the Count function (removed by mistatke) doc: document user defined functions (UDFs)	2022-11-16 13:09:47 +02:00
Botond Dénes	cbf9be9715	Merge 'Avoid 0.0.0.0 (and :0) as preferred IP' from Pavel Emelyanov Although the docs discourage using INADDR_ANY as a listen address, this is not disabled in code. Worse -- some snitch drivers may gossip it around as the INTERNAL_IP state. This set prevents this from happening and also adds a sanity check not to use this value if it somehow sneaks in. Closes #11846 * github.com:scylladb/scylladb: messaging_service: Deny putting INADD_ANY as preferred ip messaging_service: Toss preferred ip cache management gossiping_property_file_snitch: Dont gossip INADDR_ANY preferred IP gossiping_property_file_snitch: Make _listen_address optional	2022-11-16 08:30:42 +02:00
Avi Kivity	43d3e91e56	tools: toolchain: prepare: use real bash associative array When we translate from docker/go arch names to the kernel arch names, we use an associative array hack using computed variable names "${!variable_name}". But it turns out bash has real associative arrays, introduced with "declare -A". Use them to make the code a little clearer. Closes #11985	2022-11-16 08:17:47 +02:00
Botond Dénes	e90d0811d0	Merge 'doc: update ScyllaDB requirements - supported CPUs and AWS i4g instances' from Anna Stuchlik Fix https://github.com/scylladb/scylla-docs/issues/4144 Closes #11226 * github.com:scylladb/scylladb: Update docs/getting-started/system-requirements.rst doc: specify the recommended AWS instance types doc: replace the tables with a generic description of support for Im4gn and Is4gen instances doc: add support for AWS i4g instances doc: extend the list of supported CPUs	2022-11-16 08:15:00 +02:00
Botond Dénes	bd1fcbc38f	Merge 'Introduce reverse vector_deserializer.' from Michał Radwański As indicated in #11816, we'd like to enable deserializing vectors in reverse. The forward deserialization is achieved by reading from an input_stream. The input stream internally is a singly linked list with complicated logic. In order to allow for going through it in reverse, instead when creating the reverse vector initializer, we scan the stream and store substreams to all the places that are a starting point for a next element. The iterator itself just deserializes elements from the remembered substreams, this time in reverse. Fixes #11816 Closes #11956 * github.com:scylladb/scylladb: test/boost/serialization_test.cc: add test for reverse vector deserializer serializer_impl.hh: add reverse vector serializer serializer_impl: remove unneeded generic parameter	2022-11-16 07:37:24 +02:00
Anna Stuchlik	cdb6557f23	doc: label user-defined functions as Experimental	2022-11-15 21:22:01 +01:00
Avi Kivity	d85f731478	build: update toolchain to Fedora 37 with clang 15 'cargo' instantiation now overrides internal git client with cli client due to unbounded memory usage [1]. [1] https://github.com/rust-lang/cargo/issues/10583#issuecomment-1129997984	2022-11-15 16:48:09 +00:00
Anna Stuchlik	1f1d88d04e	doc: restore the note for the Count function (removed by mistatke)	2022-11-15 17:41:22 +01:00
Anna Stuchlik	dbb19f55fb	doc: document user defined functions (UDFs)	2022-11-15 17:33:05 +01:00
Nadav Har'El	e4dba6a830	test/cql-pytest: add test for when MV requires IS NOT NULL As noted in issue #11979, Scylla inconsistently (and unlike Cassandra) requires "IS NOT NULL" on some but not all materialized-view key columns. Specifically, Scylla does not require "IS NOT NULL" on the base's partition key, while Cassandra does. This patch is a test which demonstrates this inconsistency. It currently passes on Cassandra and fails on Scylla, so is marked xfail. Refs #11979 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11980	2022-11-15 14:21:48 +01:00
Asias He	16bd9ec8b1	gossip: Improve get_live_token_owners and get_unreachable_token_owners The get_live_token_owners returns the nodes that are part of the ring and are alive. The get_unreachable_token_owners returns the nodes that are part of the ring and are not alive. The token_metadata::get_all_endpoints returns nodes that are part of the ring. The patch changes both functions to use the more authoritative source to get the nodes that are part of the ring and call is_alive to check if the node is up or down, so that the correctness does not depend on any derived information. This patch fixes a truncate issue in storage_proxy::truncate_blocking where it calls get_live_token_owners and get_unreachable_token_owners to decide the nodes to talk with for truncate operation. The truncate failed because incorrect nodes were returned. Fixes #10296 Fixes #11928 Closes #11952	2022-11-15 14:21:48 +01:00
Botond Dénes	21489c9f9c	Merge 'doc: add the "Scylladb Enterprise" label to the Enterprise-only features' from Anna Stuchlik This PR is a follow-up to https://github.com/scylladb/scylladb/pull/11918. With this PR: - The "ScyllaDB Enterprise" label is added to all the features that are only available in ScyllaDB Enterprise. - The previous Enterprise-only note is removed (it was included in multiple files as _/rst_include/enterprise-only-note.rst_ - this file is removed as it is no longer used anywhere in the docs). - "Scylla Enterprise" was removed from `versionadded` because now it's clear that the feature was added for Enterprise. Closes #11975 * github.com:scylladb/scylladb: doc: remove the enterprise-only-note.rst file, which was replaced by the ScyllaDB Enterprise label and is not used anymore doc: add the ScyllaDB Enterprise label to the descriptions of Enterprise-only features	2022-11-15 14:21:48 +01:00
Botond Dénes	34f29c8d67	Merge 'Use with_sstable_directory() helper in tests' from Pavel Emelyanov The helper is already widely used, one (last) test case can benefit from using it too Closes #11978 * github.com:scylladb/scylladb: test: Indentation fix after previous patch test: Wse with_sstable_directory() helper	2022-11-15 14:21:48 +01:00
Nadav Har'El	8a4ab87e44	Merge 'utils: crc: generate crc barrett fold tables at compile time' from Avi Kivity We use Barrett tables (misspelled in the code unfortunately) to fold crc computations of multiple buffers into a single crc. This is important because it turns out to be faster to compute crc of three different buffers in parallel rather than compute the crc of one large buffer, since the crc instruction has latency 3. Currently, we have a separate code generation step to compute the fold tables. The step generates a new C++ source file with the tables. But modern C++ allows us to do this computation at compile time, avoiding the code generation step. This simplifies the build. This series does that. There is some complication in that the code uses compiler intrinsics for the computation, and these are not constexpr friendly. So we first introduce constexpr-friendly alternatives and use them. To prove the transformation is correct, I compared the generated code from before the series and from just before the last step (where we use constexpr evaluation but still retain the generated file) and saw no difference in the values. Note that constexpr is not strictly needed - we could have run the code in the global variables' initializer. But that would cause a crash if we run on a pre-clmul machine, and is not as fun. 
Closes #11957 * github.com:scylladb/scylladb: test: crc: add unit tests for constexpr clmul and barrett fold utils: crc combine table: generate at compile time utils: barrett: inline functions in header utils: crc combine table: generate tables at compile time utils: crc combine table: extract table generation into a constexpr function utils: crc combine table: extract "pow table" code into constexpr function utils: crc combine table: store tables std::arrray rather than C array utils: barrett: make the barrett reduction constexpr friendly utils: clmul: add 64-bit constexpr clmul utils: barrett: extract barrett reduction constants utils: barrett: reorder functions utils: make clmul() constexpr	2022-11-15 14:21:48 +01:00
Petr Gusev	ae3e0e3627	raft: rafactor: remove duplicate code on retries delays Introduce a templated function do_on_leader_with_retries, use it in add_entries/modify_config/read_barrier. The function implements the basic logic of retries with aborts and leader changes handling, adds a delay between iterations to protect against tight loops.	2022-11-15 13:18:53 +04:00
Petr Gusev	15cc1667d0	raft: use wait_for_next_tick in read_barrier Replaced the yield on transport_error with wait_for_next_tick. Added delays for retries, similar to add_entry/modify_config: we postpone the next call attempt if we haven't received new information about the current leader.	2022-11-15 12:31:49 +04:00
Petr Gusev	5e15c3c9bd	raft: wait for the next tick before retrying When modify_config or add_entry is forwarded to the leader, it may reach the node at "inappropriate" time and result in an exception. There are two reasons for it - the leader is changing and, in case of modify_config, other modify_config is currently in progress. In both cases the command is retried, but before this patch there was no delay before retrying, which could lead to a tight loop. The patch adds a new exception type transient_error. When the client node receives it, it is obliged to retry the request, possibly after some delay. Previously, leader-side exceptions were converted to not_a_leader exception, which is strange, especially for conf_change_in_progress. We add a delay before retrying in modify_config and add_entry if the client hasn't received any new information about the leader since the last attempt. This can happen if the server responds with a transient_error with an empty leader and the current node has not yet learned the new leader. We neglect an excessive delay if the newly elected leader is the same as the previous one; this is supposed to be rare. Fixes: #11564	2022-11-15 11:49:26 +04:00
Pavel Emelyanov	8dcd9d98d6	test: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-14 20:11:01 +03:00
Pavel Emelyanov	c9128e9791	test: Wse with_sstable_directory() helper It's already used everywhere, but one test case wires up the sstable_directory by hand. Fix it too, but keep in mind, that the caller fn stops the directory early. (indentation is deliberately left broken) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-11-14 20:11:01 +03:00
Michał Radwański	32c60b44c5	test/boost/serialization_test.cc: add test for reverse vector deserializer This test is just a copy-pasted version of forward serializer test.	2022-11-14 16:06:24 +01:00
Michał Radwański	dce67f42f8	serializer_impl.hh: add reverse vector serializer Currently when we want to deserialize mutation in reverse, we unfreeze it and consume from the end. This new reverse vector deserializer goes through input stream remembering substreams that contain a given output range member, and while traversing from the back, deserialize each substream.	2022-11-14 16:06:24 +01:00
Anna Stuchlik	e36bd208cc	doc: remove the enterprise-only-note.rst file, which was replaced by the ScyllaDB Enterprise label and is not used anymore	2022-11-14 15:20:51 +01:00
Anna Stuchlik	36324fe748	doc: add the ScyllaDB Enterprise label to the descriptions of Enterprise-only features	2022-11-14 15:16:51 +01:00
Takuya ASADA	da6c472db9	install.sh: Skip systemd existance check when --without-systemd When --without-systemd specified, install.sh should skip systemd existance check. Fixes #11898 Closes #11934	2022-11-14 14:07:46 +02:00
Benny Halevy	ff5527deb1	topology: copy _sort_by_proximity in copy constructor Fixes #11962 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11965	2022-11-14 13:59:56 +03:00
Pavel Emelyanov	bd48fdaad5	Merge 'handle_state_normal: do not update topology of removed endpoint' from Benny Halevy Currently, when replacing a node ip, keeping the old host, we might end up with the old endpoint in system.peers if it is inserted back into the topology by `handle_state_normal` when on_join is called with the old endpoint. Then, later on, on_change sees that: ``` if (get_token_metadata().is_member(endpoint)) { co_await do_update_system_peers_table(endpoint, state, value); ``` As described in #11925. Fixes #11925 Closes #11930 * github.com:scylladb/scylladb: storage_service, system_keyspace: add debugging around system.peers update storage_service: handle_state_normal: update topology and notify_joined endpoint only if not removed	2022-11-14 13:58:28 +03:00
Botond Dénes	8e38551d93	Merge 'Allow each compaction group to have its own compaction backlog tracker' from Raphael "Raph" Carvalho Today, compaction_backlog_tracker is managed in each compaction_strategy implementation. So every compaction strategy is managing its own tracker and providing a reference to it through get_backlog_tracker(). But this prevents each group from having its own tracker, because there's only a single compaction_strategy instance per table. To remove this limitation, compaction_strategy impl will no longer manage trackers but will instead provide an interface for trackers to be created, such that each compaction_group will be allowed to create its own tracker and manage it by itself. Now table's backlog will be the sum of all compaction_group backlogs. The normalization factor is applied on the sum, so we don't have to adjust each individual backlog to any factor. Closes #11762 * github.com:scylladb/scylladb: replica: Allow one compaction_backlog_tracker for each compaction_group compaction: Make compaction_state available for compaction tasks being stopped compaction: Implement move assignment for compaction_backlog_tracker compaction: Fix compaction_backlog_tracker move ctor compaction: Use table_state's backlog tracker in compaction_read_monitor_generator compaction: kill undefined get_unimplemented_backlog_tracker() replica: Refactor table::set_compaction_strategy for multiple groups Fix exception safety when transferring ongoing charges to new backlog tracker replica: move_sstables_from_staging: Use tracker from group owning the SSTable replica: Move table::backlog_tracker_adjust_charges() to compaction_group replica: table::discard_sstables: Use compaction_group's backlog tracker replica: Disable backlog tracker in compaction_group::stop() replica: database_sstable_write_monitor: use compaction_group's backlog tracker replica: Move table::do_add_sstable() to compaction_group test/sstable_compaction_test: Switch to table_state::get_backlog_tracker() compaction/table_state: Introduce get_backlog_tracker()	2022-11-14 07:05:28 +02:00
Avi Kivity	b8cb34b928	test: crc: add unit tests for constexpr clmul and barrett fold Check that the constexpr variants indeed match the runtime variants. I verified manually that exactly one computation in each test is executed at run time (and is compared against a constant).	2022-11-13 16:22:29 +02:00
Avi Kivity	70217b5109	utils: crc combine table: generate at compile time By now the crc combine tables are generated at compile time, but still in a separate code generation step. We now eliminate the code generation step and instead link the global variables directly into the main executable. The global variables have been conveniently named exactly as the code generation step names them, so we don't need to touch any users.	2022-11-12 17:26:45 +02:00
Avi Kivity	164e991181	utils: barrett: inline functions in header Avoid duplicate definitions if the same header is used from more than one place, as it will soon be.	2022-11-12 17:26:08 +02:00
Avi Kivity	a4f06773da	utils: crc combine table: generate tables at compile time Move the tables into global constinit variables that are generated at compile time. Note the code that creates the generated crc32_combine_table.cc is still called; it transforms compile-time generated tables into a C++ source that contains the same values, as literals. If we generate a diff between gen/utils/gz/crc_combine_table.cc before this series and after this patch, we see the only change in the file is the type of the variable (which changed to std::array), proving our constexpr code is correct.	2022-11-12 17:16:59 +02:00
Avi Kivity	a229fdc41e	utils: crc combine table: extract table generation into a constexpr function Move the code to a constexpr function, so we can later generate the tables at compile time. Note that although the function is constexpr, it is still evaluated at runtime, since the calling function (main()) isn't constexpr itself.	2022-11-12 17:13:52 +02:00
Avi Kivity	d42bec59bb	utils: crc combine table: extract "pow table" code into constexpr function A "pow table" is used to generate the Barrett fold tables. Extract its code into a constexpr function so we can later generate the fold tables at compile time.	2022-11-12 17:11:44 +02:00
Avi Kivity	6e34014b64	utils: crc combine table: store tables std::arrray rather than C array C arrays cannot be returned from functions and therefore aren't suitable for constexpr processing. std::array<> is a regular value and so is constexpr friendly.	2022-11-12 17:09:02 +02:00
Avi Kivity	1e9252f79a	utils: barrett: make the barrett reduction constexpr friendly Dispatch to intrinsics or constexpr based on evaluation context.	2022-11-12 17:04:44 +02:00
Avi Kivity	0bd90b5465	utils: clmul: add 64-bit constexpr clmul This is used when generating the Barrett reduction tables, and also when applying the Barrett reduction at runtime, so we need it to be constexpr friendly.	2022-11-12 17:04:05 +02:00
Avi Kivity	c376c539b8	utils: barrett: extract barrett reduction constants The constants are repeated across x86_64 and aarch64, so extract them into a common definition.	2022-11-12 17:00:17 +02:00
Avi Kivity	2fdf81af7b	utils: barrett: reorder functions Reorder functions in dependency order rather than forward declaring them. This makes them more constexpr-friendly.	2022-11-12 16:52:41 +02:00
Avi Kivity	8aa59a897e	utils: make clmul() constexpr clmul() is a pure function and so should already be constexpr, but it uses intrinsics that aren't defined as constexpr and so the compiler can't really compute it at compile time. Fix by defining a constexpr variant and dispatching based on whether we're being constant-evaluated or not. The implementation is simple, but in any case proof that it is correct will be provided later on.	2022-11-12 16:49:43 +02:00
Raphael S. Carvalho	b88acffd66	replica: Allow one compaction_backlog_tracker for each compaction_group Today, compaction_backlog_tracker is managed in each compaction_strategy implementation. So every compaction strategy is managing its own tracker and providing a reference to it through get_backlog_tracker(). But this prevents each group from having its own tracker, because there's only a single compaction_strategy instance per table. To remove this limitation, compaction_strategy impl will no longer manage trackers but will instead provide an interface for trackers to be created, such that each compaction group will be allowed to have its own tracker, which will be managed by compaction manager. On compaction strategy change, table will update each group with the new tracker, which is created using the previously introduced compaction_group_sstable_set_updater. Now table's backlog will be the sum of all compaction_group backlogs. The normalization factor is applied on the sum, so we don't have to adjust each individual backlog to any factor. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:22:51 -03:00
Raphael S. Carvalho	d862dd815c	compaction: Make compaction_state available for compaction tasks being stopped compaction_backlog_tracker will be managed by compaction_manager, in the per table state. As compaction tasks can access the tracker throughout its lifetime, remove() can only deregister the state once we're done stopping all tasks which map to that state. remove() extracted the state upfront, then performed the stop, to prevent new tasks from being registered and left behind. But we can avoid the leak of new tasks by only closing the gate, which waits for all tasks (which are stopped a step earlier) and once closed, prevents new tasks from being registered. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:22:51 -03:00
Raphael S. Carvalho	0a152a2670	compaction: Implement move assignment for compaction_backlog_tracker That's needed for std::optional to work on its behalf. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:22:49 -03:00
Raphael S. Carvalho	fe305cefd0	compaction: Fix compaction_backlog_tracker move ctor Luckily it's not used anywhere. Default move ctor was picked but it won't clear _manager of old object, meaning that its destructor will incorrectly deregister the tracker from compaction_backlog_manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:37 -03:00
Raphael S. Carvalho	8e1e30842d	compaction: Use table_state's backlog tracker in compaction_read_monitor_generator A step closer towards a separate backlog tracker for each compaction group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:37 -03:00
Raphael S. Carvalho	fedafd76eb	compaction: kill undefined get_unimplemented_backlog_tracker() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:37 -03:00
Raphael S. Carvalho	90991bda69	replica: Refactor table::set_compaction_strategy for multiple groups Refactoring the function for it to accommodate multiple compaction groups. To still provide strong exception guarantees, preparation and execution of changes will be separated. Once multiple groups are supported, each group will be prepared first, and the noexcept execution will be done as a last step. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:37 -03:00
Raphael S. Carvalho	244efddb22	Fix exception safety when transferring ongoing charges to new backlog tracker When setting a new strategy, the charges of the old tracker are transferred to the new one. The problem is that we're not reverting changes if an exception is triggered before the new strategy is successfully set. To fix this exception safety issue, let's copy the charges instead of moving them. If an exception is triggered, the old tracker is still the one used and remains intact. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:37 -03:00
Raphael S. Carvalho	d1e2dbc592	replica: move_sstables_from_staging: Use tracker from group owning the SSTable When moving SSTables from staging directory, we'll conditionally add them to backlog tracker. As each group has its own tracker, a given sstable will be added to the tracker of the group that owns it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:37 -03:00
Raphael S. Carvalho	9031dc3199	replica: Move table::backlog_tracker_adjust_charges() to compaction_group Procedures that call this function happen to be in compaction_group, so let's move it to group. Simplifies the change where the procedure retrieves tracker from the group itself. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:36 -03:00
Raphael S. Carvalho	116459b69e	replica: table::discard_sstables: Use compaction_group's backlog tracker Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:36 -03:00
Raphael S. Carvalho	b2d8545b15	replica: Disable backlog tracker in compaction_group::stop() As we're moving backlog tracker to compaction group, we need to stop the tracker there too. We're moving it a step earlier in table::stop(), before sstables are cleared, but that's okay because it's still done after the group was deregistered from compaction manager, meaning no compactions are running. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:36 -03:00
Raphael S. Carvalho	91b0d772e2	replica: database_sstable_write_monitor: use compaction_group's backlog tracker Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:36 -03:00
Raphael S. Carvalho	f37a05b559	replica: Move table::do_add_sstable() to compaction_group All callers of do_add_sstable() live in compaction_group, so it should be moved into compaction_group too. It also makes it easier for the function to retrieve the backlog tracker from the group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:36 -03:00
Raphael S. Carvalho	835927a2ad	test/sstable_compaction_test: Switch to table_state::get_backlog_tracker() Important for decoupling backlog tracker from table's compaction strategy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:36 -03:00
Raphael S. Carvalho	1ec0ef18a5	compaction/table_state: Introduce get_backlog_tracker() This interface will be helpful for allowing replica::table, unit tests and sstables::compaction to access the compaction group's tracker which will be managed by the compaction manager, once we complete the decoupling work. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:36 -03:00
Nadav Har'El	ff87624fb4	test/cql-pytest: add another regression test for reversed-type bug In commit `544ef2caf3` we fixed a bug where a reversed clustering-key order caused problems using a secondary index because of incorrect type comparison. That commit also included a regression test for this fix. However, that fix was incomplete, and improved later in commit `c8653d1321`. That later fix was labeled "better safe than sorry", and did not include a test demonstrating any actual bug, so unsurprisingly we never backported that second fix to any older branches. Recently we discovered that missing the second patch does cause real problems, and this patch includes a test which fails when the first patch is in, but the second patch isn't (and passes when both patches are in, and also passes on Cassandra). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11943	2022-11-11 11:01:22 +02:00
Botond Dénes	302917f63d	mutation_compactor: add validator The mutation compactor is used on most read-paths we have, so adding a validator to it gives us a good coverage, in particular it gives us full coverage of queries and compaction. The validator validates mutation token (and mutation fragment kind) monotonicity as that is quite cheap, while it is enough to catch the most common problems. As we already have a validator on the compaction path (in the sstable writer), the validator is disabled when the mutation compactor is instantiated for compaction. We should probably make this configurable at some point. The addition of this validator should prevent the worst of the fragment reordering bugs from affecting reads.	2022-11-11 10:26:05 +02:00
Botond Dénes	5c245b4a5e	mutation_fragment_stream_validator: add a 'none' validation level Which, as its name suggests, makes the validating filter not validate anything at all. This validation level can be used effectively to make it so as if the validator was not there at all.	2022-11-11 09:58:44 +02:00
Botond Dénes	a4b58f5261	test/boost/mutation_query_test: test_partition_limit: sort input data The test's input data is currently out-of-order, violating a fundamental invariant of data always being sorted. This doesn't cause any problems right now, but soon it will. Sort it to avoid it.	2022-11-11 09:58:44 +02:00
Botond Dénes	2c551bb7ce	querier: consume_page(): use partition_start as the sentinel value Said method calls `compact_mutation_state::start_new_page()` which requires the kind of the next fragment in the reader. When there is no fragment (reader is at EOS), we use partition-end. This was a poor choice: if the reader is at EOS, partition-end was the last fragment kind; if the stream were to continue, the next fragment would be a partition-start.	2022-11-11 09:58:18 +02:00
Botond Dénes	0bcfc9d522	treewide: use ::for_partition_end() instead of ::end_of_partition_tag_t{} We just added a convenience static factory method for partition end, change the present users of the clunky constructor+tag to use it instead.	2022-11-11 09:58:18 +02:00
Botond Dénes	f1a039fc2b	treewide: use ::for_partition_start() instead of ::partition_start_tag_t{} We just added a convenience static factory method for partition start, change the present users of the clunky constructor+tag to use it instead.	2022-11-11 09:58:18 +02:00
Botond Dénes	6a002953e9	position_in_partition: add for_partition_{start,end}()	2022-11-11 09:58:18 +02:00
Kamil Braun	4a2ec888d5	Merge 'test.py: use internal id to manage servers' from Alecco Instead of using assigned IP addresses, use a local integer ID for managing servers. IP address can be reused by a different server. While there, get host ID (UUID). This can also be reused with `node replace` so it's not good enough for tracking. Closes #11747 * github.com:scylladb/scylladb: test.py: use internal id to manage servers test.py: rename hostname to ip_addr test.py: get host id test.py: use REST api client in ScyllaCluster test.py: remove unnecessary reference to web app test.py: requests without aiohttp ClientSession	2022-11-10 17:12:16 +01:00
Kamil Braun	1cc68b262e	docs: describe the Raft upgrade and recovery procedures In the 5.1 -> 5.2 upgrade doc, include additional steps for enabling Raft using the `consistent_cluster_management` flag. Note that we don't have this flag yet but it's planned to replace the experimental flag in 5.2. In the "Raft in ScyllaDB" document, add sections about: - enabling Raft in existing clusters in Scylla 5.2, - verifying that the internal Raft upgrade procedure finishes successfully, - recovering from a stuck Raft upgrade procedure or from a majority loss situation. Fix some problems in the documentation, e.g. it is not possible to enable Raft in an existing cluster in 5.0, but the documentation claimed that it is. Follow-up items: - if we decide for a different name for `consistent_cluster_management`, use that name in the docs instead - update the warnings in Scylla to link to the Raft doc - mention Enterprise versions once we know the numbers - update the appropriate upgrade docs for Enterprise versions once they exist	2022-11-10 17:08:57 +01:00
Kamil Braun	3dab07ec11	docs: add upgrade guide 5.1 -> 5.2 It's a copy-paste from the 5.0 -> 5.1 guide with substitutions: s/5.1/5.2, s/5.0/5.1 The metric update guide is not written, I left a TODO. Also I didn't include the guide in docs/upgrade/upgrade-opensource/index.rst, since 5.2 is not released yet. The guide can be accessed by manually following the link: /upgrade/upgrade-opensource/upgrade-guide-from-5.1-to-5.2/	2022-11-10 16:49:14 +01:00
Alejo Sanchez	700054abee	test.py: use internal id to manage servers Instead of using assigned IP addresses, use an internal server id. Define types to distinguish local server id, host ID (UUID), and IP address. This is needed to test servers changing IP address and for node replace (host UUID). Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-11-10 09:14:37 +01:00
Alejo Sanchez	1e38f5478c	test.py: rename hostname to ip_addr The code explicitly manages an IP as string, make it explicit in the variable name. Define its type and test for set in the instance instead of using an empty string as placeholder. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-11-10 09:14:37 +01:00
Alejo Sanchez	f478eb52a3	test.py: get host id When initializing a ScyllaServer, try to get the host id instead of only checking the REST API is up. Use the existing aiohttp session from ScyllaCluster. In case of HTTP error check the status was not an internal error (500+). Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-11-10 09:14:37 +01:00
Alejo Sanchez	78663dda72	test.py: use REST api client in ScyllaCluster Move the REST api client to ScyllaCluster. This will allow the cluster to query its own servers. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-11-10 09:14:37 +01:00
Alejo Sanchez	75ea345611	test.py: remove unnecessary reference to web app The aiohttp.web.Application only needs to be passed, so don't store a reference in ScyllaCluster object. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-11-10 09:14:37 +01:00
Alejo Sanchez	a5316b0c6b	test.py: requests without aiohttp ClientSession Simplify REST helper by doing requests without a session. Reusing an aiohttp.ClientSession causes knock-on effects on `rest_api/test_task_manager` due to handling exceptions outside of an async with block. Requests for cluster management and Scylla REST API don't need session, anyway. Raise HTTPError with status code, text reason, params, and json. In ScyllaCluster.install_and_start() instead of adding one more custom exception, just catch all exceptions as they will be re-raised later. While there avoid code duplication and improve sanity, type checking, and lint score. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-11-10 09:14:37 +01:00
Botond Dénes	21bc37603a	Merge 'utils: config_src: add set_value_on_all_shards functions' from Benny Halevy Currently when we set a single value we need to call broadcast_to_all_shards to let observers on all shards get notified of the new value. However, the latter broadcasts all values to all shards so it's terribly inefficient. Instead, add async set_value_on_all_shards functions to broadcast a value to all shards. Use those in system_keyspace for db_config_table virtual table and in task_manager_test to update the task_manager ttl. Refs #7316 Closes #11893 * github.com:scylladb/scylladb: tests: check ttl on different shards utils: config_src: add set_value_on_all_shards functions utils: config_file: add config_source::API	2022-11-10 07:16:39 +02:00
Botond Dénes	3aff59f189	Merge 'staging sstables: filter tokens for view update generation' from Benny Halevy This mini-series introduces dht::tokens_filter and uses it for consuming staging sstable in the view_update_generator. The tokens_filter uses the token ranges owned by the current node, as retrieved by get_keyspace_local_ranges. Refs #9559 Closes #11932 * github.com:scylladb/scylladb: db: view_update_generator: always clean up staging sstables compaction: extract incremental_owned_ranges_checker out to dht	2022-11-10 07:00:51 +02:00
Avi Kivity	9b6ab5db4a	Update seastar submodule * seastar e0dabb361f...153223a188 (8): > build: compile dpdk with -fpie (position independent executable) > Merge 'io_request: remove ctor overloads of io_request and s/io_request/const io_request/' from Kefu Chai > iostream: remove unused function > smp: destroy_smp_service_group: verify smp_service_group id > core/circular_buffer: refactor loop in circular_buffer::erase() > Merge 'Outline reactor::add_task() and sanitize reactor::shuffle() methods' from Pavel Emelyanov > Add NOLINT for cert-err58-cpp > tests: Fix false-positive use-after-free detection Closes #11940	2022-11-09 23:36:50 +02:00
Aleksandra Martyniuk	b0ed4d1f0f	tests: check ttl on different shards Test checking if ttl is properly set is extended to check whether the ttl value is changed on non-zero shard.	2022-11-09 16:58:46 +02:00
Botond Dénes	725e5b119d	Revert "replica: Pick new generation for SSTables being moved from staging dir" This reverts commit `ba6186a47f`. Said commit violates the widely held assumption that sstables generations can be used as sstable identity. One known problem caused by this is potential OOO partition emitted when reading from sstables (#11843). We now also have a better fix for #11789 (the bug this commit was meant to fix): `4aa0b16852`. So we can revert without regressions. Fixes: #11843 Closes #11886	2022-11-09 16:35:31 +02:00
Eliran Sinvani	ab7429b77d	cql: Fix crash upon use of the word empty for service level name Wrong access to an uninitialized token instead of the actual generated string caused the parser to crash. This wasn't detected by the ANTLR3 compiler because all the temporary variables defined in the ANTLR3 statements are global in the generated code. This essentially caused a null dereference. Tests: 1. The fixed issue scenario from github. 2. Unit tests in release mode. Fixes #11774 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20190612133151.20609-1-eliransin@scylladb.com> Closes #11777	2022-11-09 15:58:57 +02:00
Anna Stuchlik	d2e54f7097	Merge branch 'master' into anna-requirements-arm-aws	2022-11-09 14:39:00 +01:00
Anna Stuchlik	8375304d9b	Update docs/getting-started/system-requirements.rst Co-authored-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2022-11-09 14:37:34 +01:00
Benny Halevy	38d8777d42	storage_service, system_keyspace: add debugging around system.peers update Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 14:45:47 +02:00
Benny Halevy	5401b6055c	storage_service: handle_state_normal: update topology and notify_joined endpoint only if not removed Currently, when replacing a node ip, keeping the old host, we might end up with the old endpoint in system.peers if it is inserted back into the topology by `handle_state_normal` when on_join is called with the old endpoint. Then, later on, on_change sees that: ``` if (get_token_metadata().is_member(endpoint)) { co_await do_update_system_peers_table(endpoint, state, value); ``` As described in #11925. Fixes #11925 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 14:45:22 +02:00
Benny Halevy	1a183047c0	utils: config_src: add set_value_on_all_shards functions Currently when we set a single value we need to call broadcast_to_all_shards to let observers on all shards get notified of the new value. However, the latter broadcasts all values to all shards so it's terribly inefficient. Instead, add async set_value_on_all_shards functions to broadcast a value to all shards. Use those in system_keyspace for db_config_table virtual table and in task_manager_test to update the task_manager ttl. Refs #7316 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 11:55:14 +02:00
Benny Halevy	e83f42ec70	utils: config_file: add config_source::API For task_manager test api. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 11:53:20 +02:00
Botond Dénes	94db2123b9	Update tools/java submodule * tools/java 583261fc0e...caf754f243 (1): > build: remove JavaScript snippets in ant build file	2022-11-09 07:59:04 +02:00
Benny Halevy	10f8f13b90	db: view_update_generator: always clean up staging sstables Since they are currently not cleaned up by cleanup compaction, filter their tokens, processing only tokens owned by the current node (based on the keyspace replication strategy). Refs #9559 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 07:38:22 +02:00
Benny Halevy	fd3e66b0cc	compaction: extract incremental_owned_ranges_checker out to dht It is currently used by cleanup_compaction partition filter. Factor it out so it can be used to filter staging sstables in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 07:32:56 +02:00
Gleb Natapov' via ScyllaDB development	2100a8f4ca	service: raft: demote configuration change error to warning since it is retried anyway Message-Id: <Y2ohbFtljmd5MNw0@scylladb.com>	2022-11-09 00:09:39 +01:00
Avi Kivity	04ecf4ee18	Update tools/java submodule (cassandra-stress fails with node down) * tools/java 87672be28e...583261fc0e (1): > cassandra-stress: pass all hosts stright to the driver	2022-11-08 14:58:14 +02:00
Botond Dénes	7f69cccbdf	scylla-gdb.py: $downcast_vptr(): add multiple inheritance support When a class inherits from multiple virtual base classes, pointers to instances of this class via one of its base classes, might point to somewhere into the object, not at its beginning. Therefore, the simple method employed currently by $downcast_vptr() of casting the provided pointer to the type extracted from the vtable name fails. Instead when this situation is detected (detectable by observing that the symbol name of the partial vtable is not to an offset of +16, but larger), $downcast_vptr() will iterate over the base classes, adjusting the pointer with their offsets, hoping to find the true start of the object. In the one instance I tested this with, this method worked well. At the very least, the method will now yield a null pointer when it fails, instead of a badly casted object with corrupt content (which the developer might or might not attribute to the bad cast). Closes #11892	2022-11-08 14:51:26 +02:00
Michał Chojnowski	3e0c7a6e9f	test: sstable_datafile_test: eliminate a use of std::regex to prevent stack overflow This usage of std::regex overflows the seastar::thread stack size (128 KiB), causing memory corruption. Fix that. Closes #11911	2022-11-08 14:41:34 +02:00
Botond Dénes	2037d7f9cd	Merge 'doc: add the "ScyllaDB Enterprise" label to highlight the Enterprise-only features' from Anna Stuchlik This PR adds the "ScyllaDB Enterprise" label to highlight the Enterprise-only features on the following pages: - Encryption at Rest - the label indicates that the entire page is about an Enterprise-only feature. - Compaction - the labels indicate the sections that are Enterprise-only. There are more occurrences across the docs that require a similar update. I'll update them in another PR if this PR is approved. Closes #11918 * github.com:scylladb/scylladb: doc: fix the links to resolve the warnings doc: add the Enterprise label on the Compaction page (to a subheading and on a list of strategies) to replace the info box doc: add the Enterprise label to the Encryption at Rest page (the entire page) to replace the info box	2022-11-08 09:53:48 +02:00
Raphael S. Carvalho	a57724e711	Make off-strategy compaction wait for view building completion Prior to off-strategy compaction, streaming / repair would place staging files into main sstable set, and wait for view building completion before they could be selected for regular compaction. The reason for that is that view building relies on table providing a mutation source without data in staging files. Had regular compaction mixed staging data with non-staging one, table would have a hard time providing the required mutation source. After off-strategy compaction, staging files can be compacted in parallel to view building. If off-strategy completes first, it will place the output into the main sstable set. So a parallel view building (on sstables used for off-strategy) may potentially get a mutation source containing staging data from the off-strategy output. That will mislead view builder as it won't be able to detect changes to data in main directory. To fix it, we'll do what we did before. Filter out staging files from compaction, and trigger the operation only after we're done with view building. We're piggybacking on off-strategy timer for still allowing the off-strategy to only run at the end of the node operation, to reduce the amount of compaction rounds on the data introduced by repair / streaming. Fixes #11882. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11919	2022-11-08 08:53:58 +02:00
Botond Dénes	243fcb96f0	Update tools/python3 submodule * tools/python3 bf6e892...773070e (1): > create-relocatable-package: harden against missing files	2022-11-08 08:43:30 +02:00
Avi Kivity	46690bcb32	build: harden create-relocatable-package.py against changes in libthread-db.so name create-relocatable-package.py collects shared libraries used by executables for packaging. It also adds libthread-db.so to make debugging possible. However, the name it uses has changed in glibc, so packaging fails in Fedora 37. Switch to the version-agnostic names, libthread-db.so. This happens to be a symlink, so resolve it. Closes #11917	2022-11-08 08:41:22 +02:00
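Resolving a version-agnostic library symlink to the real file behind it can be sketched as below. This is an illustration under stated assumptions, not the code from create-relocatable-package.py: `resolve_library` and `demo` are hypothetical helpers, and the versioned file name `libthread-db.so.1` is invented for the demo.

```python
import os
import tempfile


def resolve_library(path: str) -> str:
    """Return the real file behind a (possibly symlinked) library path."""
    return os.path.realpath(path)


def demo():
    """Build a throwaway directory with a versioned file and an unversioned symlink.

    Returns (symlink_path, real_file_path). The file names are made up for
    the example and do not describe any particular glibc release.
    """
    directory = tempfile.mkdtemp()
    real_file = os.path.join(directory, "libthread-db.so.1")
    with open(real_file, "w"):
        pass  # an empty placeholder file is enough for the demo
    link = os.path.join(directory, "libthread-db.so")
    os.symlink("libthread-db.so.1", link)  # relative link, as is common for .so links
    return link, real_file
```

Packaging the resolved path rather than the link means the archive contains an actual file even when the versioned name changes between distributions.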
Takuya ASADA	acc408c976	scylla_setup: fix incorrect type definition on --online-discard option The --online-discard option is defined as a string parameter since it doesn't specify "action=", but it has a boolean default value (default=True). It breaks "provisioning in a similar environment" since the code assumes a boolean value, which would require "action='store_true'", but that is not specified. We should change the type of the option to int, and also specify "choices=[0, 1]" just like --io-setup does. Fixes #11700 Closes #11831	2022-11-08 08:40:44 +02:00
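The argparse behaviour described above can be reproduced with a short sketch. The two parser builders are hypothetical and contain only the one option; the integer default of 1 in the fixed variant is an assumption, since the commit message only mentions the type and the choices.

```python
import argparse


def build_buggy_parser() -> argparse.ArgumentParser:
    """No action= and no type=: any value given on the command line is a string."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--online-discard", default=True)
    return parser


def build_fixed_parser() -> argparse.ArgumentParser:
    """Integer option restricted to 0 or 1 (default of 1 is an assumption)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--online-discard", type=int, choices=[0, 1], default=1)
    return parser


# With the buggy definition, "0" arrives as a non-empty string, which is truthy.
buggy = build_buggy_parser().parse_args(["--online-discard", "0"])

# With the fixed definition, the value is converted to int and validated.
fixed = build_fixed_parser().parse_args(["--online-discard", "0"])
```

Here `buggy.online_discard` is the string `"0"`, so a plain truth test treats it as enabled, while `fixed.online_discard` is the integer `0`.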
Avi Kivity	3d345609d8	config: disable "mc" format sstables for new data "md" format was introduced in 4.3, in `3530e80ce1`, two years ago. Disable the option to create new sstables with the "mc" format. Closes #11265	2022-11-08 08:36:27 +02:00
Anna Stuchlik	0eaafced9d	doc: fix the links to resolve the warnings	2022-11-07 19:15:21 +01:00
Anna Stuchlik	b57e0cfb7c	doc: add the Enterprise label on the Compaction page (to a subheading and on a list of strategies) to replace the info box	2022-11-07 18:54:35 +01:00
Anna Stuchlik	9f3fcb3fa0	doc: add the Enterprise label to the Encryption at Rest page (the entire page) to replace the info box	2022-11-07 18:48:37 +01:00
Tomasz Grabiec	a9063f9582	Merge 'service/raft: failure detector: ping `raft::server_id`s, not `gms::inet_address`es' from Kamil Braun Whenever a Raft configuration change is performed, `raft::server` calls `raft_rpc::add_server`/`raft_rpc::remove_server`. Our `raft_rpc` implementation has a function, `_on_server_update`, passed in the constructor, which it called in `add_server`/`remove_server`; that function would update the set of endpoints detected by the direct failure detector. `_on_server_update` was passed an IP address and that address was added to / removed from the failure detector set (there's another translation layer between the IP addresses and internal failure detector 'endpoint ID's; but we can ignore it for the purposes of this commit). Therefore: the failure detector was pinging a certain set of IP addresses. These IP addresses were updated during Raft configuration changes. To implement the `is_alive(raft::server_id)` function (required by `raft::failure_detector` interface), we would translate the ID using the Raft address map, which is currently also updated during configuration changes, to an IP address, and check if that IP address is alive according to the direct failure detector (which maintained an `_alive_set` of type `unordered_set<gms::inet_address>`). This all works well but it assumes that servers can be identified using IP addresses - it doesn't play well with the fact that servers may change their IP addresses. The only immutable identifier we have for a server is `raft::server_id`. In the future, Raft configurations will not associate IP addresses with Raft servers; instead we will assume that IP addresses can change at any time, and there will be a different mechanism that eventually updates the Raft address map with the latest IP address for each `raft::server_id`. To prepare us for that future, in this commit we no longer operate in terms of IP addresses in the failure detector, but in terms of `raft::server_id`s. 
Most of the commit is boilerplate, changing `gms::inet_address` to `raft::server_id` and function/variable names. The interesting changes are: - in `is_alive`, we no longer need to translate the `raft::server_id` to an IP address, because now the stored `_alive_set` already contains `raft::server_id`s instead of `gms::inet_address`es. - the `ping` function now takes a `raft::server_id` instead of `gms::inet_address`. To send the ping message, we need to translate this to IP address; we do it by the `raft_address_map` pointer introduced in an earlier commit. Thus, there is still a point where we have to translate between `raft::server_id` and `gms::inet_address`; but observe we now do it at the last possible moment - just before sending the message. If we have no translation, we consider the `ping` to have failed - it's equivalent to a network failure where no route to a given address was found. Closes #11759 * github.com:scylladb/scylladb: direct_failure_detector: get rid of complex `endpoint_id` translations service/raft: ping `raft::server_id`s, not `gms::inet_address`es service/raft: store `raft_address_map` reference in `direct_fd_pinger` gms: gossiper: move `direct_fd_pinger` out to a separate service gms: gossiper: direct_fd_pinger: extract generation number caching to a separate class	2022-11-07 16:42:35 +01:00
Botond Dénes	2b572d94f5	Merge 'doc: improve the documentation landing page ' from Anna Stuchlik This PR introduces the following changes to the documentation landing page: - The " New to ScyllaDB? Start here!" box is added. - The "Connect your application to Scylla" box is removed. - Some wording has been improved. - "Scylla" has been replaced with "ScyllaDB". Closes #11896 * github.com:scylladb/scylladb: Update docs/index.rst doc: replace Scylla with ScyllaDB on the landing page doc: improve the wording on the landing page doc: add the link to the ScyllaDB Basics page to the documentation landing page	2022-11-07 16:18:59 +02:00
Avi Kivity	91f2cd5ac4	test: lib: exception_predicate: use boost::regex instead of std::regex std::regex was observed to overflow stack on aarch64 in debug mode. Use boost::regex until the libstdc++ bug[1] is fixed. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582 Closes #11888	2022-11-07 14:03:25 +02:00
Kamil Braun	0c7ff0d2cb	docs: a single 5.0 -> 5.1 upgrade guide There were 4 different pages for upgrading Scylla 5.0 to 5.1 (and the same is true for other version pairs, but I digress) for different environments: - "ScyllaDB Image for EC2, GCP, and Azure" - Ubuntu - Debian - RHEL/CentOS The Ubuntu and Debian pages used a common template: ``` .. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p1.rst .. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p2.rst ``` with different variable substitutions. The "Image" page used a similar template, with some extra content in the middle: ``` .. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p1.rst .. include:: /upgrade/_common/upgrade-image-opensource.rst .. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p2.rst ``` The RHEL/CentOS page used a different template: ``` .. include:: /upgrade/_common/upgrade-guide-v4-rpm.rst ``` This was an unmaintainable mess. Most of the content was "the same" for each of these options. The only content that must actually be different is the part with package installation instructions (e.g. calls to `yum` vs `apt-get`). The rest of the content was logically the same - the differences were mistakes, typos, and updates/fixes to the text that were made in some of these docs but not others. In this commit I prepare a single page that covers the upgrade and rollback procedures for each of these options. The section dependent on the system was implemented using Sphinx Tabs. I also fixed and changed some parts: - In the "Gracefully stop the node" section: Ubuntu/Debian/Images pages had: ```rst .. code:: sh sudo service scylla-server stop ``` RHEL/CentOS pages had: ```rst .. code:: sh .. include:: /rst_include/scylla-commands-stop-index.rst ``` the stop-index file contained this: ```rst .. tabs:: .. group-tab:: Supported OS .. code-block:: shell sudo systemctl stop scylla-server .. group-tab:: Docker .. 
code-block:: shell docker exec -it some-scylla supervisorctl stop scylla (without stopping some-scylla container) ``` So the RHEL/CentOS version had two tabs: one for Scylla installed directly on the system, one for Scylla running in Docker - which is interesting, because nothing anywhere else in the upgrade documents mentions Docker. Furthermore, the RHEL/CentOS version used `systemctl` while the ubuntu/debian/images version used `service` to stop/start scylla-server. Both work on modern systems. The Docker option is completely out of place - the rest of the upgrade procedure does not mention Docker. So I decided it doesn't make sense to include it. Docker documentation could be added later if we actually decide to write upgrade documentation when using Docker... Between `systemctl` and `service` I went with `service` as it's a bit higher-level. - Similar change for "Start the node" section, and corresponding stop/start sections in the Rollback procedure. - To reuse text for Ubuntu and Debian, when referencing "ScyllaDB deb repo" in the Debian/Ubuntu tabs, I provide two separate links: to Debian and Ubuntu repos. - the link to rollback procedure in the RPM guide (in 'Download and install the new release' section) pointed to rollback procedure from 3.0 to 3.1 guide... Fixed to point to the current page's rollback procedure. - in the rollback procedure steps summary, the RPM version missed the "Restore system tables" step. - in the rollback procedure, the repository links were pointing to the new versions, while they should point to the old versions. There are some other pre-existing problems I noticed that need fixing: - EC2/GCP/Azure option has no corresponding coverage in the rollback section (Download and install the old release) as it has in the upgrade section. There is no guide for rolling back 3rd party and OS packages, only Scylla. I left a TODO in a comment. 
- the repository links assume certain Debian and Ubuntu versions (Debian 10 and Ubuntu 20), but there are more available options (e.g. Ubuntu 22). Not sure how to deal with this problem. Maybe a separate section with links? Or just a generic link without choice of platform/version? Closes #11891	2022-11-07 14:02:08 +02:00
Avi Kivity	9fa1783892	Merge 'cleanup compaction: flush memtable' from Benny Halevy Flush the memtable before cleaning up the table so not to leave any disowned tokens in the memtable as they might be resurrected if left in the memtable. Fixes #1239 Closes #11902 * github.com:scylladb/scylladb: table: perform_cleanup_compaction: flush memtable table: add perform_cleanup_compaction api: storage_service: add logging for compaction operations et al	2022-11-07 13:18:12 +02:00
Anna Stuchlik	c8455abb71	Update docs/index.rst Co-authored-by: Tzach Livyatan <tzach.livyatan@gmail.com>	2022-11-07 10:25:24 +01:00
AdamStawarz	6bc455ebea	Update tombstones-flush.rst change syntax: nodetool compact <keyspace>.<mytable>; to nodetool compact <keyspace> <mytable>; Closes #11904	2022-11-07 11:19:26 +02:00
Avi Kivity	224a2877b9	build: disable -Og in debug mode to avoid coroutine asan breakage Coroutines and asan don't mix well on aarch64. This was seen in `22f13e7ca3` (" Revert "Merge 'cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()' from Avi Kivity"") where a routine coroutinization was reverted due to failures on aarch64 debug mode. In clang 15 this is even worse, the existing code starts failing. However, if we disable optimization (-O0 rather than -Og), things begin to work again. In fact we can reinstate the patch reverted above even with clang 12. Fix (or rather workaround) the problem by avoiding -Og on aarch64 debug mode. There's the lingering fear that release mode is miscompiled too, but all the tests pass on clang 15 in release mode so it appears related to asan. Closes #11894	2022-11-07 10:55:13 +02:00
Benny Halevy	eb3a94e2bc	table: perform_cleanup_compaction: flush memtable We don't explicitly cleanup the memtable, while it might hold tokens disowned by the current node. Flush the memtable before performing cleanup compaction to make sure all tokens in the memtable are cleaned up. Note that non-owned ranges are invalidated in the cache in compaction_group::update_main_sstable_list_on_compaction_completion using desc.ranges_for_cache_invalidation. Fixes #1239 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-06 19:41:40 +02:00
Benny Halevy	fc278be6c4	table: add perform_cleanup_compaction Move the integration with compaction_manager from the api layer to the table class so it can also make sure the memtable is cleaned up in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-06 19:41:33 +02:00
Benny Halevy	85523c45c0	api: storage_service: add logging for compaction operations et al Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-06 19:41:31 +02:00
Petr Gusev	44f48bea0f	raft: test_remove_node_with_concurrent_ddl The test runs remove_node command with background ddl workload. It was written in an attempt to reproduce scylladb#11228 but seems to have value on its own. The if_exists parameter has been added to the add_table and drop_table functions, since the driver could retry the request sent to a removed node, but that request might have already been completed. Function wait_for_host_known waits until the information about the node reaches the destination node. Since we add new nodes at each iteration in main, this can take some time. A number of abort-related options were added to SCYLLA_CMDLINE_OPTIONS as it simplifies nailing down problems. Closes #11734	2022-11-04 17:16:35 +01:00
David Garcia	26bc53771c	docs: automatic previews configuration Closes #11591	2022-11-04 15:44:22 +02:00
Kamil Braun	e086521c1a	direct_failure_detector: get rid of complex `endpoint_id` translations The direct failure detector operates on abstract `endpoint_id`s for pinging. The `pinger` interface is responsible for translating these IDs to 'real' addresses. Earlier we used two types of addresses: IP addresses in 'production' code (`gms::gossiper::direct_fd_pinger`) and `raft::server_id`s in test code (in `randomized_nemesis_test`). For each of these use cases we would maintain mappings between `endpoint_id`s and the address type. In recent commits we switched the 'production' code to also operate on Raft server IDs, which are UUIDs underneath. In this commit we switch `endpoint_id`s from `unsigned` type to `utils::UUID`. Because each use case operates in Raft server IDs, we can perform a simple translation: `raft_id.uuid()` to get an `endpoint_id` from a Raft ID, `raft::server_id{ep_id}` to obtain a Raft ID from an `endpoint_id`. We no longer have to maintain complex sharded data structures to store the mappings.	2022-11-04 09:38:08 +01:00
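The reason no mapping table is needed once both identifiers are UUIDs can be shown with a small Python analogy. This is not the C++ code from the commit: `ServerId`, `to_endpoint_id`, and `to_server_id` are hypothetical stand-ins for `raft::server_id`, `raft_id.uuid()`, and `raft::server_id{ep_id}`.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class ServerId:
    """Stand-in for a Raft server ID: a thin wrapper around a UUID."""
    id: UUID


def to_endpoint_id(server_id: ServerId) -> UUID:
    """Unwrap the UUID; plays the role of `raft_id.uuid()`."""
    return server_id.id


def to_server_id(endpoint_id: UUID) -> ServerId:
    """Wrap the UUID again; plays the role of `raft::server_id{ep_id}`."""
    return ServerId(endpoint_id)


# Round trip: both directions are pure functions of the value itself,
# so no lookup structure has to be kept in sync across shards.
original = ServerId(uuid4())
round_tripped = to_server_id(to_endpoint_id(original))
```

With an `unsigned` endpoint ID the translation needed stored state. With a UUID on both sides it is a stateless wrap and unwrap, so `round_tripped == original` holds by construction.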
Kamil Braun	bdeef77f20	service/raft: ping `raft::server_id`s, not `gms::inet_address`es Whenever a Raft configuration change is performed, `raft::server` calls `raft_rpc::add_server`/`raft_rpc::remove_server`. Our `raft_rpc` implementation has a function, `_on_server_update`, passed in the constructor, which it called in `add_server`/`remove_server`; that function would update the set of endpoints detected by the direct failure detector. `_on_server_update` was passed an IP address and that address was added to / removed from the failure detector set (there's another translation layer between the IP addresses and internal failure detector 'endpoint ID's; but we can ignore it for the purposes of this commit). Therefore: the failure detector was pinging a certain set of IP addresses. These IP addresses were updated during Raft configuration changes. To implement the `is_alive(raft::server_id)` function (required by `raft::failure_detector` interface), we would translate the ID using the Raft address map, which is currently also updated during configuration changes, to an IP address, and check if that IP address is alive according to the direct failure detector (which maintained an `_alive_set` of type `unordered_set<gms::inet_address>`). This all works well but it assumes that servers can be identified using IP addresses - it doesn't play well with the fact that servers may change their IP addresses. The only immutable identifier we have for a server is `raft::server_id`. In the future, Raft configurations will not associate IP addresses with Raft servers; instead we will assume that IP addresses can change at any time, and there will be a different mechanism that eventually updates the Raft address map with the latest IP address for each `raft::server_id`. To prepare us for that future, in this commit we no longer operate in terms of IP addresses in the failure detector, but in terms of `raft::server_id`s. 
Most of the commit is boilerplate, changing `gms::inet_address` to `raft::server_id` and function/variable names. The interesting changes are: - in `is_alive`, we no longer need to translate the `raft::server_id` to an IP address, because now the stored `_alive_set` already contains `raft::server_id`s instead of `gms::inet_address`es. - the `ping` function now takes a `raft::server_id` instead of `gms::inet_address`. To send the ping message, we need to translate this to IP address; we do it by the `raft_address_map` pointer introduced in an earlier commit. Thus, there is still a point where we have to translate between `raft::server_id` and `gms::inet_address`; but observe we now do it at the last possible moment - just before sending the message. If we have no translation, we consider the `ping` to have failed - it's equivalent to a network failure where no route to a given address was found.	2022-11-04 09:38:08 +01:00
Kamil Braun	ac70a05c7e	service/raft: store `raft_address_map` reference in `direct_fd_pinger` The pinger will use the map to translate `raft::server_id`s to `gms::inet_address`es when pinging.	2022-11-04 09:38:08 +01:00
Kamil Braun	2c20f2ab9d	gms: gossiper: move `direct_fd_pinger` out to a separate service In later commit `direct_fd_pinger` will operate in terms of `raft::server_id`s. Decouple it from `gossiper` since we don't want to entangle `gossiper` with Raft-specific stuff.	2022-11-04 09:38:08 +01:00
Kamil Braun	e9a4263e14	gms: gossiper: direct_fd_pinger: extract generation number caching to a separate class `gms::gossiper::direct_fd_pinger` serves multiple purposes: one of them is to maintain a mapping between `gms::inet_address`es and `direct_failure_detector::pinger::endpoint_id`s, another is to cache the last known gossiper's generation number to use it for sending gossip echo messages. The latter is the only gossiper-specific thing in this class. We want to move `direct_fd_pinger` outside `gossiper`. To do that, split the gossiper-specific thing -- the generation number management -- to a smaller class, `echo_pinger`. `echo_pinger` is a top-level class (not a nested one like `direct_fd_pinger` was) so we can forward-declare it and pass references to it without including gms/gossiper.hh header.	2022-11-04 09:38:08 +01:00
Avi Kivity	768d77d31b	Update seastar submodule * seastar f32ed00954...e0dabb361f (12): > sstring: define formatter > file: Dont violate API layering > Add compile_commands.json to gitignore > Merge 'Add an allocation failure metric' from Travis Downs > Use const test objects > Ragel chunk parser: compilation err, unused var > build: do not expose Valgrind in SeastarTargets.cmake > defer: mark deferred_* with [[nodiscard]] > Log selected reactor backend during startup > http: mark str with [[maybe_unused]] > Merge 'reactor: open fd without O_NONBLOCK when using io_uring backend' from Kefu Chai > reactor: add accept and connect to io_uring backend Closes #11895	2022-11-04 09:27:56 +04:00
Anna Stuchlik	fb01565a15	doc: replace Scylla with ScyllaDB on the landing page	2022-11-03 17:42:49 +01:00
Anna Stuchlik	7410ab0132	doc: improve the wording on the landing page	2022-11-03 17:38:14 +01:00
Anna Stuchlik	ab5e48261b	doc: add the link to the ScyllaDB Basics page to the documentation landing page	2022-11-03 17:31:03 +01:00
Pavel Emelyanov	efbfcdb97e	Merge 'Replicate `raft_address_map` non-expiring entries to other shards' from Kamil Braun Replicating `raft_address_map` entries is needed for the following use cases: - the direct failure detector - currently it assumes a static mapping of `raft::server_id`s to `gms::inet_address`es, which is obtained on Raft group 0 configuration changes. To handle dynamic mappings we need to modify the failure detector so it pings `raft::server_id`s and obtains the `gms::inet_address` before sending the message from `raft_address_map`. The failure detector is sharded, so we need the mappings to be available on all shards. - in the future we'll have multiple Raft groups running on different shards. To send messages they'll need `raft_address_map`. Initially I tried to replicate all entries - expiring and non-expiring. The implementation turned out to be very complex - we need to handle dropping expired entries and refreshing expiring entries' timestamps across shards, and doing this correctly while accounting for possible races is quite problematic. Eventually I arrived at the conclusion that replicating only non-expiring entries, and furthermore allowing non-expiring entries to be added only on shard 0, is good enough for our use cases: - The direct failure detector is pinging group 0 members only; group 0 members correspond exactly to the non-expiring entries. - Group 0 configuration changes are handled on shard 0, so non-expiring entries are added/removed on shard 0. - When we have multiple Raft groups, we can reuse a single Raft server ID for all Raft servers running on a single node belonging to different groups; they are 'namespaced' by the group IDs. Furthermore, every node has a server that belongs to group 0. 
Thus for every Raft server in every group, it has a corresponding server in group 0 with the same ID, which has a non-expiring entry in `raft_address_map`, which is replicated to all shards; so every group will be able to deliver its messages. With these assumptions the implementation is short and simple. We can always complicate it in the future if we find that the assumptions are too strong. Closes #11791 * github.com:scylladb/scylladb: test/raft: raft_address_map_test: add replication test service/raft: raft_address_map: replicate non-expiring entries to other shards service/raft: raft_address_map: assert when entry is missing in drop_expired_entries service/raft: turn raft_address_map into a service	2022-11-03 18:34:42 +03:00
Avi Kivity	ca2010144e	test: loading_cache_test: fix use-after-free in test_loading_cache_remove_leaves_no_old_entries_behind We capture `key` by reference, but it is in another continuation. Capture it by value, and avoid the default capture specification. Found by clang 15 + asan + aarch64. Closes #11884	2022-11-03 17:23:40 +02:00
Avi Kivity	0c3967cf5e	Merge 'scylla-gdb.py: improve scylla-fiber' from Botond Dénes The main theme of this patchset is improving `scylla-fiber`, with some assorted unrelated improvements tagging along. In lieu of explicit support for mapping up continuation chains in memory from seastar (there is one but it uses function calls), scylla fiber uses a quite crude method to do this: it scans task objects for outbound references to other task objects to find waiter tasks and scans inbound references from other tasks to find waited-on tasks. This works well for most objects, but there are some problematic ones: * `seastar::thread_context`: the waited-on task (`seastar::(anonymous namespace)::thread_wake_task`) is allocated on the thread's stack which is not in the object itself. Scylla fiber now scans the stack bottom-up to find this task. * `seastar::smp_message_queue::async_work_item`: the waited-on task lives on another shard. Scylla fiber now digs out the remote shard from the work item and continues the search on the remote shard. * `seastar::when_all_state`: the waited-on task is a member in the same object, tripping loop detection and terminating the search. Scylla fiber now uses the `_continuation` member explicitly to look for the next links. Other minor improvements were also done, like including the shard of the task in the printout. Example demonstrating all the new additions: ``` (gdb) scylla fiber 0x000060002d650200 Stopping because loop is detected: task 0x000061c00385fb60 was seen before. 
[shard 28] #-13 (task) 0x000061c00385fba0 0x00000000003b5b00 vtable for seastar::internal::when_all_state_component<seastar::future<void> > + 16 [shard 28] #-12 (task) 0x000061c00385fb60 0x0000000000417010 vtable for seastar::internal::when_all_state<seastar::internal::identity_futures_tuple<seastar::future<void>, seastar::future<void> >, seastar::future<void>, seastar::future<void> > + 16 [shard 28] #-11 (task) 0x000061c009f16420 0x0000000000419830 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEEZNS_6futureISt5tupleIJNS4_IvEES6_EEE14discard_resultEvEUlDpOT_E_ZNS8_14then_impl_nrvoISC_S6_EET0_OT_EUlOS3_RSC_ONS_12future_stateIS7_EEE_S7_EE + 16 [shard 28] #-10 (task) 0x000061c0098e9e00 0x0000000000447440 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>::run_and_dispose()::{lambda(auto:1)#1}, seastar::future<void>::then_wrapped_nrvo<void, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}> >(seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #-9 (task) 0x000060000858dcd0 0x0000000000449d68 vtable for seastar::smp_message_queue::async_work_item<seastar::sharded<cql_transport::cql_server>::stop()::{lambda(unsigned int)#1}::operator()(unsigned int)::{lambda()#1}> + 16 [shard 0] #-8 (task) 0x0000600050c39f60 0x00000000007abe98 vtable for 
seastar::parallel_for_each_state + 16 [shard 0] #-7 (task) 0x000060000a59c1c0 0x0000000000449f60 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::sharded<cql_transport::cql_server>::stop()::{lambda(seastar::future<void>)#2}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(seastar::future<void>)#2}>({lambda(seastar::future<void>)#2}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(seastar::future<void>)#2}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #-6 (task) 0x000060000a59c400 0x0000000000449ea0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, cql_transport::controller::do_stop_server()::{lambda(std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > >&)#1}::operator()(std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > >&) const::{lambda()#1}::operator()() const::{lambda()#1}, seastar::future<void>::then_impl_nrvo<{lambda()#1}, {lambda()#1}>({lambda()#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda()#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #-5 (task) 0x0000600009d86cc0 0x0000000000449c00 vtable for seastar::internal::do_with_state<std::tuple<std::unique_ptr<seastar::sharded<cql_transport::cql_server>, std::default_delete<seastar::sharded<cql_transport::cql_server> > > >, seastar::future<void> > + 16 [shard 0] #-4 (task) 0x00006000019ffe20 0x00000000007ab368 vtable for seastar::(anonymous namespace)::thread_wake_task + 16 [shard 0] #-3 (task) 0x00006000085ad080 0x0000000000809e18 vtable for seastar::thread_context + 16 [shard 0] #-2 (task) 0x0000600009c04100 0x00000000006067f8 
_ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEEZNS_5asyncIZZN7service15storage_service5drainEvENKUlRS6_E_clES7_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSC_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSD_DpOSG_EUlvE0_ZNS_6futureIvE14then_impl_nrvoIST_SV_EET0_SQ_EUlOS3_RST_ONS_12future_stateINS1_9monostateEEEE_vEE + 16 [shard 0] #-1 (task) 0x000060000a59c080 0x0000000000606ae8 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEENS_6futureIvE12finally_bodyIZNS_5asyncIZZN7service15storage_service5drainEvENKUlRS9_E_clESA_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSF_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSG_DpOSJ_EUlvE1_Lb0EEEZNS5_17then_wrapped_nrvoIS5_SX_EENSD_ISG_E4typeEOT0_EUlOS3_RSX_ONS_12future_stateINS1_9monostateEEEE_vEE + 16 [shard 0] #0 (task) 0x000060002d650200 0x0000000000606378 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<service::storage_service::run_with_api_lock<service::storage_service::drain()::{lambda(service::storage_service&)#1}>(seastar::basic_sstring<char, unsigned int, 15u, true>, service::storage_service::drain()::{lambda(service::storage_service&)#1}&&)::{lambda(service::storage_service&)#1}::operator()(service::storage_service&)::{lambda()#1}, false>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(service::storage_service&)#1}>({lambda(service::storage_service&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(service::storage_service&)#1}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #1 (task) 0x000060000bc40540 0x0000000000606d48 
_ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIvEENS_6futureIvE12finally_bodyIZNS_3smp9submit_toIZNS_7shardedIN7service15storage_serviceEE9invoke_onIZNSB_17run_with_api_lockIZNSB_5drainEvEUlRSB_E_EEDaNS_13basic_sstringIcjLj15ELb1EEEOT_EUlSF_E_JES5_EET1_jNS_21smp_submit_to_optionsESK_DpOT0_EUlvE_EENS_8futurizeINSt9result_ofIFSJ_vEE4typeEE4typeEjSN_SK_EUlvE_Lb0EEEZNS5_17then_wrapped_nrvoIS5_S10_EENSS_ISJ_E4typeEOT0_EUlOS3_RS10_ONS_12future_stateINS1_9monostateEEEE_vEE + 16 [shard 0] #2 (task) 0x000060000332afc0 0x00000000006cb1c8 vtable for seastar::continuation<seastar::internal::promise_base_with_type<seastar::json::json_return_type>, api::set_storage_service(api::http_context&, seastar::httpd::routes&)::{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}::operator()(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >) const::{lambda()#1}, seastar::future<void>::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}, {lambda()#1}<seastar::json::json_return_type> >({lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::json::json_return_type>&&, {lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)#38}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #3 (task) 0x000060000a1af700 0x0000000000812208 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::httpd::function_handler::function_handler(std::function<seastar::future<seastar::json::json_return_type> (std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)> const&)::{lambda(std::unique_ptr<seastar::httpd::request, 
std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}::operator()(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >) const::{lambda(seastar::json::json_return_type&&)#1}, seastar::future<seastar::json::json_return_type>::then_impl_nrvo<seastar::json::json_return_type&&, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > >(seastar::json::json_return_type&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, seastar::json::json_return_type&, seastar::future_state<seastar::json::json_return_type>&&)#1}, seastar::json::json_return_type> + 16 [shard 0] #4 (task) 0x0000600009d86440 0x0000000000812228 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::httpd::function_handler::handle(seastar::basic_sstring<char, unsigned int, 15u, true> const&, std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::future>({lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, 
{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16 [shard 0] #5 (task) 0x0000600009dba0c0 0x0000000000812f48 vtable for seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::handle_exception<std::function<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > (std::__exception_ptr::exception_ptr)>&>(std::function<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > (std::__exception_ptr::exception_ptr)>&)::{lambda(auto:1&&)#1}, seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_wrapped_nrvo<seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >, {lambda(auto:1&&)#1}>({lambda(auto:1&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&, {lambda(auto:1&&)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16 [shard 0] #6 (task) 0x0000600026783ae0 0x00000000008118b0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<bool>, seastar::httpd::connection::generate_reply(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, 
seastar::future<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}, seastar::httpd::connection::generate_reply(std::unique_ptr<seastar::httpd::request, std::default_delete<seastar::httpd::request> >)::{lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}<bool> >({lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&&)::{lambda(seastar::internal::promise_base_with_type<bool>&&, {lambda(std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> >)#1}&, seastar::future_state<std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > >&&)#1}, std::unique_ptr<seastar::httpd::reply, std::default_delete<seastar::httpd::reply> > > + 16 [shard 0] #7 (task) 0x000060000a4089c0 0x0000000000811790 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::httpd::connection::read_one()::{lambda()#1}::operator()()::{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}::operator()(std::default_delete<std::unique_ptr>) const::{lambda(std::default_delete<std::unique_ptr>)#1}::operator()(std::default_delete<std::unique_ptr>) const::{lambda(bool)#2}, seastar::future<bool>::then_impl_nrvo<{lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}, {lambda(std::default_delete<std::unique_ptr>)#1}<void> >({lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(std::unique_ptr<seastar::httpd::request, std::default_delete<std::unique_ptr> >)#2}&, seastar::future_state<bool>&&)#1}, bool> + 16 [shard 0] #8 (task) 0x000060000a5b16e0 0x0000000000811430 vtable for 
seastar::internal::do_until_state<seastar::httpd::connection::read()::{lambda()#1}, seastar::httpd::connection::read()::{lambda()#2}> + 16 [shard 0] #9 (task) 0x000060000aec1080 0x00000000008116d0 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::httpd::connection::read()::{lambda(seastar::future<void>)#3}, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, {lambda(seastar::future<void>)#3}>({lambda(seastar::future<void>)#3}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, {lambda(seastar::future<void>)#3}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 [shard 0] #10 (task) 0x000060000b7d2900 0x0000000000811950 vtable for seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<seastar::httpd::connection::read()::{lambda()#4}, true>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::httpd::connection::read()::{lambda()#4}>(seastar::httpd::connection::read()::{lambda()#4}&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::httpd::connection::read()::{lambda()#4}&, seastar::future_state<seastar::internal::monostate>&&)#1}, void> + 16 Found no further pointers to task objects. If you think there should be more, run `scylla fiber 0x000060002d650200 --verbose` to learn more. Note that continuation across user-created seastar::promise<> objects are not detected by scylla-fiber. 
``` Closes #11822 * github.com:scylladb/scylladb: scylla-gdb.py: collection_element: add support for boost::intrusive::list scylla-gdb.py: optional_printer: eliminate infinite loop scylla-gdb.py: scylla-fiber: add note about user-instantiated promise objects scylla-gdb.py: scylla-fiber: reject self-references when probing pointers scylla-gdb.py: scylla-fiber: add starting task to known tasks scylla-gdb.py: scylla-fiber: add support for walking over when_all scylla-gdb.py: add when_all_state to task type whitelist scylla-gdb.py: scylla-fiber: also print shard of tasks scylla-gdb.py: scylla-fiber: unify task printing scylla-gdb.py: scylla fiber: add support for walking over shards scylla-gdb.py: scylla fiber: add support for walking over seastar threads scylla-gdb.py: scylla-ptr: keep current thread context scylla-gdb.py: improve scylla column_families scylla-gdb.py: scylla_sstables.filename(): fix generation formatting scylla-gdb.py: improve schema_ptr scylla-gdb.py: scylla memory: restore compatibility with <= 5.1	2022-11-03 13:52:31 +02:00
Kamil Braun	2049962e11	Fix version numbers in upgrade page title Closes #11878	2022-11-03 10:06:25 +02:00
Takuya ASADA	45789004a3	install-dependencies.sh: update node_exporter to 1.4.0 To fix CVE-2022-24675, we need a binary compiled with golang >= 1.18.1. The only released version compiled with golang >= 1.18.1 is node_exporter 1.4.0, so we need to update to it. See scylladb/scylla-enterprise#2317 Closes #11400 [avi: regenerated frozen toolchain] Closes #11879	2022-11-03 10:15:22 +04:00
Yaron Kaikov	20110bdab4	configure.py: remove unused tar files creation Starting from https://github.com/scylladb/scylla-pkg/pull/3035, the old tar.gz prefix is no longer uploaded to S3 or used by downstream jobs. Hence, there is no point in building those tar.gz files anymore. Closes #11865	2022-11-02 17:44:09 +02:00
Anna Stuchlik	d1f7cc99bc	doc: fix the external links to the ScyllaDB University lesson about TTL Closes #11876	2022-11-02 15:05:43 +02:00
Nadav Har'El	59fa8fe903	Merge 'doc: add the information about AArch64 support to Requirements' from Anna Stuchlik Fix https://github.com/scylladb/scylla-doc-issues/issues/864 This PR: - updates the introduction to add information about AArch64 and rewrite the content. - replaces "Scylla" with "ScyllaDB". Closes #11778 * github.com:scylladb/scylladb: Update docs/getting-started/system-requirements.rst doc: fix the link to the OS Support page doc: replace Scylla with ScyllaDB doc: update the info about supported architecture and rewrite the introduction	2022-11-02 11:18:20 +02:00
Anna Stuchlik	ea799ad8fd	Update docs/getting-started/system-requirements.rst Co-authored-by: Tzach Livyatan <tzach.livyatan@gmail.com>	2022-11-02 09:56:56 +01:00
guy9	097a65df9f	adding top banner to the Docs website with a link to the ScyllaDB University fall LIVE event Closes #11873	2022-11-02 10:20:40 +02:00
Nadav Har'El	b9d88a3601	cql/pytest: add reproducer for timestamp column validation issue This patch adds a reproducing test for issue #11588, which is still open, so the test is expected to fail on Scylla ("xfail") and passes on Cassandra. The test shows that Scylla allows an out-of-range value to be written to a timestamp column, but then it can't be read back. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11864	2022-11-01 08:11:01 +02:00
Botond Dénes	dc46bfa783	Merge 'Prepare repair for task manager integration' from Aleksandra Martyniuk The PR prepares repair for task manager integration: - Creates repair_module - Keeps repair_module in repair_service - Moves tracker methods to repair_module - Changes UUID to task_id in repair module Closes #11851 * github.com:scylladb/scylladb: repair: check shutdown with abort source in repair module repair: use generic module gate for repair module operations repair: move tracker to repair module repair: move next_repair_command to repair_module repair: generate repair id in repair module repair: keep shard number in repair_uniq_id repair: change UUID to task_id repair: add task_manager::module to repair_service repair: create repair module and task	2022-11-01 08:05:14 +02:00
Aleksandra Martyniuk	f2fe586f03	repair: check shutdown with abort source in repair module In repair module the shutdown can be checked using abort_source. Thus, we can get rid of shutdown flag.	2022-10-31 10:57:29 +01:00
Aleksandra Martyniuk	2d878cc9b5	repair: use generic module gate for repair module operations Repair module uses a gate to prevent starting new tasks on shutdown. Generic module's gate serves the same purpose, thus we can use it also in repair specific context.	2022-10-31 10:56:36 +01:00
Aleksandra Martyniuk	4aae7e9026	repair: move tracker to repair module Since both tracker and repair_module serve a similar purpose, it is unclear where to look for the methods connected to them. Thus, to make it more transparent, the tracker class is deleted and all its attributes and methods are moved to repair_module.	2022-10-31 10:55:36 +01:00
Aleksandra Martyniuk	a5c05dcb60	repair: move next_repair_command to repair_module The number of the repair operation was counted both with next_repair_command from tracker and with the sequence number from task_manager::module. To get rid of the redundancy, next_repair_command was deleted and all methods using its value were moved to repair_module.	2022-10-31 10:54:39 +01:00
Aleksandra Martyniuk	c81260fb8b	repair: generate repair id in repair module repair_uniq_id for repair task can be generated in repair module and accessed from the task.	2022-10-31 10:54:24 +01:00
Aleksandra Martyniuk	6432a26ccf	repair: keep shard number in repair_uniq_id Execution shard is one of the traits specific to repair tasks. Child task should freely access shard id of its parent. Thus, the shard number is kept in a repair_uniq_id struct.	2022-10-31 10:41:17 +01:00
guy9	276ec377c0	removed broken roadmap link Closes #11854	2022-10-31 11:33:03 +02:00
Aleksandra Martyniuk	e2c7c1495d	repair: change UUID to task_id Change type of repair id from utils::UUID to task_id to distinguish them from ids of other entities.	2022-10-31 10:07:08 +01:00
Aleksandra Martyniuk	dc80af33bc	repair: add task_manager::module to repair_service repair_service keeps a shared pointer to repair_module.	2022-10-31 10:04:50 +01:00
Aleksandra Martyniuk	576277384a	repair: create repair module and task Create repair_task_impl and repair_module inheriting from respectively task manager task_impl and module to integrate repair operations with task manager.	2022-10-31 10:04:48 +01:00
Takuya ASADA	159bc7c7ea	install-dependencies.sh: use binary distributions of PIP package We currently avoid compiling C code in "pip3 install scylla-driver", but we actually provide portable binary distributions of the package, so we should use them via "pip3 install --only-binary=:all: scylla-driver". The binary distribution contains the dependency libraries, so we won't have problems loading it on relocatable python3. Closes #11852	2022-10-31 10:38:36 +02:00
Kamil Braun	db6cc035ed	test/raft: raft_address_map_test: add replication test	2022-10-31 09:17:12 +01:00
Kamil Braun	7d84007fd5	service/raft: raft_address_map: replicate non-expiring entries to other shards Replicating `raft_address_map` entries is needed for the following use cases: - the direct failure detector - currently it assumes a static mapping of `raft::server_id`s to `gms::inet_address`es, which is obtained on Raft group 0 configuration changes. To handle dynamic mappings we need to modify the failure detector so it pings `raft::server_id`s and obtains the `gms::inet_address` before sending the message from `raft_address_map`. The failure detector is sharded, so we need the mappings to be available on all shards. - in the future we'll have multiple Raft groups running on different shards. To send messages they'll need `raft_address_map`. Initially I tried to replicate all entries - expiring and non-expiring. The implementation turned out to be very complex - we need to handle dropping expired entries and refreshing expiring entries' timestamps across shards, and doing this correctly while accounting for possible races is quite problematic. Eventually I arrived at the conclusion that replicating only non-expiring entries, and furthermore allowing non-expiring entries to be added only on shard 0, is good enough for our use cases: - The direct failure detector is pinging group 0 members only; group 0 members correspond exactly to the non-expiring entries. - Group 0 configuration changes are handled on shard 0, so non-expiring entries are added/removed on shard 0. - When we have multiple Raft groups, we can reuse a single Raft server ID for all Raft servers running on a single node belonging to different groups; they are 'namespaced' by the group IDs. Furthermore, every node has a server that belongs to group 0. Thus for every Raft server in every group, it has a corresponding server in group 0 with the same ID, which has a non-expiring entry in `raft_address_map`, which is replicated to all shards; so every group will be able to deliver its messages. 
With these assumptions the implementation is short and simple. We can always complicate it in the future if we find that the assumptions are too strong.	2022-10-31 09:17:12 +01:00
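The replication rule described in the commit message above (non-expiring entries are added only on shard 0 and copied to every shard, while expiring entries stay local) can be illustrated with a minimal Python sketch. This is not the Scylla implementation, which is C++; the class and method names (`ReplicatedAddressMap`, `add_non_expiring`, `add_expiring`, `find`) are hypothetical and chosen only for illustration.

```python
class ReplicatedAddressMap:
    """Toy model of per-shard address maps (hypothetical names, not Scylla's API)."""

    def __init__(self, shard_count):
        # One pair of dictionaries per shard: server_id -> address.
        self.non_expiring = [{} for _ in range(shard_count)]
        self.expiring = [{} for _ in range(shard_count)]

    def add_non_expiring(self, shard, server_id, address):
        # Assumption from the commit message: only shard 0 may add such entries.
        if shard != 0:
            raise ValueError("non-expiring entries can be added only on shard 0")
        # Replicate the entry to every shard.
        for entries in self.non_expiring:
            entries[server_id] = address

    def add_expiring(self, shard, server_id, address):
        # Expiring entries are not replicated; they live on one shard only.
        self.expiring[shard][server_id] = address

    def find(self, shard, server_id):
        # Prefer the replicated entry, then fall back to the local expiring one.
        if server_id in self.non_expiring[shard]:
            return self.non_expiring[shard][server_id]
        return self.expiring[shard].get(server_id)


address_map = ReplicatedAddressMap(shard_count=3)
address_map.add_non_expiring(0, "server-a", "10.0.0.1")
address_map.add_expiring(2, "server-b", "10.0.0.2")
print(address_map.find(1, "server-a"))  # visible on every shard
print(address_map.find(0, "server-b"))  # None: expiring entry is local to shard 2
```

The sketch leaves out expiry timestamps and cross-shard messaging entirely; it only shows why a lookup for a group 0 member succeeds on any shard under the stated assumptions.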
Kamil Braun	acacbad465	service/raft: raft_address_map: assert when entry is missing in drop_expired_entries	2022-10-31 09:17:12 +01:00
Kamil Braun	159bb32309	service/raft: turn raft_address_map into a service	2022-10-31 09:17:10 +01:00
Botond Dénes	139fbb466e	Merge 'Task manager extension' from Aleksandra Martyniuk The PR adds changes to task manager that allow more convenient integration with modules. Introduced changes: - adds an internal flag in task::impl that allows the user to filter out overly specific tasks - renames `parent_data` to the more appropriate name `task_info` - creates `tasks/types.hh`, which allows using some types connected with task manager without the necessity to include the whole task manager - adds a more flexible version of the `make_task` method Closes #11821 * github.com:scylladb/scylladb: tasks: add alternative make_task method tasks: rename parent_data to task_info and move it tasks: move task_id to tasks/types.hh tasks: add internal flag for task_manager::task::impl	2022-10-31 09:57:10 +02:00
Botond Dénes	2c021affd1	Merge 'storage_service, repair: use per-shard abort_source' from Benny Halevy Prevent copying shared_ptr across shards in do_sync_data_using_repair by allocating a shared_ptr<abort_source> per shard in node_ops_meta_data and respectively in node_ops_info. Fixes #11826 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11827 * github.com:scylladb/scylladb: repair: use sharded abort_source to abort repair_info repair: node_ops_info: add start and stop methods storage_service: node_ops_abort_thread: abort all node ops on shutdown storage_service: node_ops_abort_thread: co_return only after printing log message storage_service: node_ops_meta_data: add start and stop methods repair: node_ops_info: prevent accidental copy	2022-10-31 09:43:34 +02:00
Botond Dénes	63a90cfb6c	scylla-gdb.py: collection_element: add support for boost::intrusive::list	2022-10-31 08:18:20 +02:00
Botond Dénes	2fa1864174	scylla-gdb.py: optional_printer: eliminate infinite loop Currently, to_string() recursively calls itself for engaged optionals. Eliminate it. Also, use the std_optional wrapper instead of accessing std::optional internals directly.	2022-10-31 08:18:20 +02:00
Botond Dénes	77b2555a04	scylla-gdb.py: scylla-fiber: add note about user-instantiated promise objects Scylla fiber uses a crude method of scanning inbound and outbound references to/from other task objects of recognized type. This method cannot detect user-instantiated promise<> objects. Add a note about this to the printout, so users are aware of this.	2022-10-31 08:18:20 +02:00
Botond Dénes	2276565a2e	scylla-gdb.py: scylla-fiber: reject self-references when probing pointers A self-reference is never the pointer we are looking for when looking for other tasks referencing us. Reject such references when scanning outright.	2022-10-31 08:18:20 +02:00
Botond Dénes	f4365dd7f5	scylla-gdb.py: scylla-fiber: add starting task to known tasks We collect already seen tasks in a set to be able to detect perceived task loops and stop when one is seen. Initialize this set with the starting task, so if it forms a loop, we won't repeat it in the trace before cutting the loop.	2022-10-31 08:18:20 +02:00
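The effect of seeding the set of seen tasks with the starting task can be shown with a small Python sketch. It is a simplified model, not code from scylla-gdb.py: tasks are plain strings, the `links` dictionary stands in for the pointer scan, and `walk_chain` is a hypothetical helper name.

```python
def walk_chain(start, links):
    """Follow links from start; stop at the first task that was seen before.

    Returns (chain, repeated), where repeated is the task that closed a loop,
    or None if the chain simply ended.
    """
    seen = {start}  # seed with the starting task so it is never listed twice
    chain = []
    current = links.get(start)
    while current is not None:
        if current in seen:
            return chain, current  # loop detected, cut it here
        seen.add(current)
        chain.append(current)
        current = links.get(current)
    return chain, None


# A -> B -> C -> A forms a loop through the starting task.
chain, repeated = walk_chain("A", {"A": "B", "B": "C", "C": "A"})
print(chain, repeated)
```

With an empty initial set, the same walk would append "A" to the chain a second time before noticing the loop, which is the repetition the commit message describes.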
Botond Dénes	48bbf2e467	scylla-gdb.py: scylla-fiber: add support for walking over when_all	2022-10-31 08:18:20 +02:00
Botond Dénes	cb8f02e24b	scylla-gdb.py: add when_all_state to task type whitelist	2022-10-31 08:18:20 +02:00
Botond Dénes	62621abc44	scylla-gdb.py: scylla-fiber: also print shard of tasks Now that scylla-fiber can cross shards, it is important to display the shard each task in the chain lives on.	2022-10-31 08:18:19 +02:00
Botond Dénes	c21c80f711	scylla-gdb.py: scylla-fiber: unify task printing Currently there are two loops and a separate line printing the starting task, all duplicating the formatting logic. Define a method for it and use it in all 3 places instead.	2022-10-31 08:18:19 +02:00
Botond Dénes	c103280bfd	scylla-gdb.py: scylla fiber: add support for walking over shards Shard boundaries can be crossed in one direction currently: when looking for waiters on a task, but not in the other direction (looking for waited-on tasks). This patch fixes that.	2022-10-31 08:18:19 +02:00
Botond Dénes	437f888ba0	scylla-gdb.py: scylla fiber: add support for walking over seastar threads Currently seastar threads end any attempt to follow waited-on futures. Seastar threads need special handling because they allocate the wake-up task on their stack. This patch adds this special handling.	2022-10-31 08:18:19 +02:00
Botond Dénes	fcc63965ed	scylla-gdb.py: scylla-ptr: keep current thread context scylla_ptr.analyze() switches to the thread the analyzed object lives on, but forgets to switch back. This was very annoying as any commands using it (which is a bunch of them) were prone to suddenly and unexpectedly switching threads. This patch makes sure that the original thread context is switched back to after analyzing the pointer.	2022-10-31 08:18:19 +02:00
Botond Dénes	91516c1d68	scylla-gdb.py: improve scylla column_families Rename to scylla tables. Less typing and more up-to-date. By default it now only lists tables from local shard. Added flag -a which brings back old behaviour (lists on all shards). Added -u (only list user tables) and -k (list tables of provided keyspace only) filtering options.	2022-10-31 08:18:19 +02:00
Botond Dénes	1d3d613b76	scylla-gdb.py: scylla_sstables.filename(): fix generation formatting Generation was recently converted from an integer to an object. Update the filename formatting, while keeping backward compatibility.	2022-10-31 08:18:19 +02:00
Botond Dénes	c869f54742	scylla-gdb.py: improve schema_ptr Add __getitem__(), so members can be accessed. Strip " from ks_name and cf_name. Add is_system().	2022-10-31 08:18:19 +02:00
Botond Dénes	66832af233	scylla-gdb.py: scylla memory: restore compatibility with <= 5.1 Recent reworks around dirty memory manager broke backward compatibility of the scylla memory command (and possibly others). This patch restores it.	2022-10-31 08:18:19 +02:00
Tenghuan He	e0948ba199	Add directory change instruction Add directory change instruction while building scylla Closes #11717	2022-10-30 23:53:02 +02:00
Pavel Emelyanov	477e0c967a	scylla-gdb: Evaluate LSA object sizes dynamically The lsa-segment command tries to walk LSA segment objects by decoding their descriptors and (!) object sizes as well. Some objects in LSA have dynamic sizes, i.e. those depending on the object contents. The script tries to drill down the object internals to get this size, but bad news is that nowadays there are many dynamic objects that are not covered. Once stepped upon unsupported object, scylla-gdb likely stops because the "next" descriptor happens to be in the middle of the object and its parsing throws. This patch fixes this by taking advantage of the virtual size() call of the migrate_fn_type all LSA objects are linked with (indirectly). It gets the migrator object, the LSA object itself and calls ((migrate_fn_type*)<migrator_ptr>)->size((const void*)<object_ptr>) with gdb. The evaluated value is the live dynamic size of the object. fixes: #11792 refs: #2455 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11847	2022-10-28 14:11:30 +03:00
Botond Dénes	74c9aa3a3f	Merge 'removenode: allow specifying nodes to ignore using host_id' from Benny Halevy Currently, when specifying nodes to ignore for replace or removenode, we support specifying them only using their ip address. As discussed in https://github.com/scylladb/scylladb/issues/11839 for removenode, we intentionally require the host uuid for specifying the node to remove, so the nodes to ignore (that are also done, otherwise we need not ignore them), should be consistent with that and be specified using their host_id. The series extends the apis and allows either the nodes ip address or their host_id to be specified, for backward compatibility. We should deprecate the ip address method over time and convert the tests and management software to use the ignored nodes' host_id:s instead. Closes #11841 * github.com:scylladb/scylladb: api: doc: remove_node: improve summary api, service: storage_service: removenode: allow passing ignore_nodes as uuid:s storage_service: get_ignore_dead_nodes_for_replace: use tm.parse_host_id_and_endpoint locator: token_metadata: add parse_host_id_and_endpoint api: storage_service: remove_node: validate host_id	2022-10-28 13:35:04 +03:00
Benny Halevy	335a8cc362	api: doc: remove_node: improve summary The current summary of the operation is obscure. It refers to a token in the ring and the endpoint associated with it, while the operation uses a host_id to identify a whole node. Instead, clarify the summary to refer to a node in the cluster, consistent with the description for the host_id parameter. Also, describe the effect the call has on the data the removed node logically owned. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:52:37 +03:00
Benny Halevy	9ef2631ec2	api, service: storage_service: removenode: allow passing ignore_nodes as uuid:s Currently the api is inconsistent: requiring a uuid for the host_id of the node to be removed, while the ignored nodes list is given as comma-separated ip addresses. Instead, support identifying the ignored_nodes either by their host_id (uuid) or ip address. Also, require all ignore_nodes to be of the same kind: either UUIDs or ip addresses, as a mix of the 2 is likely indicating a user error. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:49:03 +03:00
Benny Halevy	40cd685371	storage_service: get_ignore_dead_nodes_for_replace: use tm.parse_host_id_and_endpoint Allow specifying the dead node to ignore either as host_id or ip address. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:38:13 +03:00
Benny Halevy	b74807cb8a	locator: token_metadata: add parse_host_id_and_endpoint To be used for specifying nodes either by their host_id or ip address and using the token_metadata to resolve the mapping. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:38:13 +03:00
Benny Halevy	340a5a0c94	api: storage_service: remove_node: validate host_id The node to be removed must be identified by its host_id. Validate that at the api layer and pass the parsed host_id down to storage_service::removenode. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:38:13 +03:00
Takuya ASADA	464b5de99b	scylla_setup: allow symlink to --disks option Currently, the --disks option does not allow symlinks such as /dev/disk/by-uuid/* or /dev/disk/azure/*. To allow using them, is_unused_disk() should resolve the symlink to its realpath before evaluating the disk path. Fixes #11634 Closes #11646	2022-10-28 07:24:11 +03:00
Botond Dénes	b744036840	Merge 'scylla_util.py: on sysconfig_parser, don't use double quote when it's possible' from Takuya ASADA It seems that the distribution's original sysconfig files do not use double quotes to set a parameter when the value does not contain spaces. Add a function to detect spaces in the value, and don't use double quotes when none are detected. Fixes #9149 Closes #9153 * github.com:scylladb/scylladb: scylla_util.py: adding unescape for sysconfig_parser scylla_util.py: on sysconfig_parser, don't use double quote when it's possible	2022-10-28 07:19:13 +03:00
Benny Halevy	44e1058f63	docs: nodetool/removenode: fix host_id in examples removenode host_id must specify the host ID as a UUID, not an ip address. Fixes #11839 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11840	2022-10-27 14:29:36 +03:00
Pavel Emelyanov	7b193ab0a5	messaging_service: Deny putting INADDR_ANY as preferred ip Even though previous patch makes scylla not gossip this as internal_ip, an extra sanity check may still be useful. E.g. older versions of scylla may still do it, or this address can be loaded from system_keyspace. refs: #11502 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-27 14:25:43 +03:00
Pavel Emelyanov	aa7a759ac9	messaging_service: Toss preferred ip cache management Make it call cache_preferred_ip() even when the cache is loaded from system_keyspace and move the connection reset there. This is mainly to prepare for the next patch, but also makes the code a bit shorter Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-27 14:25:43 +03:00
Pavel Emelyanov	91b460f1c4	gossiping_property_file_snitch: Dont gossip INADDR_ANY preferred IP Gossiping 0.0.0.0 as preferred IP may break the peer as it will "interpret" this address as <myself> which is not what peer expects. However, g.p.f.s. uses --listen-address argument as the internal IP and it's not prohibited to configure it to be 0.0.0.0 It's better not to gossip the INTERNAL_IP property at all if the listen address is such. fixes: #11502 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-27 14:25:43 +03:00
Pavel Emelyanov	99579bd186	gossiping_property_file_snitch: Make _listen_address optional As the preparation for the next patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-27 14:15:26 +03:00
Benny Halevy	0ea8250e83	repair: use sharded abort_source to abort repair_info Currently we use a single shared_ptr<abort_source> that can't be copied across shards. Instead, use a sharded<abort_source> in node_ops_info so that each repair_info instance will use an (optional) abort_source* on its own shard. Added respective start and stop methods, plus a local_abort_source getter to get the shard-local abort_source (if available). Fixes #11826 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:18:30 +03:00
Benny Halevy	88f993e5ed	repair: node_ops_info: add start and stop methods Prepare for adding a sharded<abort_source> member. Wire start/stop in storage_service::node_ops_meta_data. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:18:30 +03:00
Benny Halevy	c2f384093d	storage_service: node_ops_abort_thread: abort all node ops on shutdown A later patch adds a sharded<abort_source> to node_ops_info. On shutdown, we must orderly stop it, so use node_ops_abort_thread shutdown path (where node_ops_singal_abort is called with a nullopt) to abort (and stop) all outstanding node_ops by passing a null_uuid to node_ops_abort, and let it iterate over all node ops to abort and stop them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:14:06 +03:00
Benny Halevy	0efd290378	storage_service: node_ops_abort_thread: co_return only after printing log message Currently the function co_returns if (!uuid_opt) so the log info message indicating it's stopped is not printed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:14:03 +03:00
Benny Halevy	47e4761b4e	storage_service: node_ops_meta_data: add start and stop methods Prepare for starting and stopping repair node_ops_info Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:14:03 +03:00
Benny Halevy	5c25066ea7	repair: node_ops_info: prevent accidental copy Delete node_ops_info copy and move constructors before we add a sharded<abort_source> member for the per-shard repairs in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:14:03 +03:00
Takuya ASADA	cd6030d5df	scylla_util.py: adding unescape for sysconfig_parser Even though we have __escape() for escaping " in the middle of a value when writing a sysconfig file, we didn't unescape when reading from the sysconfig file. So add __unescape() and call it in get().	2022-10-27 16:39:47 +09:00
Takuya ASADA	de57433bcf	scylla_util.py: on sysconfig_parser, don't use double quote when it's possible It seems that the distribution's original sysconfig files do not use double quotes to set a parameter when the value does not contain spaces. Add a function to detect spaces in the value, and don't use double quotes when none are detected. Fixes #9149	2022-10-27 16:36:27 +09:00
Aleksandra Martyniuk	6494de9bb0	tasks: add alternative make_task method Task manager tasks should be created with make_task method since it properly sets information about child-parent relationship between tasks. Though, sometimes we may want to keep additional task data in classes inheriting from task_manager::task::impl. Doing it with existing make_task method makes it impossible since implementation objects are created internally. The commit adds a new make_task that allows to provide a task implementation pointer created by caller. All the fields except for the one connected with children and parent should be set before.	2022-10-26 14:01:05 +02:00
Aleksandra Martyniuk	10d11a7baf	tasks: rename parent_data to task_info and move it parent_data struct contains info that is common for each task, not only in parent-child relationship context. To use it this way without confusion, its name is changed to task_info. In order to be able to widely and comfortably use task_info, it is moved from tasks/task_manager.hh to tasks/types.hh and slightly extended.	2022-10-26 14:01:05 +02:00
Aleksandra Martyniuk	9ecc2047ac	tasks: move task_id to tasks/types.hh	2022-10-26 14:01:05 +02:00
Aleksandra Martyniuk	e2e8a286cc	tasks: add internal flag for task_manager::task::impl It is convenient to create many different tasks implementations representing more and more specific parts of the operation in a module. Presenting all of them through the api makes it cumbersome for user to navigate and track, though. Flag internal is added to task_manager::task::impl so that the tasks could be filtered before they are sent to user.	2022-10-26 14:01:05 +02:00
Pavel Emelyanov	e245780d56	gossiper: Request topology states in shadow round When doing shadow round for replacement the bootstrapping node needs to know the dc/rack info about the node it replaces to configure it on topology. This topology info is later used by e.g. repair service. fixes: #11829 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11838	2022-10-25 13:21:20 +03:00
Pavel Emelyanov	64c9359443	storage_proxy: Don't use default-initialized endpoint in get_read_executor() After calling filter_for_query() the extra_replica to speculate to may be left default-initialized which is :0 ipv6 address. Later below this address is used as-is to check if it belongs to the same DC or not which is not nice, as :0 is not an address of any existing endpoint. Recent move of dc/rack data onto topology made this place reveal itself by emitting the internal error due to :0 not being present on the topology's collection of endpoints. Prior to this move the dc filter would count :0 as belonging to "default_dc" datacenter which may or may not match with the dc of the local node. The fix is to explicitly tell set extra_replica from unset one. fixes: #11825 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11833	2022-10-25 09:16:50 +03:00
Takuya ASADA	1a11a38add	unified: move unified package contents to sub-directory On most of the software distribution tar.gz, it has sub-directory to contain everything, to prevent extract contents to current directory. We should follow this style on our unified package too. To do this we need to increment relocatable package version to '3.0'. Fixes #8349 Closes #8867	2022-10-25 08:58:15 +03:00
Takuya ASADA	a938b009ca	scylla_raid_setup: run uuidpath existence check only after mount failed We added a UUID device file existence check in #11399; we expect the UUID device file to be created before checking, and we wait for its creation with "udevadm settle" after "mkfs.xfs". However, we are actually getting an error saying the UUID device file is missing, which probably means "udevadm settle" doesn't guarantee the device file is created under some conditions. To avoid the error, use var-lib-scylla.mount to wait until the UUID device file is ready, and run the file existence check only when the service has failed. Fixes #11617 Closes #11666	2022-10-25 08:54:21 +03:00
Yaniv Kaul	cec21d10ed	docs: Fix typo (patch -> batch) See subject. Closes #11837	2022-10-25 08:50:44 +03:00
Michał Radwański	36508bf5e9	serializer_impl: remove unneeded generic parameter Input stream used in vector_deserializer doesn't need to be generic, as there is only one implementation used.	2022-10-24 17:21:38 +02:00
Tomasz Grabiec	687df05e28	db: make_forwardable::reader: Do not emit range_tombstone_change with position past the range Since the end bound is exclusive, the end position should be before_key(), not after_key(). Affects only tests, as far as I know, only there we can get an end bound which is a clustering row position. Would cause failures once row cache is switched to v2 representation because of violated assumptions about positions. Introduced in `76ee3f029c` Closes #11823	2022-10-24 17:06:52 +03:00
Avi Kivity	9e34779c53	Update seastar submodule * seastar 601e0776c0...f32ed00954 (28): > Merge 'treewide: more fmt 9 adjustments' from Avi Kivity > rpc: Remove nested class friend declaration from connection > reactor: advance the head pointer in batch > Add git submodule instructions to HACKING.md, resolves #541 > dns: Handle TCP mode connect failure > future: s/make_exception_ptr/std::make_exception_ptr/ > reactor: implement read_some(fd, buffer, len) in io_uring > reactor: remove unneeded "protected" > Merge 'reactor: support more network ops in io_uring backend' from Kefu Chai > reactor: Indentation fix after previous patch > io: Remove --max-io-requests concept > future: add concept constraints to handle_exception() > future: improve the doxygen document > aio_general_context: flush: provide 1 second grace for retries > reactor: destroy_scheduling_group: make sure scheduling_group is valid > reactor: pass a plain pointer to io_uring_wait_cqes() > gate: add move ctor and move assignment operator for gate > reactor: drop stale comment > reactor_config: update stale doc comments > test: alloc_test: Actually prevent dead allocation elimination > util/closeable: hold _obj with reference_wrapper<> > memory: Fix off-by-one in large allocation detection > util/closeable: add move ctor for deferred_stop > reactor: Remove some unused friend declarations > core/sharded.hh: tweak on comment for better readability > Merge 'fmt 9 ostream fix' from longlene > program_options: allow configure switch-stytle option programmatically > inet_address: Add helper to check for address being lo/any Closes #11814	2022-10-21 21:30:07 +03:00
Botond Dénes	4aa0b16852	Merge 'distributed_loader: detect highest generation before populating column families' from Benny Halevy We should scan all sstables in the table directory and its subdirectories to determine the highest sstable version and generation before using it for creating new sstables (via reshard or reshape). Otherwise, the generations of new sstables created when populating staging (via reshard or reshape) may collide with generations in the base directory, leading to https://github.com/scylladb/scylladb/issues/11789 Refs scylladb/scylladb#11789 Fixes scylladb/scylladb#11793 Closes #11795 * github.com:scylladb/scylladb: distributed_loader: populate_column_family: reindent distributed_loader: coroutinize populate_column_family distributed_loader: table_population_metadata: start: reindent distributed_loader: table_population_metadata: coroutinize start_subdir distributed_loader: table_population_metadata: start_subdir: reindent distributed_loader: pre-load all sstables metadata for table before populating it	2022-10-21 14:07:51 +03:00
Botond Dénes	e981bd4f21	Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El As described in issue #11801, we saw in Alternator when a GSI has both partition and sort keys which were non-key attributes in the base, cases where updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted. This series fixes this bug (it was a bug in our materialized views implementation) and adds a reproducing test (plus a few more tests for similar situations which worked before the patch, and continue to work after it). Fixes #11801 Closes #11808 * github.com:scylladb/scylladb: test/alternator: add test for issue 11801 MV: fix handling of view update which reassign the same key value materialized views: inline used-once and confusing function, replace_entry()	2022-10-21 10:49:28 +03:00
Botond Dénes	396d9e6a46	Merge 'Subscribe repair_info::abort on node_ops_meta_data::abort_source' from Pavel Emelyanov The storage_service::stop() calls repair_service::abort_repair_node_ops() but at that time the sharded<repair_service> is already stopped and call .local() on it just crashes. The suggested fix is to remove explicit storage_service -> repair_service kick. Instead, the repair_infos generated for the sake of node-ops are subscribed on the node_ops_meta_data's abort source and abort themselves automatically. fixes: #10284 Closes #11797 * github.com:scylladb/scylladb: repair: Remove ops_uuid repair: Remove abort_repair_node_ops() altogether repair: Subscribe on node_ops_info::as abortion repair: Keep abort source on node_ops_info repair: Pass node_ops_info arg to do_sync_data_using_repair() repair: Mark repair_info::abort() noexcept node_ops: Remove _aborted bit node_ops: Simplify construction of node_ops_metadata main: Fix message about repair service starting	2022-10-21 10:08:43 +03:00
Avi Kivity	9ebac12e60	test: mutation-test: fix off-by-one in test_large_collection_allocation The test wants to see that no allocations larger than 128k are present, but sets the warning threshold to exactly 128k. Due to an off-by-one in Seastar, this went unnoticed. However, now that the off-by-one in Seastar is fixed [1], this test starts to fail. Fix by setting the warning threshold to 128k + 1. [1] `429efb5086` Closes #11817	2022-10-21 10:04:40 +03:00
Avi Kivity	f0643d1713	alternator: ttl: do not copy mutation while constructing a vector The vector(initializer_list<T>) constructor copies the T since initializer_list is read-only. Move the mutation instead. This happens to fix a use-after-return on clang 15 on aarch64. I'm fairly sure that's a miscompile, but the fix is worthwhile regardless. Closes #11818	2022-10-21 10:04:00 +03:00
Avi Kivity	db79f1eb60	Merge 'cql3: expr: Add unit tests for evaluate()' from Jan Ciołek This PR adds some unit tests for the `expr::evaluate()` function. At first I wanted to add the unit tests as part of #11658, but their size grew and grew, until I decided that they deserve their own pull request. I found a few places where I think it would be better to behave in a different way, but nothing serious. Closes #11815 * github.com:scylladb/scylladb: test/boost: move expr_test_utils.hh to .hh and .cc in test/lib cql3: expr: Add unit tests for bind_variable validation of collections cql3: expr: Add test for subscripted list and map cql3: expr: Add test for usertype_constructor cql3: expr: Add test for tuple_constructor cql3: expr: Add tests for evaluation of collection constructors cql3: expr: Add tests for evaluation of column_values and bind_variables cql3: expr: Add constant evaluation tests test/boost: Add expr_test_utils.hh cql3: Add ostream operator for raw_value cql3: add is_empty_value() to raw_value and raw_value_view	2022-10-20 22:55:34 +03:00
Jan Ciolek	4c4ed8e6df	test/boost: move expr_test_utils.hh to .hh and .cc in test/lib expr_test_utils.hh was a header file with helper methods for expression tests. All functions were inline, because I didn't know how to create and link a .cc file in test/boost. Now the header is split into expr_test_utils.hh and expr_test_utils.cc and moved to test/lib, which is designed to keep this kind of files. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-20 17:31:37 +02:00
Avi Kivity	6ce659be5b	Merge "Deglobalize snitch" from Pavel E " Snitch was the junction of several services' deps because it was the holder of endpoint->dc/rack mappings. Now this information is all on topology object, so snitch can be finally made main-local " * 'br-deglobalize-snitch' of https://github.com/xemul/scylla: code: Deglobalize snitch tests: Get local reference on global snitch instance once gossiper: Pass current snitch name into checker snitch: Add sharded<snitch_ptr> arg to reset_snitch() api: Move update_snitch endpoint api: Use local snitch reference api: Unset snitch endpoints on stop storage_service: Keep local snitch reference system_keyspace: Don't use global snitch instance snitch: Add const snitch_ptr::operator->()	2022-10-20 16:51:24 +03:00
Avi Kivity	dd0b571d7e	Update tools/java submodule (Scylla Cloud serverless config option) * tools/java 5f2b91d774...87672be28e (1): > Add serverless Scylla Cloud config file option	2022-10-20 16:15:28 +03:00
Konstantin Osipov	8c920add42	test: (pytest) fix the pytest wrapper to work on Ubuntu Ubuntu doesn't have python, only python2 and python3. Closes #11810	2022-10-20 15:53:24 +03:00
Botond Dénes	669b225c67	reader_permit: resources: remove operator bool and >= These cannot be meaningfully defined for a vector value like resources. To prevent instinctive misuse, remove them. Operator bool is replaced with `non_zero()` which hopefully better expresses what to expect. The comparison operator is just removed and inlined into its only user, which actually helps said user's readability. Closes #11813	2022-10-20 15:25:11 +03:00
Jan Ciolek	75b27cb61c	cql3: expr: Add unit tests for bind_variable validation of collections evaluating a bind variable should validate collection values. Test that bound collection values are validated, even in case of a nested collection. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-20 12:12:03 +02:00
Jan Ciolek	c4651e897f	cql3: expr: Add test for subscripted list and map Test that subscripting lists and maps works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-20 12:12:03 +02:00
Jan Ciolek	5a00c3dd76	cql3: expr: Add test for usertype_constructor Test that evaluate(usertype_constructor) works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-20 12:12:03 +02:00
Jan Ciolek	8f6309bd66	cql3: expr: Add test for tuple_constructor Test that evaluate(tuple_constructor) works as expected. It was necessary to implement a custom function for serializing tuples, because some tests require the tuple to contain unset_value or an empty value, which is impossible to express using the existing code. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-20 12:12:03 +02:00
Jan Ciolek	5ae719d51a	cql3: expr: Add tests for evaluation of collection constructors Test that evaluate(collection_constructor) works as expected. Added a bunch of utility methods for creating collection values to expr_test_utils.hh. I was forced to write custom serialization of collections. I tried to use data_value, but it doesn't allow expressing unset_value and empty values. The custom serialization isn't actually used in this specific commit, but it's needed in the following ones. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-20 12:12:02 +02:00
Pavel Emelyanov	01b1f56bd7	code: Deglobalize snitch All uses of snitch now have their own local reference. The global instance can now be replaced with the one living in main (and tests) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:33:41 +03:00
Pavel Emelyanov	8e4e3f7185	tests: Get local reference on global snitch instance once Some tests actively use global snitch instance. This patch makes each test get a local reference and use it everywhere. Next patch will replace global instance with local one Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:33:40 +03:00
Pavel Emelyanov	898579027d	gossiper: Pass current snitch name into checker Gossiper makes sure local snitch name is the same as the one of other nodes in the ring. It now gets global snitch to get the name, this patch passes the name as an argument, because the caller (storage_service) has snitch instance local reference Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:33:38 +03:00
Pavel Emelyanov	1674882220	snitch: Add sharded<snitch_ptr> arg to reset_snitch() The method replaces snitch instance on the existing sharded<snitch_ptr> and the "existing" is nowadays the global instance. This patch changes it to use local reference passed from API code Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:33:34 +03:00
Pavel Emelyanov	5fba0a7f65	api: Move update_snitch endpoint It's now living in storage_service.cc, but non-global snitch is available in endpoint_snitch.cc so move the endpoint handler there Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:33:20 +03:00
Pavel Emelyanov	0d49b0e24a	api: Use local snitch reference The snitch/name endpoint needs snitch instance to get the name from. The storage_service/reset_snitch endpoint will also need snitch instance to call reset on. This patch carries local snitch reference all the way through API setup and patches the get_name() call. The reset_snitch() will come in the next patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:31:45 +03:00
Pavel Emelyanov	c175ea33e2	api: Unset snitch endpoints on stop Some time soon snitch API handlers will operate on local snitch reference capture, so those need to be unset before the target local variable goes away Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:31:12 +03:00
Pavel Emelyanov	ea8bfc4844	storage_service: Keep local snitch reference Storage service uses snitch in several places: - boot - snitch-reconfigured subscription - preferred IP reconnection At this point it's worth adding storage_service->snitch explicit dependency and patch the above to use local reference Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:30:00 +03:00
Pavel Emelyanov	52d6e56a10	system_keyspace: Don't use global snitch instance There are two places to patch: .start() and .setup() and both only need snitch to get local dc/rack from, nothing more. Thus both can live with the explicit argument for now Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:29:26 +03:00
Pavel Emelyanov	f524a79fe9	snitch: Add const snitch_ptr::operator->() To call snitch->something() on const snitch_ptr& variable later Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:29:25 +03:00
Nadav Har'El	264f453b9d	Merge 'Associate alternator user with its service level configuration' from Piotr Sarna Until now, authentication in alternator served only two purposes: - refusing clients without proper credentials - printing user information with logs After this series, this user information is passed to lower layers, which also means that users are capable of attaching service levels to roles, and this service level configuration will be effective with alternator requests. tests: manually by adding more debug logs and inspecting that per-service-level timeout value was properly applied for an authenticated alternator user Fixes #11379 Closes #11380 * github.com:scylladb/scylladb: alternator: propagate authenticated user in client state client_state: add internal constructor with auth_service alternator: pass auth_service and sl_controller to server	2022-10-19 23:27:48 +03:00
Avi Kivity	22f13e7ca3	Revert "Merge 'cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()' from Avi Kivity" This reverts commit `df8e1da8b2`, reversing changes made to `4ff204c028`. It causes a crash in debug mode on aarch64 (likely a coroutine miscompile). Fixes #11809.	2022-10-19 21:28:55 +03:00
Alexander Turetskiy	636e14cc77	Alternator: Projection field added to return from DescribeTable which describes GSIs and LSIs. The return from DescribeTable which describes GSIs and LSIs is missing the Projection field. We do not yet support all the settings Projection (see #5036), but the default which we support is ALL, and DescribeTable should return that in its description. Fixes #11470 Closes #11693	2022-10-19 19:01:08 +03:00
Avi Kivity	69199dbfba	Merge 'schema_tables: limit concurrency' from Benny Halevy To prevent stalls due to large number of tables. Fixes scylladb/scylladb#11574 Closes #11689 * github.com:scylladb/scylladb: schema_tables: merge_tables_and_views reindent schema_tables: limit parallelism	2022-10-19 18:40:45 +03:00
Tomasz Grabiec	a979bbf829	dbuild: Do not fail if .gdbinit is missing Closes #11811	2022-10-19 18:38:09 +03:00
Avi Kivity	6b0afb968d	Merge 'reader_concurrency_semaphore: add set_resources()' from Botond Dénes Allows changing the total or initial resources the semaphore has. After calling `set_resources()` the semaphore will look as if it had been created with the specified amount of resources. Use the new method in `replica::database::revert_initial_system_read_concurrency_boost()` so it doesn't lead to strange semaphore diagnostics output. Currently the system semaphore has 90/100 count units when there are no reads against it, which has led to some confusion. I also plan on using the new facility in enterprise. Closes #11772 * github.com:scylladb/scylladb: replica/database: revert initial boost to system semaphore with set_resources() reader_concurrency_semaphore: add set_resources()	2022-10-19 18:04:20 +03:00
Raphael S. Carvalho	ba6186a47f	replica: Pick new generation for SSTables being moved from staging dir When moving an SSTable from staging to base dir, we reused the generation under the assumption that no SSTable in base dir uses that same generation. But that's not always true. When reshaping staging dir, reshape compaction can pick a generation taken by an SSTable in base dir. That's because staging dir is populated first and it doesn't have awareness of generations in base dir yet. When that happens, view building will fail to move an SSTable in staging which shares the same generation as another in base dir. We could have played with the order of population, populating base dir first and then staging dir, but the fragility wouldn't be gone. Not future proof at all. We can easily make this safe by picking a new generation for the SSTable being moved from staging, making sure no clash will ever happen. Fixes #11789. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11790	2022-10-19 15:33:30 +03:00
Nadav Har'El	2e439c9471	test/alternator: add test for issue 11801 This patch adds a test reproducing issue #11801, and confirming that the previous patch fixed it. Before the previous patch, the test passed on DynamoDB but failed on Alternator. The patch also adds four more passing tests which demonstrate that issue #11801 only happened in the very specific case where: 1. A GSI has two key attributes which weren't key attributes in the base, and 2. An update sets the second of those attributes to the same value which it already had. This bug was originally discovered and explained by @fee-mendes. Refs #11801. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-10-19 14:36:48 +03:00
Benny Halevy	4d7f0be929	distributed_loader: populate_column_family: reindent	2022-10-19 14:18:38 +03:00
Benny Halevy	030afaa934	distributed_loader: coroutinize populate_column_family Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-19 14:18:04 +03:00
Benny Halevy	0f23ee14c9	distributed_loader: table_population_metadata: start: reindent	2022-10-19 14:16:59 +03:00
Benny Halevy	39cec4f304	distributed_loader: table_population_metadata: coroutinize start_subdir Calling it in a seastar thread was done to reduce code churn and facilitate backporting. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-19 14:16:59 +03:00
Benny Halevy	5749a54cab	distributed_loader: table_population_metadata: start_subdir: reindent	2022-10-19 14:16:59 +03:00
Benny Halevy	119c0f3983	distributed_loader: pre-load all sstables metadata for table before populating it We should scan all sstables in the table directory and its subdirectories to determine the highest sstable version and generation before using it for creating new sstables (via reshard or reshape). Fixes scylladb/scylladb#11793 Note: table_population_metadata::start_subdir is called in a seastar thread to facilitate backporting to old versions that do not support coroutines yet. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-19 14:16:57 +03:00
Nadav Har'El	8f4243b875	MV: fix handling of view update which reassign the same key value When a materialized view has a key (in Alternator, this can be two keys) which was a regular column in the base table, and a base update modifies that regular column, there are two distinct cases: 1. If the old and new key values are different, we need to delete the old view row, and create a new view row (with the different key). 2. If the old and new key values are the same, we just need to update the pre-existing row. It's important not to confuse the two cases: If we try to delete and create the same view row with the same timestamp, the result will be that the row will be deleted (a tombstone wins over data if they have the same timestamp) instead of updated. This is what we saw in issue #11801. We had a bug that was seen when an update set the view key column to the old value it already had: To compare the old and new key values we used the function compare_atomic_cell_for_merge(), but this compared not just the values but also incorrectly compared the metadata such as the timestamp. Because setting a column to the same value changes its timestamp, we wrongly concluded that these were different view keys and used the delete-and-create code for this case, resulting in the view row being deleted (as explained above). The simple fix is to compare just the key values - not looking at the metadata. See tests reproducing this bug and confirming its fix in the next patch. Fixes #11801 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-10-19 13:43:12 +03:00
Nadav Har'El	e1f8cb6521	materialized views: inline used-once and confusing function, replace_entry() The replace_entry() function is nothing more than a convenience for calling delete_old_entry() and then create_entry(). But it is only used once in the code, and we can just open-code the two calls instead of the one. The reason I want to change it now is that the shortcut replace_entry() helped hide a bug (#11801) - replace_entry() works incorrectly if the old and new row have the same key, because if they do we get a deletion and creation of the same row with the same timestamp - and the deletion wins. Having the two calls not hidden by a convenience function makes this potential problem more apparent. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-10-19 13:25:34 +03:00
Benny Halevy	ce22dd4329	schema_tables: merge_tables_and_views reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-19 13:05:41 +03:00
Benny Halevy	7ccb0e70f0	schema_tables: limit paralellism To prevent stalls due to large number of tables. Fixes scylladb/scylladb#11574 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-19 13:05:38 +03:00
Anna Stuchlik	7ec750fc63	docs: add the list of new metrics in 5.1 Closes #11703	2022-10-19 12:06:25 +03:00
Jan Ciolek	1b7acc758e	cql3: expr: Add tests for evaluation of column_values and bind_variables Add tests which test that evaluate(column_value) and evaluate(bind_variable) work as expected. values of columns and bind variables are kept in evaluation_inputs, so we need to mock them in order for evaluate() to work. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-19 10:30:51 +02:00
Jan Ciolek	0f29015d9f	cql3: expr: Add constant evaluation tests Add unit test for evaluating expr::constant values. evaluate(constant) just returns constant.value, so there is no point in trying all the possible combinations. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-19 10:30:42 +02:00
Anna Stuchlik	a066396cd3	doc: fix the command to create and sign a certificate so that the trusted certificate SHA256 is created Closes #11758	2022-10-19 11:30:20 +03:00
Botond Dénes	37ebbc819a	Merge 'Scylla-gdb lsa polishing' from Pavel Emelyanov It was supposed to be a fix for #2455, but eventually it turned out that #11792 blocks this progress and takes more effort. So for now only a couple of small improvements (not to lose them by chance) Closes #11794 * github.com:scylladb/scylladb: scylla-gdb: Make regions iterable object scylla-gdb: Dont print 0x0x	2022-10-19 06:54:49 +03:00
Botond Dénes	2d581e9e8f	Merge "Maintain dc/rack by topology" from Pavel Emelyanov " There's an ongoing effort to move the endpoint -> {dc/rack} mappings from snitch onto topology object and this set finalizes it. After it the snitch service stops depending on gossiper and system keyspace and is ready for de-globalization. As a nice side-effect the system keyspace no longer needs to maintain the dc/rack info cache and its starting code gets relaxed. refs: #2737 refs: #2795 " * 'br-snitch-dont-mess-with-topology-data-2' of https://github.com/xemul/scylla: (23 commits) system_keyspace: Dont maintain dc/rack cache system_keyspace: Indentation fix after previous patch system_keyspace: Coroutinuze build_dc_rack_info() topology: Move all post-configuration to topology::config snitch: Start early gossiper: Do not export system keyspace snitch: Remove gossiper reference snitch: Mark get_datacenter/_rack methods const snitch: Drop some dead dependency knots snitch, code: Make get_datacenter() report local dc only snitch, code: Make get_rack() report local rack only storage_service: Populate pending endpoint in on_alive() code: Populate pending locations topology: Put local dc/rack on topology early topology: Add pending locations collection topology: Make get_location() errors more verbose token_metadata: Add config, spread everywhere token_metadata: Hide token_metadata_impl copy constructor gosspier: Remove messaging service getter snitch: Get local address to gossip via config ...	2022-10-19 06:50:21 +03:00
Jan Ciolek	429600a957	test/boost: Add expr_test_utils.hh Add a header file which will contain utilities for writing expression tests. For now it contains simple functions like make_int_constant(), but there are many more to come. I feel like it's cleaner to put all these functions in a separate file instead of having them spread randomly between tests. It also enables code reuse so that future expression tests can reuse these functions instead of writing them from scratch. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-18 22:48:33 +02:00
Jan Ciolek	855db49306	cql3: Add ostream operator for raw_value It's possible to print raw_value_view, but not raw_value. It would be useful to be able to print both. Implement printing raw_value by creating raw_value_view from it and printing the view. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-18 22:48:25 +02:00
Jan Ciolek	096c65d27f	cql3: add is_empty_value() to raw_value and raw_value_view An empty value is a value that is neither null nor unset, but has 0 bytes of data. Such values can be created by the user using certain CQL functions, for example an empty int value can be inserted using blobasint(0x). Add a method to raw_value and raw_value_view, which allows to check whether the value is empty. This will be used in many places in which we need to validate that a value isn't empty. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-18 22:47:48 +02:00
Pavel Emelyanov	3dc7c33847	repair: Remove ops_uuid It used to be used to abort repair_info by the corresponding node-ops uuid, but this code is no longer there, so it's good to drop the uuid as well Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	b835c3573c	repair: Remove abort_repair_node_ops() altogether This code is dead after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	8231b4ec1b	repair: Subscribe on node_ops_info::as abortion When node_ops_meta_data aborts it also kicks repair to find and abort all relevant repair_infos. Now it can be simplified by subscribing repair_meta on the abort source and aborting it without explicit kick Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	bf5825daac	repair: Keep abort source on node_ops_info Next patches will need to subscribe on node_ops_meta_data's abort source inside repair code, so keep the pointer on node_ops_info too. At the same time, the node_ops_info::abort becomes obsolete, because the same check can be performed via the abort_source->abort_requested() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	bbb7fca09c	repair: Pass node_ops_info arg to do_sync_data_using_repair() Next patches will need to know more than the ops_uuid. The needed info is (well -- will be) sitting on node_ops_info, so pass it along Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	5e9c3c65b5	repair: Mark repair_info::abort() noexcept Next patch will call it inside abort_source subscription callback which requires the calling code to be noexcept Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:23 +03:00
Pavel Emelyanov	34458ec2c5	node_ops: Remove _aborted bit A short cleanup "while at it" -- the node_ops_meta_data doesn't need to carry dedicated _aborted boolean -- the abort source that sets it is available instantly Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:04:22 +03:00
Pavel Emelyanov	96f0695731	node_ops: Simplify construction of node_ops_metadata It always constructs node_ops_info the same way Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 20:03:53 +03:00
Pavel Emelyanov	2fa58632b3	main: Fix message about repair service starting Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 17:23:17 +03:00
Botond Dénes	7fbad8de87	reader_concurrency_semaphore: unify admission logic across all paths The semaphore currently has two admission paths: the obtain_permit()/with_permit() methods which admits permits on user request (the front door) and the maybe_admit_waiters() which admits permits based on internal events like memory resource being returned (the back door). The two paths used their own admission conditions and naturally this means that they diverged in time. Notably, maybe_admit_waiters() did not look at inactive readers assuming that if there are waiters there cannot be inactive readers. This is not true however since we merged the execution-stage into the semaphore. Waiters can queue up even when there are inactive reads and thus maybe_admit_waiters() has to consider evicting some of them to see if this would allow for admitting new reads. To avoid such divergence in the future, the admission logic was moved into a new method can_admit_read() which is now shared between the two method families. This method now checks for the possibility of evicting inactive readers as well. The admission logic was tuned slightly to only consider evicting inactive readers if there is a real possibility that this will result in admissions: notably, before this patch, resource availability was checked before stalls were (used permits == blocked permits), so we could evict readers even if this couldn't help. Because now eviction can be started from maybe_admit_waiters(), which is also downstream from eviction, we added a flag to avoid recursive evict -> maybe admit -> evict ... loops. Fixes: #11770 Closes #11784	2022-10-18 17:07:43 +03:00
Pavel Emelyanov	b5fd65af61	scylla-gdb: Make regions iterable object This makes it re-usable across different commands (not there yet) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 16:09:47 +03:00
Pavel Emelyanov	0b6b0bd8d2	scylla-gdb: Dont print 0x0x Formatting pointer adds 0x automatically, no need in adding it explicitly Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-18 16:09:09 +03:00
Botond Dénes	df8e1da8b2	Merge 'cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()' from Avi Kivity indexed_table_select_statement::do_execute_base_query() is fairly complicated and becomes a little simpler with coroutines. Closes #11297 * github.com:scylladb/scylladb: cql3: indexed_table_select_statement: fix indentation cql3: indexed_table_select_statement: clarify loop termination cql3: indexed_table_select_statement: get rid of internal base_query_state struct cql3: indexed_table_select_statement: coroutinize do_execute_base_query() cql3: indexed_table_select_statement: de-result_wrap() do_execute_base_query()	2022-10-18 08:24:21 +03:00
Avi Kivity	50b1fd4cd2	cql3: indexed_table_select_statement: fix indentation Restore normal indentation after coroutinization, no code changes.	2022-10-17 22:03:11 +03:00
Avi Kivity	3ad956ca2d	cql3: indexed_table_select_statement: clarify loop termination The loop terminates when we run out of keys. There are extra conditions such as for short read or page limit, but these are truly discovered during the loop and qualify as special conditions, if you squint enough.	2022-10-17 22:03:11 +03:00
Avi Kivity	ec183d4673	cql3: indexed_table_select_statement: get rid of internal base_query_state struct It was just a crutch for do_with(), and now can be replaced with ordinary coroutine-protected variables. The member names were renamed to the final names they were assigned within the do_with().	2022-10-17 22:03:11 +03:00
Avi Kivity	75e1321b08	cql3: indexed_table_select_statement: coroutinize do_execute_base_query() Indentation and "infinite" for-loop left for later cleanup. Note the last check for a utils::result<> failure is no longer needed, since the previous checks for failure resulted in an immediate co_return rather than propagating the failure into a variable as with continuations. The lambda coroutine is stabilized with the new seastar::coroutine::lambda facility.	2022-10-17 22:03:11 +03:00
Avi Kivity	8b019841d8	cql3: indexed_table_select_statement: de-result_wrap() do_execute_base_query() It's an obstacle to coroutinization as it introduces more lambdas.	2022-10-17 22:03:11 +03:00
Tomasz Grabiec	4ff204c028	Merge 'cache: make all removals of cache items explicit' from Michał Chojnowski This series is a step towards non-LRU cache algorithms. Our cache items are able to unlink themselves from the LRU list. (In other words, they can be unlinked solely via a pointer to the item, without access to the containing list head). Some places in the code make use of that, e.g. by relying on auto-unlink of items in their destructor. However, to implement algorithms smarter than LRU, we might want to update some cache-wide metadata on item removal. But any cache-wide structures are unreachable through an item pointer, since items only have access to themselves and their immediate neighbours. Therefore, we don't want items to unlink themselves — we want `cache.remove(item)`, rather than `item.remove_self()`, because the former can update the metadata in `cache`. This series inserts explicit item unlink calls in places that were previously relying on destructors, gets rid of other self-unlinks, and adds an assert which ensures that every item is explicitly unlinked before destruction. Closes #11716 * github.com:scylladb/scylladb: utils: lru: assert that evictables are unlinked before destruction utils: lru: remove unlink_from_lru() cache: make all cache unlinks explicit	2022-10-17 12:47:02 +02:00
Michał Chojnowski	a96433d3a4	utils: lru: assert that evictables are unlinked before destruction Previous patches introduce the assumption that evictables are manually unlinked before destruction, to allow for correct bookkeeping within the cache. This assert ensures that this assumption is correct. This is particularly important because the switch from automatic to explicit unlinking had to be done manually. Destructor calls are invisible, so it's possible that we have missed some automatic destruction site.	2022-10-17 12:07:27 +02:00
Michał Chojnowski	f340c9cca5	utils: lru: remove unlink_from_lru() unlink_from_lru() allows for unlinking elements from cache without notifying the cache. This messes up any potential cache bookkeeping. Improved that by replacing all uses of unlink_from_lru() with calls to lru::remove(), which does have access to cache's metadata.	2022-10-17 12:07:27 +02:00
Michał Chojnowski	d785364375	cache: make all cache unlinks explicit Our LSA cache is implemented as an auto_unlink Boost intrusive list, meaning that elements of the list unlink themselves from the list automatically on destruction. Some parts of the code rely on that, and don't unlink them manually. However, this precludes accurate bookkeeping about the cache. Elements only have access to themselves and their neighbours, not to any bookkeeping context. Therefore, a destructor cannot update the relevant metadata. In this patch, we fix this by adding explicit unlink calls to places where it would be done by a destructor. In a following patch, we will add an assert to the destructor to check that every element is unlinked before destruction.	2022-10-17 12:07:27 +02:00
Nadav Har'El	c31bf4184f	test/cql-pytest: two reproducers for SI returning oversized pages This patch has two reproducing tests for issue #7432, which are cases where a paged query with a restriction backed by a secondary-index returns pages larger than the desired page size. Because these tests reproduce a still-open bug, they are both marked "xfail". Both tests pass on Cassandra. The two tests involve quite dissimilar cases - one involves requesting an entire partition (and Scylla forgetting to page through it), and the other involves GROUP BY - so I am not sure these two bugs even have the same underlying cause. But they were both reported in #7432, so let's have reproducers for both. Refs #7432 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11586	2022-10-17 11:36:05 +03:00
Botond Dénes	d85208a574	replica/database: revert initial boost to system semaphore with set_resources() Unlike the current method (which uses consume()), this will also adjust the initial resources, adjusting the semaphore as if it was created with the reduced amount of resources in the first place. This fixes the confusing 90/100 count resources seen in diagnostics dump outputs.	2022-10-17 07:39:20 +03:00
Botond Dénes	ecc7c72acd	reader_concurrency_semaphore: add set_resources() Allows changing the total or initial resources the semaphore has. After calling `set_resources()` the semaphore will look as if it was created with the specified amount of resources.	2022-10-17 07:39:20 +03:00
Avi Kivity	e5e7780f32	test: work around modern pytest rejecting site-packages Modern (as of Fedora 37) pytest has the "-sP" flags in the Python command line, as found in /usr/bin/pytest. This means it will reject the site-packages directory, where we install the Scylla Python driver. This causes all the tests to fail. Work around it by supplying an alternative pytest script that does not have this change. Closes #11764	2022-10-17 07:18:33 +03:00
Nadav Har'El	9f02431064	test/cql-pytest: fix test_permissions.py when running with "--ssl" The tests in test_permissions.py use the new_session() utility function to create a new connection with a different logged-in user. It models the new connection on the existing one, but incorrectly assumed that the connection is NOT ssl. This made the test fail when cql-pytest/run is passed the "--ssl" option. In this patch we correctly infer the is_ssl state from the existing cql fixture, instead of assuming it is false. After this patch, "cql-pytest/run --ssl" works as expected for this test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11742	2022-10-17 06:46:46 +03:00
Tomasz Grabiec	c8a372ae7f	test: db: Add test for row merging involving many versions The test verifies that a row which participated in earlier merge, and its cells lost on the timestamp check, behaves exactly like an empty row and can accept any mutation. This wasn't the case in versions prior to `f006acc`. Closes #11787	2022-10-16 14:29:49 +03:00
Tomasz Grabiec	5d7e40af99	mvcc: Add snapshot details to the printout of partition_entry Useful for debugging. Closes #11788	2022-10-16 14:22:14 +03:00
Nadav Har'El	d2cd9b71b3	Merge 'Make tracing test run again, simplify backend registry and few related cleanups' from Pavel Emelyanov It turned out that boost/tracing test is not run because its name doesn't match the _test.cc pattern. While fixing it, it turned out that the test cannot even start, because it uses future<>.get() calls outside of seastar::thread context. While patching this place the trace-backend registry was removed for simplicity. And a few more cleanups "while at it" Closes #11779 github.com:scylladb/scylladb: tracing: Wire tracing test back tracing: Indentation fix after previous patch tracing: Move test into thread tracing: Dismantle trace-backend registry tracing: Use class-registrator for backends tracing: Add constraint to trace_state::begin() tracing: Remove copy-n-paste comments from test tracing: Outline may_create_new_session	2022-10-16 12:32:17 +03:00
Nadav Har'El	1f936838ba	Merge 'doc: fix the notes on the OS Support by Platform and Version page' from Anna Stuchlik Fix https://github.com/scylladb/scylladb/issues/11773 This PR fixes the notes by removing repetition and improving the clarity of the notes on the OS Support page. In addition, "Scylla" was replaced with "ScyllaDB" on related pages. Closes #11783 * github.com:scylladb/scylladb: doc: replace Scylla with ScyllaDB doc: add a comment to remove in future versions any information that refers to previous releases doc: rewrite the notes to improve clarity doc: remove the reperitions from the notes	2022-10-16 10:13:50 +03:00
Tomasz Grabiec	87b7e7ff9c	Merge 'storage_proxy: prepare for fencing, complex ops' from Avi Kivity Following up on `69aea59d97`, which added fencing support for simple reads and writes, this series does the same for the complex ops: - partition scan - counter mutation - paxos With this done, the coordinator knows about all in-flight requests and can delay topology changes until they are retired. Closes #11296 * github.com:scylladb/scylladb: storage_proxy: hold effective_replication_map for the duration of a paxos transaction storage_proxy: move paxos_response_handler class to .cc file storage_proxy: deinline paxos_response_handler constructor/destructor storage_proxy: use consistent effective_replication_map for counter coordinator storage_proxy: improve consistency in query_partition_key_range{,_concurrent} storage_proxy: query_partition_key_range_concurrent: reduce smart pointer use storage_proxy: query_partition_key_range_concurrent: improve token_metadata consistency storage_proxy: query_singular: use fewer smart pointers storage_proxy: query_singular: simplify lambda captures locator: effective_replication_map: provide non-smart-pointer accessor to token_metadata storage_proxy: use consistent token_metadata with rest of singular read	2022-10-14 15:44:35 +02:00
Pavel Emelyanov	6150214da3	Add rust/Cargo.lock to .gitignore The file appears after build Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11776	2022-10-14 13:54:50 +03:00
Anna Stuchlik	09b0e3f63e	doc: replace Scylla with ScyllaDB	2022-10-14 11:06:27 +02:00
Anna Stuchlik	9e2b7e81d3	doc: add a comment to remove in future versions any information that refers to previous releases	2022-10-14 10:53:17 +02:00
Anna Stuchlik	fc0308fe30	doc: rewrite the notes to improve clarity	2022-10-14 10:48:59 +02:00
Anna Stuchlik	1bd0bc00b3	doc: remove the reperitions from the notes	2022-10-14 10:32:52 +02:00
Botond Dénes	621e43a0c8	Merge 'dirty_memory_manager: tidy up' from Avi Kivity A collection of small cleanups, and a bug fix. Closes #11750 * github.com:scylladb/scylladb: dirty_memory_manager: move region_group data members to top-of-class dirty_memory_manager: update region_group comment dirty_memory_manager: remove outdated friend dirty_memory_manager: fold region_group::push_back() into its caller dirty_memory_manager: simplify blocked calculation in region_group::run_when_memory_available dirty_memory_manager: remove unneeded local from region_group::run_when_memory_is_available dirty_memory_manager: tidy up region_group::execution_permitted() dirty_memory_manager: reindent region_group::release_queued_allocations() dirty_memory_manager: convert region_group::release_queued_allocations() to a coroutine dirty_memory_manager: move region_group::_releaser after _shutdown_requested dirty_memory_manager: move region_group queued allocation releasing into a function dirty_memory_manager: fold allocation_queue into region_group dirty_memory_manager: don't ignore timeout in allocation_queue::push_back()	2022-10-14 06:56:42 +03:00
Avi Kivity	1feaa2dfb4	storage_proxy: handle_write: use coroutine::all() instead of when_all() coroutine::all() saves an allocation. Since it's safe for lambda coroutines, remove a coroutine::lambda wrapper. Closes #11749	2022-10-14 06:56:16 +03:00
Tomasz Grabiec	ee2398960c	Merge 'service/raft: simplify `raft_address_map`' from Kamil Braun The `raft_address_map` code was "clever": it used two intrusive data structures and did a lot of manual lifetime management; raw pointer manipulation, manual deletion of objects... It wasn't clear who owns which object, who is responsible for deleting what. And there was a lot of code. In this PR we replace one of the intrusive data structures with a good old `std::unordered_map` and make ownership clear by replacing the raw pointers with `std::unique_ptr`. Furthermore, some invariants which were not clear and enforced in runtime are now encoded in the type system. The code also became shorter: we reduced its length from ~360 LOC to ~260 LOC. Closes #11763 * github.com:scylladb/scylladb: service/raft: raft_address_map: get rid of `is_linked` checks service/raft: raft_address_map: get rid of `to_list_iterator` service/raft: raft_address_map: simplify ownership of `expiring_entry_ptr` service/raft: raft_address_map: move _last_accessed field from timestamped_entry to expiring_entry_ptr service/raft: raft_address_map: don't use intrusive set for timestamped entries service/raft: raft_address_map: store reference to `timestamped_entry` in `expiring_entry_ptr`	2022-10-13 18:08:49 +02:00
Kamil Braun	954849799d	test/topology: disable flaky `test_decommission_add_column` Flaky due to #11780, causes next promotion failures. We can reenable it after the issue is fixed or a workaround is found.	2022-10-13 17:45:46 +02:00
Pavel Emelyanov	707efb6dfb	tracing: Wire tracing test back The boost/tracing test is not run, because test.py boost suite collects tests that match *_test.cc pattern. The tracing one apparently doesn't Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 17:59:13 +03:00
Pavel Emelyanov	5b67a2a876	tracing: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 17:59:08 +03:00
Pavel Emelyanov	53ac8536f1	tracing: Move test into thread The test calls future<>.get()'s in its lambda which is only allowed in seastar threads. It's not stepped upon because (surprise, surprise) this test is not run at all. Next patch fixes it. Meanwhile, the fix is in using the cql_env_thread thing for the whole lambda, which runs it in seastar::async() context Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 17:57:35 +03:00
Pavel Emelyanov	5c8a61ace2	tracing: Dismantle trace-backend registry It's not used any longer Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 17:57:24 +03:00
Pavel Emelyanov	fe7d38661c	tracing: Use class-registrator for backends Currently the code uses its own class registration engine, but there's a generic one in utils/ that applies here too. In fact, the tracing backend registry is just a transparent wrapper over the generic one :\ Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 17:56:24 +03:00
Pavel Emelyanov	1adb2c8cc3	tracing: Add constraint to trace_state::begin() It expects that the function is (void) and returns back a string Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 17:56:08 +03:00
Pavel Emelyanov	0a6a5a242e	tracing: Remove copy-n-paste comments from test Tests don't have supervisor, so there's no sense in keeping these bits Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 17:55:40 +03:00
Pavel Emelyanov	79820c2006	tracing: Outline may_create_new_session It's a private method used purely in tracing.cc, no need in compiling it every time the header is met somewhere else. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-13 17:55:14 +03:00
Anna Stuchlik	9f7536d549	doc: fix the link to the OS Support page	2022-10-13 15:36:51 +02:00
Anna Stuchlik	1fd1ce042a	doc: replace Scylla with ScyllaDB	2022-10-13 15:21:46 +02:00
Anna Stuchlik	81ce7a88de	doc: update the info about supported architecture and rewrite the introduction	2022-10-13 15:18:29 +02:00
Kamil Braun	5a9371bcb0	service/raft: raft_address_map: get rid of `is_linked` checks Being linked is an invariant of `expiring_entry_ptr`. Make it explicit by moving the `_expiring_list.push_front` call into the constructor.	2022-10-13 15:17:07 +02:00
Kamil Braun	cdf3367c05	service/raft: raft_address_map: get rid of `to_list_iterator` Unnecessary.	2022-10-13 15:17:06 +02:00
Kamil Braun	0e29495c38	service/raft: raft_address_map: simplify ownership of `expiring_entry_ptr` The owner of `expiring_entry_ptr` was almost uniquely its corresponding `timestamped_entry`; it would delete the expiring entry when it itself got destroyed. There was one call to explicit `unlink_and_dispose`, which made the picture unclear. Make the picture clear: `timestamped_entry` now contains a `unique_ptr` to its `expiring_entry_ptr`. The `unlink_and_dispose` was replaced with `_lru_entry = nullptr`. We can also get rid of the back-reference from `expiring_entry_ptr` to `timestamped_entry`. The code becomes shorter and simpler.	2022-10-13 15:16:40 +02:00
Petr Gusev	c76cf5956d	removenode: don't stream data from the leaving node If a removenode is run for a recently stopped node, the gossiper may not yet know that the node is down, and the removenode will fail with a Stream failed error trying to stream data from that node. In this patch we explicitly reject removenode operation if the gossiper considers the leaving node up. Closes #11704	2022-10-13 15:11:32 +02:00
Takuya ASADA	49d5e51d76	reloc: add support stripped binary installation for relocatable package This adds support for stripped binary installation for the relocatable package. After this change, the scylla and unified packages only contain stripped binaries, and a "scylla-debuginfo" package is introduced for debug symbols. For the scylla-debuginfo package, the install.sh script will extract debug symbols to /opt/scylladb/<dir>/.debug. Note that we need to keep an unstripped version of the relocatable package for rpm/deb, otherwise rpmbuild/debuild fails to create the debug symbol package. This version is renamed to scylla-unstripped-$version-$release.$arch.tar.gz. See #8918 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Closes #9005	2022-10-13 15:11:32 +02:00
Asias He	6134fe4d1f	storage_service: Prevent removed node to rejoin in handle_state_normal - Start n1, n2, n3 (127.0.0.3) - Stop n3 - Change ip address of n3 to 127.0.0.33 and restart n3 - Decommission n3 - Start new node n4 The node n4 will learn from the gossip entry for 127.0.0.3 that node 127.0.0.3 is in shutdown status which means 127.0.0.3 is still part of the ring. This patch prevents this by checking the status for the host id on all the entries. If any of the entries shows the node with the host id is in LEFT status, reject to put the node in NORMAL status. Fixes #11355 Closes #11361	2022-10-13 15:11:32 +02:00
Jan Ciolek	52bbc1065c	cql3: allow lists of IN elements to be NULL Requests like `col IN NULL` used to cause an error - Invalid null value for colum col. We would like to allow NULLs everywhere. When a NULL occurs on either side of a binary operator, the whole operation should just evaluate to NULL. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com> Closes #11775	2022-10-13 15:11:32 +02:00
Avi Kivity	19e62d4704	commitlog: delete unused "num_deleted" variable Since `d478896d46` we update the variable, but never read it. Clang 15 notices and complains. Remove the variable to make it happy. Closes #11765	2022-10-13 15:11:32 +02:00
Avi Kivity	a2da08f9f9	storage_proxy: hold effective_replication_map for the duration of a paxos transaction Luckily, all topology calculations are done in get_paxos_participants(), so all we have to do is hold the effective_replication_map for the duration of the transaction, and pass it to get_paxos_participants(). This ensures that the coordinator knows about all in-flight requests and can fence them from topology changes.	2022-10-13 14:27:26 +03:00
Avi Kivity	69aaa5e131	storage_proxy: move paxos_response_handler class to .cc file It's not used elsewhere.	2022-10-13 14:27:26 +03:00
Avi Kivity	b2f3934e95	storage_proxy: deinline paxos_response_handler constructor/destructor They have no business being inline as it's a heavyweight object.	2022-10-13 14:27:26 +03:00
Avi Kivity	94e4ff11be	storage_proxy: use consistent effective_replication_map for counter coordinator Hold the effective_replication_map while talking to the counter leader, to allow for fencing in the future. The code is somewhat awkward because the API allows for multiple keyspaces to be in use. The error code generation, already broken as it doesn't use the correct table, continues to be broken in that it doesn't use the correct effective_replication_map, for the same reason.	2022-10-13 14:27:23 +03:00
Avi Kivity	406a046974	storage_proxy: improve consistency in query_partition_key_range{,_concurrent} query_partition_key_range captures a token_metadata_ptr and uses it consistently in sequential calls to query_partition_key_range_concurrent (via tail recursion), but each invocation of query_partition_key_range_concurrent captures its own effective_replication_map_ptr. Since these are captured at different times, they can be inconsistent after the first iteration. Fix by capturing it once in the caller and propagating it everywhere.	2022-10-13 13:56:52 +03:00
Avi Kivity	5d320e95d5	storage_proxy: query_partition_key_range_concurrent: reduce smart pointer use Capture token_metadata by reference rather than smart pointer, since our effective_replication_map_ptr protects it.	2022-10-13 13:56:52 +03:00
Avi Kivity	f75efa965f	storage_proxy: query_partition_key_range_concurrent: improve token_metadata consistency Derive the token_metadata from the effective_replication_map rather than getting it independently. Not a real bug since these were in the same continuation, but safer this way.	2022-10-13 13:56:52 +03:00
Avi Kivity	161ce4b34f	storage_proxy: query_singular: use fewer smart pointers Capture token_metadata by reference since we're protecting it with the mighty effective_replication_map_ptr. This saves a few instructions to manage smart pointers.	2022-10-13 13:56:33 +03:00
Avi Kivity	efd89c1890	storage_proxy: query_singular: simplify lambda captures The lambdas in query_singular do not outlive the enclosing coroutine, so they can capture everything by reference. This simplifies life for a future update of the lambda, since there's one thing less to worry about.	2022-10-13 13:52:54 +03:00
Avi Kivity	d9955ab35b	locator: effective_replication_map: provide non-smart-pointer accessor to token_metadata token_metadata is protected by holders of an effective_replication_map_ptr, so it's just as safe and less expensive for them to obtain a reference to token_metadata rather than a smart pointer, so give them that option with a new accessor.	2022-10-13 13:46:04 +03:00
Avi Kivity	86a48cf12f	storage_proxy: use consistent token_metadata with rest of singular read query_singular() uses get_token_metadata_ptr() and later, in get_read_executor(), captures the effective_replication_map(). This isn't a bug, since the two are captured in the same continuation and are therefore consistent, but a way to ensure it stays so is to capture the effective_replication_map earlier and derive the token_metadata from it.	2022-10-13 13:46:04 +03:00
Avi Kivity	720fc733f0	dirty_memory_manager: move region_group data members to top-of-class Rather than have them spread out throughout the class.	2022-10-13 13:12:01 +03:00
Avi Kivity	61b780ae63	dirty_memory_manager: update region_group comment It's still named region_group. I may merge the whole thing into dirty_memory_manager to retire the name.	2022-10-13 13:09:01 +03:00
Avi Kivity	7a5fa1497c	dirty_memory_manager: remove outdated friend That friend no longer exists.	2022-10-13 13:03:43 +03:00
Avi Kivity	02b7697051	dirty_memory_manager: fold region_group::push_back() into its caller It is too trivial to live.	2022-10-13 13:03:43 +03:00
Avi Kivity	d403ecbed9	dirty_memory_manager: simplify blocked calculation in region_group::run_when_memory_available - apply De Morgan's law - merge if block into boolean calculation	2022-10-13 13:03:43 +03:00
Avi Kivity	cb6c7023c1	dirty_memory_manager: remove unneeded local from region_group::run_when_memory_is_available	2022-10-13 13:03:43 +03:00
Avi Kivity	39668d5ae2	dirty_memory_manager: tidy up region_group::execution_permitted() - remove excess parentheses - apply De Morgan's law - remove unneeded this-> - whitespace cleanups	2022-10-13 13:03:43 +03:00
Avi Kivity	02706e78f9	dirty_memory_manager: reindent region_group::release_queued_allocations()	2022-10-13 13:03:43 +03:00
Avi Kivity	128f1c8c21	dirty_memory_manager: convert region_group::release_queued_allocations() to a coroutine Nicer and faster. We have a rare case where we hold a lock for the duration of a call but we don't want to hold it until the future it returns is resolved, so we have to resort to a minor trick.	2022-10-13 13:03:29 +03:00
Avi Kivity	aad4c1c5e9	dirty_memory_manager: move region_group::_releaser after _shutdown_requested The function that is attached to _releaser depends on _shutdown_requested. There is currently no use-before-init, since the function (release_queued_allocations) starts with a yield(), moving the first use until after the initialization. Since I want to get rid of the yield, reorder the fields so that they are initialized in the right order.	2022-10-13 13:00:50 +03:00
Raphael S. Carvalho	ec79ac46c9	db/view: Add visibility to view updating of Staging SSTables Today, we're completely blind about the progress of view updating on Staging files. We don't know how long it will take, nor how much progress we've made. This patch adds visibility with a new metric that will inform the number of bytes to be processed from Staging files. Before any work is done, the metric tells us the total size to be processed. As view updating progresses, the metric value is expected to decrease, unless work is being produced faster than we can consume it. We're piggybacking on sstables::read_monitor, which allows the progress metric to be updated whenever the SSTable reader makes progress. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11751	2022-10-12 16:57:37 +03:00
Avi Kivity	2e79bb431c	tools: change source_location location std::experimental::source_location is provided by <experimental/source_location>, not <source_location>. libstdc++ 12 insists, so change the header. Closes #11766	2022-10-12 15:29:14 +03:00
Takuya ASADA	6b246dc119	locator::ec2_snitch: Retry HTTP request to EC2 instance metadata service EC2 instance metadata service can be busy, let's retry to connect with an interval, just like we do in scylla-machine-image. Fixes #10250 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Closes #11688	2022-10-12 13:59:06 +03:00
Kamil Braun	92dd1f7307	service/raft: raft_address_map: move _last_accessed field from timestamped_entry to expiring_entry_ptr `timestamped_entry` had two fields: ``` optional<clock_time_point> _last_accessed expiring_entry_ptr* _lru_entry ``` The `raft_address_map` data structure maintained an invariant: `_last_accessed` is set if and only if `_lru_entry` is not null. This invariant could be broken for a while when constructing an expiring `timestamped_entry`: the constructor was given an `expiring = true` flag, which set the `_last_accessed` field; this was redundant, because immediately after a corresponding `expiring_entry_ptr` was constructed which again reset the `_last_accessed` field and set `_lru_entry`. The code becomes simpler and shorter when we move `_last_accessed` field into `expiring_entry_ptr`. The invariant is now guaranteed by the type system: `_last_accessed` is no longer `optional`.	2022-10-12 12:22:57 +02:00
Kamil Braun	262b9473d5	service/raft: raft_address_map: don't use intrusive set for timestamped entries Intrusive data structures are harder to reason about. In `raft_address_map` there's a good reason to use an intrusive list for storing `expiring_entry_ptr`s: we move the entries around in the list (when their expiration times change) but we want for the objects to stay in place because `timestamped_entry`s may point to them (although we could simply update the pointers using the existing back-reference...) However, there's not much reason to store `timestamped_entry` in an intrusive set. It was basically used in one place: when dropping expired entries, we iterate over the list of `expiring_entry_ptr`s and we want to drop the corresponding `timestamped_entry` as well, which is easy when we have a pointer to the entry and it's a member of an intrusive container. But we can deal with it when using non-intrusive containers: just `find` the element in the container to erase it. The code becomes shorter with this change. I also use a map instead of a set because we need to modify the `timestamped_entry` which wouldn't be possible if it was used as an `unordered_set` key. In fact using map here makes more sense: we were using the intrusive set similarly to a map anyway because all lookups were performed using the `_id` field of `timestamped_entry` (now the field was moved outside the struct, it's used as the map's key).	2022-10-12 12:22:50 +02:00
Kamil Braun	3e84b1f69c	Merge 'test.py: topology fix ssl var and improve pylint score' from Alecco When code was moved to the new directory, a bug was reintroduced with `ssl` local hiding `ssl` module. Fix again. Closes #11755 * github.com:scylladb/scylladb: test.py: improve pylint score for conftest test.py: fix variable name collision with ssl	2022-10-12 11:41:11 +02:00
Avi Kivity	f673d0abbe	build: support fmt 9 ostream formatter deprecation fmt 9 deprecates automatic fallback to std::ostream formatting. We should migrate, but in order to do so incrementally, first enable the deprecated fallback so the code continues to compile. Closes #11768	2022-10-12 09:27:36 +03:00
Avi Kivity	0952cecfc9	build: mark abseil as a system header Abseil is not under our control, so if a header generates a warning, we can do nothing about it. So far this wasn't a problem, but under clang 15 it spews a harmless deprecation warning. Silence the warning by treating the header as a system header (which it is, for us). Closes #11767	2022-10-12 09:27:36 +03:00
Kamil Braun	0c13c85752	service/raft: raft_address_map: store reference to `timestamped_entry` in `expiring_entry_ptr` The class was storing a pointer which couldn't be null. A reference is a better fit in this case.	2022-10-11 17:21:01 +02:00
Asias He	810b424a8c	storage_service: Reject to bootstrap new node when node has unknown gossip status - Start a cluster with n1, n2, n3 - Full cluster shutdown n1, n2, n3 - Start n1, n2 and keep n3 as shutdown - Add n4 Node n4 will learn the ip and uuid of n3 but it does not know the gossip status of n3 since gossip status is published only by the node itself. After full cluster shutdown, gossip status of n3 will not be present until n3 is restarted again. So n4 will not think n3 is part of the ring. In this case, it is better to reject the bootstrap. With this patch, one would see the following when adding n4: ``` ERROR 2022-09-01 13:53:14,480 [shard 0] init - Startup failed: std::runtime_error (Node 127.0.0.3 has gossip status=UNKNOWN. Try fixing it before adding new node to the cluster.) ``` The user needs to perform either of the following before adding a new node: 1) Run nodetool removenode to remove n3 2) Restart n3 to get it back to the cluster Fixes #6088 Closes #11425	2022-10-11 15:47:34 +03:00
Botond Dénes	378c6aeebd	Merge 'More Raft upgrade tests' from Kamil Braun Refactor the existing upgrade tests, extracting some common functionality to helper functions. Add more tests. They are checking the upgrade procedure and recovery from failure in scenarios like when a node fails causing the procedure to get stuck or when we lose a majority in a fully upgraded cluster. Add some new functionalities to `ScyllaRESTAPIClient` like injecting errors and obtaining gossip generation numbers. Extend the removenode function to allow ignoring dead nodes. Improve checking for CQL availability when starting nodes to speed up testing. Closes #11725 * github.com:scylladb/scylladb: test/topology_raft_disabled: more Raft upgrade tests test/topology_raft_disabled: refactor `test_raft_upgrade` test/pylib: scylla_cluster: pass a list of ignored nodes to removenode test/pylib: rest_client: propagate errors from put_json test/pylib: fix some type hints test/pylib: scylla_cluster: don't create and drop keyspaces to check if cql is up	2022-10-11 15:30:00 +03:00
Kamil Braun	08e654abf5	Merge 'raft: (service) cleanups on the path for dynamic IP address support' from Konstantin Osipov In preparation for supporting IP address changes of Raft Group 0: 1) Always use start_server_for_group0() to start a server for group 0. This will provide a single extension point when it's necessary to prompt raft_address_map with gossip data. 2) Don't use raft::server_address in discovery, since going forward discovery won't store raft::server_address. On the same token stop using discovery::peer_set anywhere outside discovery (for persistence), use a peer_list instead, which is easier to marshal. Closes #11676 * github.com:scylladb/scylladb: raft: (discovery) do not use raft::server_address to carry IP data raft: (group0) API refactoring to avoid raft::server_address raft: rename group0_upgrade.hh to group0_fwd.hh raft: (group0) move the code around raft: (discovery) persist a list of discovered peers, not a set raft: (group0) always start group0 using start_server_for_group0()	2022-10-11 13:43:41 +02:00
Asias He	58c65954b8	storage_service: Reject decommission if nodes are down - Start n1, n2, n3 - Apply network nemesis as below: + Block gossip traffic going from nodes 1 and 2 to node 3. + All the other rpc traffic flows normally, including gossip traffic from node 3 to nodes 1 and 2 and responses to node_ops commands from nodes 1 and 2 to node 3. - Decommission n3 Currently, the decommission will be successful because all the network traffic is ok. But n3 could not advertise status STATUS_LEFT to the rest of the cluster due to the network nemesis applied. As a result, n1 and n2 could not move n3 from STATUS_LEAVING to STATUS_LEFT, so n3 will stay in DL forever. I know why the node stays DL forever. The problem is that with node_ops_cmd based node operation, we still rely on the gossip status of STATUS_LEFT from the node being decommissioned to notify other nodes this node has finished decommission and can be moved from STATUS_LEAVING to STATUS_LEFT. This patch fixes by checking gossip liveness before running decommission. Reject if required peer nodes are down. With the fix, the decommission of n3 will fail like this: $ nodetool decommission -p 7300 nodetool: Scylla API server HTTP POST to URL '/storage_service/decommission' failed: std::runtime_error (decommission[adb3950e-a937-4424-9bc9-6a75d880f23d]: Rejected decommission operation, removing node=127.0.0.3, sync_nodes=[127.0.0.2, 127.0.0.3, 127.0.0.1], ignore_nodes=[], nodes_down={127.0.0.1}) Fixes #11302 Closes #11362	2022-10-11 14:09:28 +03:00
Botond Dénes	917fdb9e53	Merge "Cut database-system_keyspace circular dependency" from Pavel Emelyanov " There's one via the database's compaction manager and large data handler sub-services. Both need system keyspace to put their info into, but the latter needs database naturally via query_processor->storage_proxy link. The solution is to make c.m. \| l.d.h. -> sys.ks. dependency be weak with the help of shared_from_this(), described in details in patch #2 commit message. As a (not-that-)side effect this set removes a bunch of global qctx calls. refs: #11684 (this set seem to increase the chance of stepping on it) " * 'br-sysks-async-users' of https://github.com/xemul/scylla: large_data_handler: Use local system_keyspace to update entries system_keyspace: De-static compaction history update compaction_manager: Relax history paths database: Plug/unplug system_keyspace system_keyspace: Add .shutdown() method	2022-10-11 08:52:04 +03:00
Nadav Har'El	ef0da14d6f	test/cql-pytest: add simple tests for USE statement This patch adds a couple of simple tests for the USE statement: that without USE one cannot create a table without explicitly specifying a keyspace name, and with USE, it is possible. Beyond testing these specific features, this patch also serves as an example of how to write more tests that need to control the effective USE setting. Specifically, it adds a "new_cql" function that can be used to create a new connection with a fresh USE setting. This is necessary in such tests, because if multiple tests use the same cql fixture and its single connection, they will share their USE setting and there is no way to undo or reset it after being set. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11741	2022-10-11 08:20:19 +03:00
Kamil Braun	df2fb21972	test/topology: reenable test_remove_node_add_column After #11691 was merged the test should no longer be flaky. Reenable it. Closes #11754	2022-10-11 08:18:20 +03:00
Pavel Emelyanov	8b8b37cdda	system_keyspace: Dont maintain dc/rack cache Some good news finally. The saved dc/rack info about the ring is now only loaded once on start. So the whole cache is not needed and the loading code in storage_service can be greatly simplified Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:18:31 +03:00
Pavel Emelyanov	775f42c8d1	system_keyspace: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:18:31 +03:00
Pavel Emelyanov	8f1df240c7	system_keyspace: Coroutinize build_dc_rack_info() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:18:31 +03:00
Pavel Emelyanov	b6061bb97d	topology: Move all post-configuration to topology::config Because of snitch ex-dependencies some bits on topology were initialized with nasty post-start calls. Now it all can be removed and the initial topology information can be provided by topology::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:18:31 +03:00
Pavel Emelyanov	56d4863eb6	snitch: Start early Snitch code doesn't need anything to start working, but it is needed by the low-level token-metadata, so move the snitch to start early (and to stop late) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:18:31 +03:00
Pavel Emelyanov	16188a261e	gossiper: Do not export system keyspace No users of it left. Although the gossiper->system_keyspace dependency is not needed either, keep it alive because gossiper still updates system keyspace with feature masks, so chances are it will be reactivated some time later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	2bb354b2e7	snitch: Remove gossiper reference It doesn't need gossiper any longer. This change will allow starting snitch early by the next patch, and eventually improving the token-metadata start-up sequence Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	26f9472f21	snitch: Mark get_datacenter/_rack methods const They are in fact such, but weren't marked as const before because they wanted to talk to non-const gossiper and system_keyspace methods and updated snitch internal caches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	e9bd912f79	snitch: Drop some dead dependency knots After previous patches and merged branches snitch no longer needs its method that gets dc/rack for endpoints from gossiper, system keyspace and its internal caches. This cuts the last but the biggest snitch->gossiper dependency. Also this removes implicit snitch->system_keyspace dependency loop Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	4206b1f98f	snitch, code: Make get_datacenter() report local dc only The continuation of the previous patch -- all the code uses topology::get_datacenter(endpoint) to get peers' dc string. The topology still uses snitch for that, but it already contains the needed data. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	6c6711404f	snitch, code: Make get_rack() report local rack only All the code out there now calls snitch::get_rack() to get rack for the local node. For other nodes the topology::get_rack(endpoint) is used. Since now the topology is properly populated with endpoints, it can finally be patched to stop using snitch and get rack from its internal collections Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	bc813771e8	storage_service: Populate pending endpoint in on_alive() A special-purpose add-on to the previous patch. When messaging service accepts a new connection it sometimes may want to drop it early based on whether the client is from the same dc/rack or not. However, at this stage the information might have not yet had chances to be spread via storage service pending-tokens updating paths, so here's one more place -- the on_alive() callback Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	1be97a0a76	code: Populate pending locations Previous patches added the concept of pending endpoints in the topology, this patch populates endpoints in this state. Also, the set_pending_ranges() is patched to make sure that the tokens added for the endpoint(s) are added for something that's known by the topology. Same check exists in update_normal_tokens() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	b61bd6cf56	topology: Put local dc/rack on topology early Startup code needs to know the dc/rack of the local node early, way before nodes starts any communication with the ring. This information is available when snitch activates, but it starts _after_ token-metadata, so the only way to put local dc/rack in topology is via a startup-time special API call. This new init_local_endpoint() is temporary and will be removed later in this set Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	da75552e1f	topology: Add pending locations collection Nowadays the topology object only keeps info about nodes that are normal members of the ring. Nodes that are joining or bootstrapping or leaving are out of it. However, one of the goals of this patchset is to make topology object provide dc/rack info for _all_ nodes, even those in transitive state. The introduced _pending_locations is about to hold the dc/rack info for transitive endpoints. When a node becomes member of the ring it is moved from pending (if it's there) to current locations, when it leaves the ring it's moved back to pending. For now the new collection is just added and the add/remove/get API is extended to maintain it, but it's not really populated. It will come in the next patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	fa613285e7	topology: Make get_location() errors more verbose Currently if topology.get_location() doesn't find an entry in its collection(s) it throws standard out-of-range exception which's very hard to debug. Also, next patches will extend this method, the introduced here if (_current_locations.contains()) makes this future change look nicer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	d60ebc5ace	token_metadata: Add config, spread everywhere Next patches will need to provide some early-start data for topology. The standard way of doing it is via service config, so this patch adds one. The new config is empty in this patch, to be filled later Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	7c211e8e50	token_metadata: Hide token_metadata_impl copy constructor Copying of token_metadata_impl is a heavy operation and it's performed internally with the help of the dedicated clone_async() method. This method, in turn, doesn't copy the whole object in its copy-ctor, but rather default-initializes it and carries the remaining fields later. Having said that, the standard copy-ctor is better to be made private and, for the sake of being more explicit, marked as shallow-copy-ctor Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	072ef88ed1	gossiper: Remove messaging service getter No code needs to borrow messaging from gossiper, which is nice Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	66bc84d217	snitch: Get local address to gossip via config The property-file snitch gossips listen_address as internal-IP state. To get this value it gets it from snitch->gossiper->messaging_service chain. This change provides the needed value via config thus cutting yet another snitch->gossiper dependency and allowing gossiper not to export messaging service in the future Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	77bde21024	storage_service: Shuffle on_alive() callback No functional changes, just keep some conditions from if()s as local variables. This is the churn-reducing preparation for one of the next patches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:17:08 +03:00
Pavel Emelyanov	583204972e	api: Don't report dc/rack for endpoints not in ring When an endpoint is not in ring the snitch/get_{rack\|datacenter} API still return back some value. The value is, in fact, the default one, because this is how snitch resolves it -- when it cannot find a node in gossiper and system keyspace it just returns defaults. When this happens the API should better return some error (bad param?) but there's a bug in nodetool -- when the 'status' command collects info about the ring it first collects the endpoints, then gets status for each. If between getting an endpoint and getting its status the endpoint disappears, the API would fail, but nodetool doesn't handle it. Next patches will make .get_rack/_dc calls use in-topology collections that don't fall-back to default values if the entry is not found in it, so prepare the API in advance to return back defaults. refs: #11706 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:12:47 +03:00
Konstantin Osipov	3e46c32d7b	raft: (discovery) do not use raft::server_address to carry IP data We plan to remove IP information from Raft addresses. raft::server_address is used in Raft configuration and also in discovery, which is a separate algorithm, as a handy data structure, to avoid having new entities in RPC. Since we plan to remove IP addresses from Raft configuration, using raft::server_address in discovery and still storing IPs in it would create ambiguity: in some uses raft::server_address would store an IP, and in others - would not. So switch to an own data structure for the purposes of discovery, discovery_peer, which contains a pair ip, raft server id. Note to reviewers: ideally we should switch to URIs in discovery_peer right away. Otherwise we may have to deal with incompatible changes in discovery when adding URI support to Scylla.	2022-10-10 16:24:33 +03:00
Pavel Emelyanov	b1f4273f0d	large_data_handler: Use local system_keyspace to update entries The l._d._h.'s way to update system keyspace is not like in other code. Instead of a dedicated helper on the system_keyspace's side it executes the insertion query directly with the help of qctx. Now when the l._d._h. has the weak system keyspace reference it can execute queries on _it_ rather than on the qctx. Just like in previous patch, it needs to keep the sys._k.s. weak reference alive until the query's future resolves. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-10 16:20:59 +03:00
Pavel Emelyanov	907fd2d355	system_keyspace: De-static compaction history update Compaction manager now has the weak reference on the system keyspace object and can use it to update its stats. It only needs to take care and keep the shared pointer until the respective future resolves. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-10 16:20:59 +03:00
Pavel Emelyanov	3e0b61d707	compaction_manager: Relax history paths There's a virtual method on table_state to update the entry in system keyspace. It's an overkill to facilitate tests that don't want this. With new system_keyspace weak referencing it can be made simpler by moving the updating call to the compaction_manager itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-10 16:20:59 +03:00
Pavel Emelyanov	f9b57df471	database: Plug/unplug system_keyspace There's a circular dependency between system_keyspace and database. The former needs the latter because it needs to execute local requests via query_processor. The latter needs the former via compaction manager and large data handler, database depends on both and these too need to insert their entries into system keyspace. To cut this loop the compaction manager and large data handler both get a weak reference on the system keyspace. Once system keyspace starts it activates this reference via the database call. When system keyspace is shut down on stop, it deactivates the reference. Technically the weak reference is implemented by marking the system_k.s. object as async_sharded_service, and the "reference" in question is the shared_from_this() pointer. When compaction manager or large data handler need to update a system keyspace's table, they both hold an extra reference on the system keyspace until the entry is committed, thus making sure that sys._k.s. doesn't stop from under their feet. At the same time, unplugging the reference on shutdown makes sure that no new entries update will appear and the system_k.s. will eventually be released. It's not a C++ classical reference, because system_keyspace starts after and stops before database. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-10 16:20:59 +03:00
Konstantin Osipov	8857e017c7	raft: (group0) API refactoring to avoid raft::server_address Replace raft::server_address in a few raft_group0 API calls with raft::server_id. These API calls do not need raft::server_address, i.e. the address part, anyway, and since going forward raft::server_address will not contain the IP address, stop using it in these calls. This is a beginning of a multi-patch series to reduce raft::server_address usage to core raft only.	2022-10-10 15:58:48 +03:00
Konstantin Osipov	224dd9ce1e	raft: rename group0_upgrade.hh to group0_fwd.hh The plan is to add other group-0-related forward declarations to this file, not just the ones for upgrade.	2022-10-10 15:58:48 +03:00
Konstantin Osipov	e226624daf	raft: (group0) move the code around Move load/store functions for discovered peers up, since going forward they'll be used in start_server_for_group0(), to extend the address map prior to start (and thus speed up bootstrap).	2022-10-10 15:58:48 +03:00
Konstantin Osipov	199b6d6705	raft: (discovery) persist a list of discovered peers, not a set We plan to reuse the discovery table to store the peers after discovery is over, so load/store API must be generalized to use outside discovery. This includes sending the list of persisted peers over to a new member of the cluster.	2022-10-10 15:58:48 +03:00
Konstantin Osipov	746322b740	raft: (group0) always start group0 using start_server_for_group0() When IP addresses are removed from raft::configuration, it's key to initialize raft_address_map with IP addresses before we start group 0. Best place to put this initialization is start_server_for_group0(), so make sure all paths which create group 0 use start_server_for_group0().	2022-10-10 15:58:48 +03:00
Kamil Braun	4974a31510	test/topology_raft_disabled: more Raft upgrade tests The tests are checking the upgrade procedure and recovery from failure in scenarios like when a node fails causing the procedure to get stuck or when we lose a majority in a fully upgraded cluster. Added some new functionalities to `ScyllaRESTAPIClient` like injecting errors and obtaining gossip generation numbers.	2022-10-10 14:32:10 +02:00
Pavel Emelyanov	caed12c8f2	system_keyspace: Add .shutdown() method Many services out there have one (sometimes called .drain()) that's called early on stop and that's responsible for preparing the service for stop -- aborting pending/in-flight fibers and alike. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-10 15:29:33 +03:00
Kamil Braun	4460b4e63c	test/topology_raft_disabled: refactor `test_raft_upgrade` Take reusable parts out of the test to helper functions.	2022-10-10 12:59:12 +02:00
Kamil Braun	fa8dcb0d54	test/pylib: scylla_cluster: pass a list of ignored nodes to removenode The `removenode` operation normally requires the removing node to contact every node in the cluster except the one that is being removed. But if more than 1 node is down it's possible to specify a list of nodes to ignore for the operation; the `/storage_service/remove_node` endpoint accepts an `ignore_nodes` param which is a comma-separated list of IPs. Extend `ScyllaRESTAPIClient`, `ScyllaClusterManager` and `ManagerClient` so it's possible to pass the list of ignored nodes. We also modify the `/cluster/remove-node` Manager endpoint to use `put_json` instead of `get_text` and pass all parameters except the initiator IP (the IP of the node who coordinates the `removenode` operation) through JSON. This simplifies the URL greatly (it was already messy with 3 parameters) and more closely resembles Scylla's endpoint.	2022-10-10 12:59:12 +02:00
Kamil Braun	130ab1d312	test/pylib: rest_client: propagate errors from put_json	2022-10-10 12:59:12 +02:00
Kamil Braun	63892326d5	test/pylib: fix some type hints	2022-10-10 12:59:12 +02:00
Kamil Braun	6e3fe13fcf	test/pylib: scylla_cluster: don't create and drop keyspaces to check if cql is up Do a simple `SELECT` instead. This speeds up tests - creating and dropping keyspaces is relatively expensive, and we did this on every server restart.	2022-10-10 12:59:12 +02:00
Alejo Sanchez	7e2a3f2040	test.py: improve pylint score for conftest Remove unused imports, fix long lines, add ignore flags. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-10-10 12:07:41 +02:00
Alejo Sanchez	aa1f4a321c	test.py: fix variable name collision with ssl Change variable name to avoid collision with module ssl. This bug was reintroduced when moving code. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-10-10 11:59:13 +02:00
Pavel Emelyanov	53bad617c0	virtual_tables: Use token_metadata.is_member() This method just jumps into topology.has_endpoint(). The change is for consistency with other users of it and as a preparation for topology.has_endpoint() future enhancements Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-10 12:16:19 +03:00
Tomasz Grabiec	fcf0628bc5	dbuild: Use .gdbinit from the host Useful when starting gdb inside the dbuild container. Message-Id: <20221007154230.1936584-1-tgrabiec@scylladb.com>	2022-10-09 11:14:33 +03:00
Petr Gusev	0923cb435f	raft: mark removed servers as expiring instead of dropping them There is a flaw in how the raft rpc endpoints are currently managed. The io_fiber in raft::server is supposed to first add new servers to rpc, then send all the messages and then remove the servers which have been excluded from the configuration. The problem is that the send_messages function isn't synchronous, it schedules send_append_entries to run after all the current requests to the target server, which can happen after we have already removed the server from address_map. In this patch the remove_server function is changed to mark the server_id as expiring rather than synchronously dropping it. This means all currently scheduled requests to that server will still be able to resolve the ip address for that server_id. Fixes: #11228 Closes #11748	2022-10-07 19:08:34 +02:00
Avi Kivity	55606a51cb	dirty_memory_manager: move region_group queued allocation releasing into a function It's nicer to see a function release_queued_allocations() in a stack trace rather than start_releaser(), which has done its work during initialization.	2022-10-07 17:27:43 +03:00
Avi Kivity	3e60d6c243	dirty_memory_manager: fold allocation_queue into region_group allocation_queue was extracted out of region_group in `71493c253` and `34d532236`. But now that region_group refactoring is mostly done, we can move them back in. allocation_queue has just one user and is not useful standalone.	2022-10-07 17:27:40 +03:00
Avi Kivity	01368830b5	dirty_memory_manager: don't ignore timeout in allocation_queue::push_back() In `34d5322368` ("dirty_memory_manager: move more allocation_queue functions out of region_group") we accidentally started ignoring the timeout parameter. Fix that. No release branch has the breakage.	2022-10-07 17:19:56 +03:00
Kamil Braun	06b87869ba	Merge 'Raft transport error' from Gusev Petr The `add_entry` and `modify_config` methods sometimes do an rpc to execute the request on the current leader. If the tcp connection was broken, a `seastar::rpc::closed_error` would be thrown to the client. This exception was not documented in the method comments and the client could have missed handling it. For example, this exception was not handled when calling `modify_config` in `raft_group0`, which sometimes broke the `removenode` command. An `intermittent_connection_error` exception was added earlier to solve a similar problem with the `read_barrier` method. In this patch it is renamed to `transport_error`, as it seems to better describe the situation, and an explicit specification for this exception was added - the rpc implementation can throw it if it is not known whether the call reached the destination and whether any mutations were made. In case of `read_barrier` it does not matter and we just retry, in case of `add_entry` and `modify_config` we cannot retry because of possible mutations, so we convert this exception to `commit_status_unknown`, which the client has to handle. Explicit comments have also been added to `raft::server` methods describing all possible exceptions. Closes #11691 * github.com:scylladb/scylladb: raft_group0: retry modify_config on commit_status_unknown raft: convert raft::transport_error to raft::commit_status_unknown	2022-10-07 15:53:22 +02:00
Petr Gusev	12bb8b7c8d	raft_group0: retry modify_config on commit_status_unknown modify_config can throw commit_status_unknown in case of a leader change or when the leader is unavailable, but the information about it has not yet reached the current node. In this patch modify_config is run again after some time in this case.	2022-10-07 13:34:23 +04:00
Petr Gusev	d79fbab682	raft: convert raft::transport_error to raft::commit_status_unknown The add_entry and modify_config methods sometimes do an rpc to execute the request on the current leader. If the tcp connection was broken, a seastar::rpc::closed_error would be thrown to the client. This exception was not documented in the method comments and the client could have missed handling it. For example, this exception was not handled when calling modify_config in raft_group0, which sometimes broke the removenode command. An intermittent_connection_error exception was added earlier to solve a similar problem with the read_barrier method. In this patch it is renamed to transport_error, as it seems to better describe the situation, and an explicit specification for this exception was added - the rpc implementation can throw it if it is not known whether the call reached the target node and whether any actions were performed on it. In case of read_barrier it does not matter and we just retry. In case of add_entry and modify_config we cannot retry because the rpc calls are not idempotent, so we convert this exception to commit_status_unknown, which the client has to handle. Explicit comments have also been added to raft::server methods describing all possible exceptions.	2022-10-07 13:34:16 +04:00
Botond Dénes	b247f29881	Merge 'De-static system_keyspace::get_{saved\|local}_tokens()' from Pavel Emelyanov Yet another user of global qctx object. Making the method(s) non-static requires pushing the system_keyspace all the way down to size_estimate_virtual_reader and a small update of the cql_test_env Closes #11738 * github.com:scylladb/scylladb: system_keyspace: Make get_{local\|saved}_tokens non static size_estimates_virtual_reader: Pass sys_ks argument to get_local_ranges() cql_test_env: Keep sharded<system_keyspace> reference size_estimate_virtual_reader: Keep system_keyspace reference system_keyspace: Pass sys_ks argument to install_virtual_readers() system_keyspace: Make make() non-static distributed_loader: Pass sys_ks argument to init_system_keyspace() system_keyspace: Remove dangling forward declaration	2022-10-07 11:28:32 +03:00
Botond Dénes	992afc5b8c	Merge 'storage_proxy: coroutinize some functions with do_with' from Avi Kivity do_with() is a sure indicator for coroutinization, since it adds an allocation (like the coroutine does with its frame). Therefore translating a function with do_with is at least a break-even, and usually a win since other continuations no longer allocate. This series converts most of storage_proxy's functions that have do_with to coroutines. Two remain, since they are not simple to convert (the do_with() is kept running in the background and its future is discarded). Individual patches favor minimal changes over final readability, and there is a final patch that restores indentation. The patches leave some moves from coroutine reference parameters to the coroutine frame, this will be cleaned up in a follow-up. I wanted this series not to touch headers to reduce rebuild times. Closes #11683 * github.com:scylladb/scylladb: storage_proxy: reindent after coroutinization storage_proxy: convert handle_read_digest() to a coroutine storage_proxy: convert handle_read_mutation_data() to a coroutine storage_proxy: convert handle_read_data() to a coroutine storage_proxy: convert handle_write() to a coroutine storage_proxy: convert handle_counter_mutation() to a coroutine storage_proxy: convert query_nonsingular_mutations_locally() to a coroutine	2022-10-07 07:37:37 +03:00
Nadav Har'El	72dbce8d46	docs, alternator: mention S3 Import feature in compatibility.md In August 2022, DynamoDB added a "S3 Import" feature, which we don't yet support - so let's document this missing feature in the compatibility document. Refs #11739. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11740	2022-10-06 19:50:16 +03:00
Avi Kivity	20bad62562	Merge 'Detect and record large collections' from Benny Halevy This series adds support for detecting collections that have too many items and recording them in `system.large_cells`. A configuration variable was added to db/config: `compaction_collection_items_count_warning_threshold` set by default to 10000. Collections that have more items than this threshold will be warned about and will be recorded as a large cell in the `system.large_cells` table. Documentation has been updated accordingly. A new column was added to system.large_cells: `collection_items`. Similar to the `rows` column in system.large_partition, `collection_items` holds the number of items in a collection when the large cell is a collection, or 0 if it isn't. Note that the collection may be recorded in system.large_cells either due to its size, like any other cell, and/or due to the number of items in it, if it crosses the said threshold. Note that #11449 called for a new system.large_collections table, but extending system.large_cells follows the logic of system.large_partitions and is a smaller change overall, hence it was preferred. Since the system keyspace schema is hard coded, the schema version of system.large_cells was bumped, and since the change is not backward compatible, we added a cluster feature - `LARGE_COLLECTION_DETECTION` - to enable using it. The large_data_handler large cell detection record function will populate the new column only when the new cluster feature is enabled. In addition, unit tests were added in sstable_3_x_test for testing large cells detection by cell size, and large_collection detection by the number of items. 
Closes #11449 Closes #11674 * github.com:scylladb/scylladb: sstables: mx/writer: optimize large data stats members order sstables: mx/writer: keep large data stats entry as members db: large_data_handler: dynamically update config thresholds utils/updateable_value: add transforming_value_updater db/large_data_handler: cql_table_large_data_handler: record large_collections db/large_data_handler: pass ref to feature_service to cql_table_large_data_handler db/large_data_handler: cql_table_large_data_handler: move ctor out of line docs: large-rows-large-cells-tables: fix typos db/system_keyspace: add collection_elements column to system.large_cells gms/feature_service: add large_collection_detection cluster feature test: sstable_3_x_test: add test_sstable_too_many_collection_elements test: lib: simple_schema: add support for optional collection column test: lib: simple_schema: build schema in ctor body test: lib: simple_schema: cql: define s1 as static only if built this way db/large_data_handler: maybe_record_large_cells: consider collection_elements db/large_data_handler: debug cql_table_large_data_handler::delete_large_data_entries sstables: mx/writer: pass collection_elements to writer::maybe_record_large_cells sstables: mx/writer: add large_data_type::elements_in_collection db/large_data_handler: get the collection_elements_count_threshold db/config: add compaction_collection_elements_count_warning_threshold test: sstable_3_x_test: add test_sstable_write_large_cell test: sstable_3_x_test: pass cell_threshold_bytes to large_data_handler test: sstable_3_x_test: large_data_handler: prepare callback for testing large_cells test: sstable_3_x_test: large_data tests: use BOOST_REQUIRE_[GL]T test: sstable_3_x_test: test_sstable_log_too_many_rows: use tests::random	2022-10-06 18:28:21 +03:00
Avi Kivity	62a4d2d92b	Merge 'Preliminary changes for multiple Compaction Groups' from Raphael "Raph" Carvalho What's contained in this series: - Refactored compaction tests (and utilities) for integration with multiple groups - The idea is to write a new class of tests that will stress multiple groups, whereas the existing ones will still stress a single group. - Fixed a problem when cloning compound sstable set (cannot be triggered today so I didn't open a GH issue) - Many changes in replica::table for allowing integration with multiple groups Next: - Introduce for_each_compaction_group() for iterating over groups wherever needed. - Use for_each_compaction_group() in replica::table operations spanning all groups (API, readers, etc). - Decouple backlog tracker from compaction strategy, to allow for backlog isolation across groups - Introduce static option for defining number of compaction groups and implement function to map a token to its respective group. - Testing infrastructure for multiple compaction groups (helpful when testing the dynamic behavior: i.e. merging / splitting). 
Closes #11592 * github.com:scylladb/scylladb: sstable_resharding_test: Switch to table_for_tests replica: Move compacted_undeleted_sstables into compaction group replica: Use correct compaction_group in try_flush_memtable_to_sstable() replica: Make move_sstables_from_staging() robust and compaction group friendly test: Rename column_family_for_tests to table_for_tests sstable_compaction_test: Use column_family_for_tests::as_table_state() instead test: Don't expose compound set in column_family_for_tests test: Implement column_family_for_tests::table_state::is_auto_compaction_disabled_by_user() sstable_compaction_test: Merge table_state_for_test into column_family_for_tests sstable_compaction_test: use table_state_for_test itself in fully_expired_sstables() sstable_compaction_test: Switch to table_state in compact_sstables() sstable_compaction_test: Reduce boilerplate by switching to column_family_for_tests	2022-10-06 18:23:47 +03:00
Kamil Braun	f94d547719	test.py: include modes in log file name Instead of `test.py.log`, use: `test.py.dev.log` when running with `--mode dev`, `test.py.dev-release.log` when running with `--mode dev --mode release`, and so on. This is useful in Jenkins which is running test.py multiple times in different modes; a later run would overwrite a previous run's test.py file. With this change we can preserve the test.py files of all of these runs. Closes #11678	2022-10-06 18:20:39 +03:00
Kamil Braun	3af68052c4	test/topology: disable flaky `test_remove_node_add_column` test The test was added recently and since then causes CI failures. We suspect that it happens if the node being removed was the Raft group 0 leader. The removenode coordinator tries to send to it the `remove_from_group0` request and fails. A potential fix is in review: #11691.	2022-10-06 17:04:42 +02:00
Pavel Emelyanov	59da903054	system_keyspace: Make get_{local\|saved}_tokens non static Now all callers have system_keyspace reference at hand. This removes one more user of the global qctx object Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 18:02:09 +03:00
Pavel Emelyanov	b03f1e7b17	size_estimates_virtual_reader: Pass sys_ks argument to get_local_ranges() This method static calls system_keyspace::get_local_tokens(). Having the system_keyspace reference will make this method non-static Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 18:00:09 +03:00
Pavel Emelyanov	4c099bb3ed	cql_test_env: Keep sharded<system_keyspace> reference There's a test_get_local_ranges() call in size-estimate reader which will need system keyspace reference. There's no other place for tests to get it from but the cql_test_env thing Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 17:59:21 +03:00
Pavel Emelyanov	34e8e5959f	size_estimate_virtual_reader: Keep system_keyspace reference The s._e._v._reader::fill_buffer() method needs system keyspace to get node's local tokens. Now it's a static method, having system_keyspace reference will make it non-static Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 17:58:07 +03:00
Pavel Emelyanov	04552f2d58	system_keyspace: Pass sys_ks argument to install_virtual_readers() The size-estimate-virtual-reader will need it, now it's available as "this" from system_keyspace::make() method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 17:57:13 +03:00
Pavel Emelyanov	1938412d7a	system_keyspace: Make make() non-static This helper needs system_keyspace reference and using "this" as this looks natural. Also this de-static-ification makes it possible to put some sense into the invoke_on_all() call from init_system_keyspace() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 17:56:11 +03:00
Pavel Emelyanov	9f79525f8e	distributed_loader: Pass sys_ks argument to init_system_keyspace() Its final destination is virtual tables registration code called from init_system_keyspace() eventually Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 17:55:03 +03:00
Pavel Emelyanov	e996503f0d	system_keyspace: Remove dangling forward declaration It doesn't match the real system_keyspace_make() definition and is in fact not needed, as there's another "real" one in database.hh Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-06 17:54:22 +03:00
Vlad Zolotarov	8195dab92a	scylla_prepare: correctly handle a former 'MQ' mode Fixes a regression introduced in `80917a1054`: "scylla_prepare: stop generating 'mode' value in perftune.yaml" When cpuset.conf contains a "full" CPU set the negation of it from the "full" CPU set is going to generate a zero mask as an irq_cpu_mask. This is an illegal value that will eventually end up in the generated perftune.yaml, which in turn will make the scylla service fail to start until the issue is resolved. In such a case an irq_cpu_mask must represent a "full" CPU set mimicking a former 'MQ' mode. Fixes #11701 Tested: - Manually on a 2 vCPU VM in an 'auto-selection' mode. - Manually on a large VM (48 vCPUs) with an 'MQ' manually enforced. Message-Id: <20221004004237.2961246-1-vladz@scylladb.com>	2022-10-06 17:43:37 +03:00
Avi Kivity	9932c4bd62	Merge 'cql3: Make CONTAINS NULL and CONTAINS KEY NULL return false' from Jan Ciołek Currently doing `CONTAINS NULL` or `CONTAINS KEY NULL` on a collection evaluates to `true`. This is a really weird behaviour. Collections can't contain `NULL`, even if they wanted to. Any operation that has a NULL on either side should evaluate to `NULL`, which is interpreted as `false`. In Cassandra trying to do `CONTAINS NULL` causes an error. Fixes: #10359 The only problem is that this change is not backwards compatible. Some existing code might break. Closes #11730 * github.com:scylladb/scylladb: cql3: Make CONTAINS KEY NULL return false cql3: Make CONTAINS NULL return false	2022-10-06 17:08:56 +03:00
Petr Gusev	40bd9137f8	removenode: add warning in case of exception The removenode_abort logic that follows the warning may throw, in which case information about the original exception was lost. Fixes: #11722 Closes #11735	2022-10-06 13:49:26 +02:00
Benny Halevy	480b4759a9	idl: streaming: include stream_fwd.hh To keep the idl definition of plan_id from getting out of sync with the one in stream_fwd.hh. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11720	2022-10-06 13:49:26 +02:00
Kamil Braun	962ee9ba7b	Merge 'Make raft_group0 -> system_keyspace dependency explicit' from Pavel Emelyanov The raft_group0 code needs system_keyspace and now it gets one from gossiper. This gossiper->system_keyspace dependency is in fact artificial, gossiper doesn't need system ks, it's there only to let raft and snitch call gossiper.get_system_keyspace(). This makes raft use system ks directly, snitch is patched by another branch Closes #11729 * github.com:scylladb/scylladb: raft_group0: Use local reference raft_group0: Add system keyspace reference	2022-10-06 13:49:26 +02:00
Tomasz Grabiec	023f78d6ae	test: lib: random_mutation_generator: Introduce a switch for generating simpler mutations for easier debugging Closes #11731	2022-10-06 13:49:26 +02:00
Raphael S. Carvalho	14d6459efc	compaction: Make compaction_manager stop more robust Commit `aba475fe1d` accidentally fixed a race, which happens in the following sequence of events: 1) storage service starts drain() via API for example 2) main's abort source is triggered, calling compaction_manager's do_stop() via subscription. 2.1) do_stop() initiates the stop but doesn't wait for it. 2.2) compaction_manager's state is set to stopped, such that compaction_manager::stop() called in defer_verbose_shutdown() will wait for the stop and not start a new one. 3) drain() calls compaction_manager::drain() changing the state from stopped to disabled. 4) main calls compaction_manager::stop() (as described in 2.2) and incorrectly tries to stop the manager again, because the state was changed in step 3. `aba475fe1d` accidentally fixed this problem because drain() will no longer take place if it detects the shutdown process was initiated (it does so by ignoring drain request if abort source's subscription was unlinked). This shows us that looking at the state to determine if stop should be performed is fragile, because once the state changes from A to B, manager doesn't know the state was A. To make it robust, we can instead check if the future that stores stop's promise is engaged, meaning that the stop was already initiated and we don't have to start a new one. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11711	2022-10-06 13:49:26 +02:00
Botond Dénes	753f671eaa	Merge 'dirty_memory_manager: simplify, clarify, and document' from Avi Kivity This series undoes some recent damage to clarity, then goes further by renaming terms around dirty_memory_manager to be clearer. Documentation is added. Closes #11705 * github.com:scylladb/scylladb: dirty_memory_manager: re-term "virtual dirty" to "unspooled dirty" dirty_memory_manager: rename _virtual_region_group api: column_family: fix memtable off-heap memory reporting dirty_memory_manager: unscramble terminology	2022-10-06 13:49:26 +02:00
Tomasz Grabiec	4c8dc41f75	Merge 'Handle storage_io_error's ENOSPC when flushing' from Pavel Emelyanov This is the continuation of the `a980510654` that tries to catch ENOSPCs reported via storage_io_error similarly to how defer_verbose_shutdown() does on stop Closes #11664 * github.com:scylladb/scylladb: table: Handle storage_io_error's ENOSPC when flushing table: Rewrap retry loop	2022-10-06 13:49:26 +02:00
Raphael S. Carvalho	fcdff50a35	sstable_resharding_test: Switch to table_for_tests Important step for multiple compaction groups. As a bonus, lots of boilerplate is removed. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	cf3f93304e	replica: Move compacted_undeleted_sstables into compaction group Compacted undeleted sstables are relevant for avoiding data resurrection in the purge path. As token ranges of groups won't overlap, it's better to isolate this data, so to prevent one group from interfering with another. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	56ac62bbd6	replica: Use correct compaction_group in try_flush_memtable_to_sstable() We need to pass the compaction_group received as a param, not the one retrieved via as_table_state(). Needed for supporting multiple groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	707ebf9cf7	replica: Make move_sstables_from_staging() robust and compaction group friendly Off-strategy can happen in parallel to view building. A semaphore is used to ensure they don't step on each other's toes. If off-strategy completes first, then move_sstables_from_staging() won't find the SSTable alive and won't reach code to add the file to the backlog tracker. If view building completes first, the SSTable exists, but it's not reshaped yet (has repair origin) and shouldn't be added to the backlog tracker. Off-strategy completion code will make sure new sstables added to main set are accounted by the backlog tracker, so move_sstables_from_staging() only needs to add to the tracker files which are certainly not going through a reshape compaction. So let's take these facts into account to make the procedure more robust and compaction group friendly. Very welcome change for when multiple groups are supported. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	7d82373e3a	test: Rename column_family_for_tests to table_for_tests To avoid confusion, as replica::column_family was already renamed to replica::table. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	e56bfecd8d	sstable_compaction_test: Use column_family_for_tests::as_table_state() instead That's important for multiple compaction groups. Once replica::table supports multiple groups, there will be no table::as_table_state(), so for testing table with a single group, we'll be relying on column_family_for_tests::as_table_state(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	5a028ca4dc	test: Don't expose compound set in column_family_for_tests The compound set shouldn't be exposed in main_sstables() because once we complete the switch to column_family_for_tests::table_state, compaction may try to remove or add elements to its set snapshot, and the compound set allows neither operation. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	b16d6c55b1	test: Implement column_family_for_tests::table_state::is_auto_compaction_disabled_by_user() Needed once we switch to column_family_for_tests::table_state, so unit tests relying on correct value will still work Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	a6d24a763a	sstable_compaction_test: Merge table_state_for_test into column_family_for_tests This change will make table_state_for_test the table_state of column_family_for_tests. Today, a unit test has to keep a reference to them both and logically couple them, but that's error prone. This change is also important when replica::table supports multiple compaction groups, so unit tests won't have to directly reference the table_state of table, but rather use the one managed by column_family_for_tests. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	6a0eabd17a	sstable_compaction_test: use table_state_for_test itself in fully_expired_sstables() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	a6affea008	sstable_compaction_test: Switch to table_state in compact_sstables() The switch is important once we have multiple compaction groups, as a single table may own several groups. There will no longer be a replica::table::as_table_state(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:19 -03:00
Raphael S. Carvalho	2aa6518486	sstable_compaction_test: Reduce boilerplate by switching to column_family_for_tests Lots of boilerplate is reduced, and will also help to complete the switch from replica::table to compaction::table_state in the unit tests. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-05 21:37:18 -03:00
Jan Ciolek	a2c359a741	cql3: Make CONTAINS KEY NULL return false A binary operator like this: {1: 2, 3: 4} CONTAINS KEY NULL used to evaluate to `true`. This is wrong, any operation involving null on either side of the operator should evaluate to NULL, which is interpreted as false. This change is not backwards compatible. Some existing code might break. partially fixes: #10359 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-05 18:15:44 +02:00
Jan Ciolek	bbfef4b510	cql3: Make CONTAINS NULL return false A binary operator like this: [1, 2, 3] CONTAINS NULL used to evaluate to `true`. This is wrong, any operation involving null on either side of the operator should evaluate to NULL, which is interpreted as false. This change is not backwards compatible. Some existing code might break. partially fixes: #10359 Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-10-05 18:15:15 +02:00
Pavel Emelyanov	fb8ed684fa	raft_group0: Use local reference It now grabs one from gossiper which is weird. A bit later it will be possible to remove gossiper->system_keyspace dependency Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-05 17:35:58 +03:00
Pavel Emelyanov	8570fe3c30	raft_group0: Add system keyspace reference The sharded<system_keyspace> is already started by the time raft_group0 is created Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-05 17:35:13 +03:00
Michał Chojnowski	a0204c17c5	treewide: remove mentions of seastar::thread::should_yield() thread_scheduling_group was retired many years ago. Remove the leftovers, as they are confusing. Closes #11714	2022-10-05 12:26:37 +03:00
Michał Chojnowski	8aa24194b7	row_cache: remove a dead try...catch block in eviction All calls in the try block have been noexcept for some time. Remove the try...catch and the associated misleading comment to avoid confusing source code readers. Closes #11715	2022-10-05 12:23:47 +03:00
Benny Halevy	7286f5d314	sstables: mx/writer: optimize large data stats members order Since `_partition_size_entry` and `_rows_in_partition_entry` are accessed at the same time when updated, and similarly `_cell_size_entry` and `_elements_in_collection_entry`, place the member pairs closely together to improve data cache locality. Follow the same order when preparing the `scylla_metadata::large_data_stats` map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-05 10:54:04 +03:00
Benny Halevy	8c8a0adb40	sstables: mx/writer: keep large data stats entry as members To save the map lookup on the hot write path, keep each large data stats entry as a member in the writer object and build a map for storing the disk_hash in the scylla metadata only when finalizing it in consume_end_of_stream. Fixes #11686 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-05 10:54:04 +03:00
Benny Halevy	2c4ff71d2b	db: large_data_handler: dynamically update config thresholds make the various large data thresholds live-updateable and construct the observers and updaters in cql_table_large_data_handler to dynamically update the base large_data_handler class threshold members. Fixes #11685 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-05 10:53:40 +03:00
Benny Halevy	6d582054c0	utils/updateable_value: add transforming_value_updater Automatically updates a value from a utils::updateable_value, where the two can be of different types. An optional transform function can provide an additional transformation when updating the value, such as multiplying it by a factor for unit conversion. To be used for auto-updating the large data thresholds from the db::config. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-05 10:52:49 +03:00
Botond Dénes	4c13328788	Merge 'Return all sstables in table::get_sstable_set()' from Raphael "Raph" Carvalho This fixes a regression introduced by `1e7a444`, where table::get_sstable_set() isn't exposing all sstables, but rather only the ones in the main set. That causes users of the interface, such as get_sstables_by_partition_key() (used by the API to return the list of sstable names that contain a particular key), to miss files in the maintenance set. Fixes https://github.com/scylladb/scylladb/issues/11681. Closes #11682 * github.com:scylladb/scylladb: replica: Return all sstables in table::get_sstable_set() sstables: Fix cloning of compound_sstable_set	2022-10-05 06:55:50 +03:00
Pavel Emelyanov	2c1ef0d2b7	sstables.hh: Remove unused headers Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11709	2022-10-04 23:37:07 +02:00
Raphael S. Carvalho	827750c142	replica: Return all sstables in table::get_sstable_set() get_sstable_set() as its name implies is not confined to the main or maintenance set, nor to a specific compaction group, so let's make it return the compound set which spans all groups, meaning all sstables tracked by a table will be returned. This is a regression introduced in `1e7a444`. It affects the API to return sstable list containing a partition key, as sstables in maintenance would be missed, fooling users of the API like tools that could trust the output. Each compaction group is returning the main and maintenance set in table_state's main_sstable_set() and maintenance_sstable_set(), respectively. Fixes #11681. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-04 10:43:27 -03:00
Raphael S. Carvalho	eddf32b94c	sstables: Fix cloning of compound_sstable_set The intention was that its clone() would actually clone the content of an existing set into a new one, but the current impl is actually moving the sets instead of copying them. So the original set becomes invalid. Luckily, this problem isn't triggered as we're not exposing the compound set in the table's interface, so the compound_sstable_set::clone() method isn't being called. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-10-04 10:43:25 -03:00
Felipe Mendes	f67bb43a7a	locator: ec2_snitch: IMDSv2 support Access to AWS Metadata may be configured in three distinct ways: 1 - Optional HTTP tokens and HTTP endpoint enabled: The default as it works today 2 - Required HTTP tokens and HTTP endpoint enabled: Support for which is entirely missing today 3 - HTTP endpoint disabled: Which effectively forbids one to use Ec2Snitch or Ec2MultiRegionSnitch This commit makes the 2nd option the default, which is not only the AWS-recommended option, but is also entirely compatible with the 1st option. In addition, we now validate the HTTP response when querying the IMDS server. Therefore - should an HTTP 403 be received - Scylla will properly notify users about what they are doing incorrectly in their setup. The commit was tested under the following circumstances (covering all 3 variants): - Ec2Snitch: IMDSv2 optional & required, and HTTP server disabled. - Ec2MultiRegionSnitch: IMDSv2 optional & required, and HTTP server disabled. Refs: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html https://github.com/scylladb/scylladb/issues/9987 Fixes: https://github.com/scylladb/scylladb/issues/10490 Closes: https://github.com/scylladb/scylladb/issues/10490 Closes #11636	2022-10-04 15:48:42 +03:00
Avi Kivity	37c6b46d26	dirty_memory_manager: re-term "virtual dirty" to "unspooled dirty" The "virtual dirty" term is not very informative. "Virtual" means "not real", but it doesn't say in which way it isn't real. In this case, virtual dirty refers to real dirty memory, minus the portion of memtables that has been written to disk (but not yet sealed - in that case it would not be dirty in the first place). I chose to call "the portion of memtables that has been written to disk" as "spooled memory". At least the unique term will cause people to look it up and may be easier to remember. From that we have "unspooled memory". I plan to further change the accounting to account for spooled memory rather than unspooled, as that is a more natural term, but that is left for later. The documentation, config item, and metrics are adjusted. The config item is practically unused so it isn't worth keeping compatibility here.	2022-10-04 14:03:59 +03:00
Avi Kivity	d02c407769	dirty_memory_manager: rename _virtual_region_group Since we folded _real_region_group into _virtual_region_group, the "virtual" tag makes no sense any more, so remove it.	2022-10-04 14:01:45 +03:00
Avi Kivity	b0814bdd42	api: column_family: fix memtable off-heap memory reporting We report virtual memory used, but that's not a real accounting of the actual memory used. Use the correct real_memory_used() instead. Note that this isn't a recent regression and was probably broken forever. However nobody looks at this measure (and it's usually close to the correct value) so nobody noticed. Since it's so minor, I didn't bother filing an issue.	2022-10-04 13:56:29 +03:00
Avi Kivity	bc2fcf5187	dirty_memory_manager: unscramble terminology Before `95f31f37c1` ("Merge 'dirty_memory_manager: simplify region_group' from Avi Kivity"), we had two region_group objects, one _real_region_group and another _virtual_region_group, each with a set of "soft" and "hard" limits and related functions and members. In `95f31f37c1`, we merged _real_region_group into _virtual_region_group, but unfortunately the _real_region_group members received the "hard" prefix when they got merged. This overloads the meaning of "hard" - is it related to soft/hard limit or is it related to the real/virtual distinction? This patch applied some renaming to restore consistency. Anything that came from _virtual_region_group now has "virtual" in its name. Anything that came from _real_region_group now has "real" in its name. The terms are still pretty bad but at least they are consistent.	2022-10-04 13:56:28 +03:00
Kamil Braun	c200ae2228	Merge 'test.py topology Scylla REST API client' from Alecco - Separate `aiohttp` client code - Helper to access Scylla server REST API - Use helper both in `ScyllaClusterManager` (test.py process) and `ManagerClient` (pytest process) - Add `removenode` and `decommission` operations. Closes #11653 * github.com:scylladb/scylladb: test.py: Scylla REST methods for topology tests test.py: rename server_id to server_ip test.py: HTTP client helper test.py: topology pass ManagerClient instead of... test.py: delete unimplemented remove server test.py: fix variable name ssl name clash	2022-10-04 11:50:18 +02:00
Botond Dénes	169a8a66f2	compatible_ring_position_or_view: make it cheap to copy This class exists for one purpose only: to serve as glue code between dht::ring_position and boost::icl::interval_map. The latter requires that keys in its intervals are: * default constructible * copyable * have standalone compare operations For this reason we have to wrap `dht::ring_position` in a class, together with a schema to provide all this. This is `compatible_ring_position`. There is one further requirement by code using the interval map: it wants to do lookups without copying the lookup key(s). To solve this, we came up with `compatible_ring_position_or_view` which is a union of a key or a key view + schema. As we recently found out, boost::icl copies its keys a lot. It seems to assume these keys are cheap to copy and carelessly copies them around even when iterating over the map. But `compatible_ring_position_or_view` is not cheap to copy as it copies a `dht::ring_position` which allocates, and it does that via an `std::optional` and `std::variant` to add insult to injury. This patch makes said class cheap to copy, by getting rid of the variant and storing the `dht::ring_position` via a shared pointer. The view is stored separately and either points to the ring position stored in the shared pointer or to an outside ring position (for lookups). Fixes: #11669 Closes #11670	2022-10-04 12:00:21 +03:00
Piotr Dulikowski	51f813d89b	storage_proxy: update rate limited reads metric when coordinator rejects The decision to reject a read operation can either be made by replicas, or by the coordinator. In the second case, the scylla_storage_proxy_coordinator_read_rate_limited metric was not incremented, but it should have been. This commit fixes the issue. Fixes: #11651 Closes #11694	2022-10-04 10:33:58 +03:00
Pavel Emelyanov	9cd1f777a5	database.hh: Remove unused headers Use forward declarations when needed Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11667	2022-10-04 09:01:38 +03:00
Botond Dénes	5fd4b1274e	Merge 'compaction_manager: Don't let ENOSPC throw out of ::stop() method' from Pavel Emelyanov The seastar defer_stop() helper is cool, but it forwards any exception from the .stop() towards the caller. In case the caller is main() the exception causes Scylla to abort(). This fires, for example, in compaction_manager::stop() when it steps on ENOSPC Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11662 * github.com:scylladb/scylladb: compaction_manager: Swallow ENOSPCs in ::stop() exceptions: Mark storage_io_error::code() with noexcept	2022-10-04 08:54:22 +03:00
Nadav Har'El	3a30fbd56c	test/alternator: fix timeout in flaky test test_ttl_stats The test `test_metrics.py::test_ttl_stats` tests the metrics associated with Alternator TTL expiration events. It normally finishes in less than a second (the TTL scanning is configured to run every 0.5 seconds), so we arbitrarily set a 60 second timeout for this test to allow for extremely slow test machines. But in some extreme cases even this was not enough - in one case we measured the TTL scan to take 63 seconds. So in this patch we increase the timeout in this test from 60 seconds to 120 seconds. We already did the same change in other Alternator TTL tests in the past - in commit `746c4bd`. Fixes #11695 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11696	2022-10-04 08:50:51 +03:00
Benny Halevy	46ebffcc93	db/large_data_handler: cql_table_large_data_handler: record large_collections When the large_collection_detection cluster feature is enabled, select the internal_record_large_cells_and_collections method to record the large collection cell, storing also the collection_elements column. We want to do that only when the cluster feature is enabled to facilitate rollback in case rolling upgrade is aborted, otherwise system.large_cells won't be backward compatible and will have to be deleted manually. Delete the sstable from system.large_cells if it contains elements_in_collection above threshold. Closes #11449 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:10 +03:00
Benny Halevy	3f8bba202f	db/large_data_handler: pass ref to feature_service to cql_table_large_data_handler For recording collection_elements of large_collections when the large_collection_detection feature is enabled. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:10 +03:00
Benny Halevy	dc4e7d8e01	db/large_data_handler: cql_table_large_data_handler: move ctor out of line Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:09 +03:00
Benny Halevy	f4c3070002	docs: large-rows-large-cells-tables: fix typos Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:09 +03:00
Benny Halevy	2f49eebb04	db/system_keyspace: add collection_elements column to system.large_cells And bump the schema version offset since the new schema should be distinguishable from the previous one. Refs scylladb/scylladb#11660 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:08 +03:00
Benny Halevy	9ad41c700e	gms/feature_service: add large_collection_detection cluster feature And a corresponding db::schema_feature::SCYLLA_LARGE_COLLECTIONS We want to enable the schema change supporting collection_elements only when all nodes are upgraded so that we can roll back if the rolling upgrade process is aborted. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:07 +03:00
Benny Halevy	9eeb8f2971	test: sstable_3_x_test: add test_sstable_too_many_collection_elements Test that collections with too many elements are detected properly. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:07 +03:00
Benny Halevy	3c11937b00	test: lib: simple_schema: add support for optional collection column Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:06 +03:00
Benny Halevy	7b5f2d2e53	test: lib: simple_schema: build schema in ctor body Rather than when initializing _s. Prepare for adding an optional collection column. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:06 +03:00
Benny Halevy	db01641a44	test: lib: simple_schema: cql: define s1 as static only if built this way Keep the with_static ctor parameter as private member to be used by the cql() method to define s1 either as static or not. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:05 +03:00
Benny Halevy	6dadca2648	db/large_data_handler: maybe_record_large_cells: consider collection_elements Detect large_collections when the number of collection_elements is above the configured threshold. Next step would be to record the number of collection_elements in the system.large_cells table, when the respective cluster feature is enabled. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:05 +03:00
Benny Halevy	27ee75c54e	db/large_data_handler: debug cql_table_large_data_handler::delete_large_data_entries Log in debug level when deleting large data entry from system table. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:42:04 +03:00
Benny Halevy	7dead10742	sstables: mx/writer: pass collection_elements to writer::maybe_record_large_cells And update the sstable elements_in_collection stats entry. Next step would be to forward it to large_data_handler().maybe_record_large_cells(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:41:58 +03:00
Benny Halevy	54ab038825	sstables: mx/writer: add large_data_type::elements_in_collection Add a new large_data_stats type and entry for keeping the collection_elements_count_threshold and the maximum value of collection_elements. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:41:56 +03:00
Benny Halevy	a107f583fd	db/large_data_handler: get the collection_elements_count_threshold Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:31:11 +03:00
Benny Halevy	167ec84eeb	db/config: add compaction_collection_elements_count_warning_threshold Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:31:10 +03:00
Benny Halevy	5e88e6267e	test: sstable_3_x_test: add test_sstable_write_large_cell based on cell size threshold. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:31:09 +03:00
Benny Halevy	3980415d97	test: sstable_3_x_test: pass cell_threshold_bytes to large_data_handler Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:31:09 +03:00
Benny Halevy	3eb4cda8ea	test: sstable_3_x_test: large_data_handler: prepare callback for testing large_cells Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:31:08 +03:00
Benny Halevy	0a9d3f24e6	test: sstable_3_x_test: large_data tests: use BOOST_REQUIRE_[GL]T This way, the boost infrastructure prints the offending values if the test assertion fails. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:31:07 +03:00
Benny Halevy	9668dd0e2d	test: sstable_3_x_test: test_sstable_log_too_many_rows: use tests::random So it would be reproducible based on the test random-seed Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-04 08:30:51 +03:00
Kamil Braun	114419d6ab	service/raft: raft_group0_client: read on-disk and in-memory group0 upgrade state atomically `set_group0_upgrade_state` writes the on-disk state first, then in-memory state second, both under a write lock. `get_group0_upgrade_state` would only take the lock if the in-memory state was `use_pre_raft_procedures`. If there's an external observer who watches the on-disk state to decide whether Raft upgrade finished yet, the following could happen: 1. The node wrote `use_post_raft_procedures` to disk but didn't update the in-memory state yet, which is still `synchronize`. 2. The external client reads the table and sees that the state is `use_post_raft_procedures`, and deduces that upgrade has finished. 3. The external client immediately tries to perform a schema change. The schema change code calls `get_group0_upgrade_state` which does not take the read lock and returns `synchronize`. The schema change gets denied because schema changes are not allowed in `synchronize`. Make sure that `get_group0_upgrade_state` cannot execute in-between writing to disk and updating the in-memory state by always taking the read lock before reading the in-memory state. As it was before, it will immediately drop the lock if the state is not `use_pre_raft_procedures`. This is useful for upgrade tests, which read the on-disk state to decide whether upgrade has finished and often try to perform a schema change immediately afterwards. Closes #11672	2022-10-03 19:04:16 +02:00
Alejo Sanchez	abf1425ad4	test.py: Scylla REST methods for topology tests Provide a helper client for Scylla REST requests. Use it on both ScyllaClusterManager (e.g. remove node, test.py process) and ManagerClient (e.g. get uuid, pytest process). For now keep using IPs as key in ScyllaCluster, but this will be changed to UUID -> IP in the future. So, for now, pass both independently. Note the UUID must be obtained from the server before stopping it. Refresh client driver connection when decommissioning or removing a node. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-10-03 19:01:03 +02:00
Alejo Sanchez	86c752c2a0	test.py: rename server_id to server_ip In ScyllaCluster currently servers are tracked by the host IP. This is not the host id (UUID). Fix the variable name accordingly Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-10-03 19:01:03 +02:00
Alejo Sanchez	a7a0b446f0	test.py: HTTP client helper Split aiohttp client to a shared helper file. While there, move aiohttp session setup back to constructors. When there were teardown issues it looked like they could be caused by the aiohttp session being created outside a coroutine. But this has proven not to be the case after recent fixes. So move it back to the ManagerClient constructor. On the other hand, create a close() coroutine to stop the aiohttp session. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-10-03 19:01:03 +02:00
Alejo Sanchez	41dbdf0f70	test.py: topology pass ManagerClient instead of... cql connection When there are topology changes, the driver needs to be updated. Instead of passing the CassandraCluster.Connection, pass the ManagerClient instance which manages the driver connection inside of it. Remove workaround for test_raft_upgrade. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-10-03 19:00:47 +02:00
Alejo Sanchez	0c3a06d0d7	test.py: delete unimplemented remove server Delete the unused, unimplemented, and broken version of remove server. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-10-03 18:57:38 +02:00
Alejo Sanchez	98bc4c198f	test.py: fix variable name ssl name clash Change variable ssl to use_ssl to avoid clash with ssl module. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-10-03 18:57:38 +02:00
Avi Kivity	7626fd573a	storage_proxy: reindent after coroutinization	2022-10-03 19:33:39 +03:00
Avi Kivity	019b18b232	storage_proxy: convert handle_read_digest() to a coroutine The do_with() makes it at least a break-even, but there's some allocating continuations that make it a win. A variable named cmd had two different definitions (a value and a lw_shared_ptr) that lived in different scopes. I renamed one to cmd1 to disambiguate. We should probably move that to the caller, but that is not done here.	2022-10-03 19:33:39 +03:00
Avi Kivity	aa5f4bf1f3	storage_proxy: convert handle_read_mutation_data() to a coroutine The do_with() makes it at least a break-even, but there's some allocating continuations that make it a win. A variable named cmd had two different definitions (a value and a lw_shared_ptr) that lived in different scopes. I renamed one to cmd1 to disambiguate. We should probably move that to the caller, but that is not done here.	2022-10-03 19:33:39 +03:00
Avi Kivity	bcd134e9b8	storage_proxy: convert handle_read_data() to a coroutine The do_with() makes it at least a break-even, but there's some allocating continuations that make it a win. A variable named cmd had two different definitions (a value and a lw_shared_ptr) that lived in different scopes. I renamed one to cmd1 to disambiguate. We should probably move that to the caller, but that is not done here.	2022-10-03 19:33:39 +03:00
Avi Kivity	167c8b1b5e	storage_proxy: convert handle_write() to a coroutine A do_with() makes this at least a break-even. Some internal lambdas were not converted since they commonly do not allocate or block. A finally() continuation is converted to seastar::defer().	2022-10-03 19:33:39 +03:00
Avi Kivity	741d6609a5	storage_proxy: convert handle_counter_mutation() to a coroutine The do_with means the coroutine conversion is free, and conversion of parallel_for_each to coroutine::parallel_for_each saves a possible allocation (though it would not usually have been allocated). An inner continuation is not converted since it usually doesn't block, and therefore doesn't allocate.	2022-10-03 19:33:39 +03:00
Avi Kivity	ac5fae4b93	storage_proxy: convert query_nonsingular_mutations_locally() to a coroutine It's simpler, and the do_with() allocation + task cancels out the coroutine allocation + task.	2022-10-03 19:33:29 +03:00
Pavel Emelyanov	d22b130af1	compaction_manager: Swallow ENOSPCs in ::stop() When being stopped, the compaction manager may step on ENOSPC. This is not a reason to fail the stopping process with an abort; it is better to log a warning about it and proceed as if nothing happened refs: #11245 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-03 18:54:48 +03:00
Pavel Emelyanov	7ba1f551f3	exceptions: Mark storage_io_error::code() with noexcept Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-03 18:50:06 +03:00
Kamil Braun	67ee6500e3	service/raft: raft_group_registry: pass `direct_fd_pinger` by reference It was passed to `raft_group_registry::direct_fd_proxy` by value. That is a bug, we want to pass a reference to the instance that is living inside `gossiper`. Fortunately this bug didn't cause problems, because the pinger is only used for one function, `get_address`, which looks up an address in a map and if it doesn't find it, accesses the map that lives inside `gossiper` on shard 0 (and then caches it in the local copy). Explicitly delete the copy constructor of `direct_fd_pinger` so this doesn't happen again. Closes #11661	2022-10-03 16:40:35 +02:00
Tomasz Grabiec	9dae2b9c02	Merge 'mutation_fragment_stream_validator: various API improvements' from Botond Dénes The low-level `mutation_fragment_stream_validator` gets `reset()` methods that until now only the high-level `mutation_fragment_stream_validating_filter` had. Active tombstone validation is pushed down to the low level validator. The low level validator, which was a pain to use until now due to being very fussy on which subset of its API one used, is made much more robust, not requiring the user to stick to a subset of its API anymore. Closes #11614 * github.com:scylladb/scylladb: mutation_fragment_stream_validator: make interface more robust mutation_fragment_stream_validator: add reset() to validating filter mutation_fragment_stream_validator: move active tomsbtone validation into low level validator	2022-10-03 16:23:46 +02:00
Botond Dénes	95f31f37c1	Merge 'dirty_memory_manager: simplify region_group' from Avi Kivity region_group evolved as a tree, each node of which contains some regions (memtables). Each node has some constraints on memory, and can start flushing and/or stop allocation into its memtables and those below it when those constraints are violated. Today, the tree has exactly two nodes, only one of which can hold memtables. However, all the complexity of the tree remains. This series applies some mechanical code transformations that remove the tree structure and all the excess functionality, leaving a much simpler structure behind. Before: - a tree of region_group objects - each with two parameters: soft limit and hard limit - but only two instances ever instantiated After: - a single region_group object - with three parameters - two from the bottom instance, one from the top instance Closes #11570 * github.com:scylladb/scylladb: dirty_memory_manager: move third memory threshold parameter of region_group constructor to reclaim_config dirty_memory_manager: simplify region_group::update() dirty_memory_manager: fold region_group::notify_hard_pressure_relieved into its callers dirty_memory_manager: clean up region_group::do_update_hard_and_check_relief() dirty_memory_manager: make do_update_hard_and_check_relief() a member of region_group dirty_memory_manager: remove accessors around region_group::_under_hard_pressure dirty_memory_manager: merge memory_hard_limit into region_group dirty_memory_manager: rename members in memory_hard_limit dirty_memory_manager: fold do_update() into region_group::update() dirty_memory_manager: simplify memory_hard_limit's do_update dirty_memory_manager: drop soft limit / soft pressure members in memory_hard_limit dirty_memory_manager: de-template do_update(region_group_or_memory_hard_limit) dirty_memory_manager: adjust soft_limit threshold check dirty_memory_manager: drop memory_hard_limit::_name dirty_memory_manager: simplify memory_hard_limit 
configuration dirty_memory_manager: fold region_group_reclaimer into {memory_hard_limit,region_group} dirty_memory_manager: stop inheriting from region_group_reclaimer dirty_memory_manager: test: unwrap region_group_reclaimer dirty_memory_manager: change region_group_reclaimer configuration to a struct dirty_memory_manager: convert region_group_reclaimer to callbacks dirty_memory_manager: consolidate region_group_reclaimer constructors dirty_memory_manager: rename {memory_hard_limit,region_group}::notify_relief dirty_memory_manager: drop unused parameter to memory_hard_limit constructor dirty_memory_manager: drop memory_hard_limit::shutdown() dirty_memory_manager: split region_group hierarchy into separate classes dirty_memory_manager: extract code block from region_group::update dirty_memory_manager: move more allocation_queue functions out of region_group dirty_memory_manager: move some allocation queue related function definitions outside class scope dirty_memory_manager: move region_group::allocating_function and related classes to new class allocation_queue dirty_memory_manager: remove support for multiple subgroups	2022-10-03 13:22:47 +03:00
Anna Stuchlik	3950a1cac8	doc: apply the feedback to improve clarity	2022-10-03 11:14:51 +02:00
Botond Dénes	5621cdd7f9	db/view/view_builder: don't drop partition and range tombstones when resuming The view builder builds the views from a given base table in view_builder::batch_size batches of rows. After processing this many rows, it suspends so the view builder can switch to building views for other base tables in the name of fairness. When resuming the build step for a given base table, it reuses the reader used previously (also serving the role of a snapshot, pinning sstables read from). The compactor however is created anew. As the reader can be in the middle of a partition, the view builder injects a partition start into the compactor to prime it for continuing the partition. This however only included the partition-key, crucially missing any active tombstones: partition tombstone or -- since the v2 transition -- active range tombstone. This can result in base rows covered by either of these being resurrected and the view builder generating view updates for them. This patch solves this by using the detach-state mechanism of the compactor which was explicitly developed for situations like this (in the range scan code) -- resuming a read with the readers kept but the compactor recreated. Also included are two test cases reproducing the problem, one with a range tombstone, the other with a partition tombstone. Fixes: #11668 Closes #11671	2022-10-03 11:28:22 +03:00
Avi Kivity	2c744628ae	Update abseil submodule * abseil 9e408e05...7f3c0d78 (193): > Allows absl::StrCat to accept types that implement AbslStringify() > Merge pull request #1283 from pateldeev:any_inovcable_rename_true > Cleanup: SmallMemmove nullify should also be limited to 15 bytes > Cleanup: implement PrependArray and PrependPrecise in terms of InlineData > Cleanup: Move BitwiseCompare() to InlineData, and make it layout independent. > Change kPower10Table bounds to be half-open > Cleanup some InlineData internal layout specific details from cord.h > Improve the comments on the implementation of format hooks adl tricks. > Expand LogEntry method docs. > Documentation: Remove an obsolete note about the implementation of `Cord`. > `absl::base_internal::ReadLongFromFile` should use `O_CLOEXEC` and handle interrupts to `read` > Allows absl::StrFormat to accept types which implement AbslStringify() > Add common_policy_traits - a subset of hash_policy_traits that can be shared between raw_hash_set and btree. > Split configuration related to cycle clock into separate headers > Fix -Wimplicit-int-conversion and -Wsign-conversion warnings in btree. > Implement Eisel-Lemire for from_chars<float> > Import of CCTZ from GitHub. > Adds support for "%v" in absl::StrFormat and related functions for bool values. Note that %v prints bool values as "true" and "false" rather than "1" and "0". > De-pointerize LogStreamer::stream_, and fix move ctor/assign preservation of flags and other stream properties. > Explicitly disallows modifiers for use with %v. > Change the macro ABSL_IS_TRIVIALLY_RELOCATABLE into a type trait - absl::is_trivially_relocatable - and move it from optimization.h to type_traits.h. > Add sparse and string copy constructor benchmarks for hash table. > Make BTrees work with custom allocators that recycle memory. > Update the readme, and (internally) fix some export processes to better keep it up-to-date going forward. 
> Add the fact that CHECK_OK exits the program to the comment of CHECK_OK. > Adds support for "%v" in absl::StrFormat and related functions for numeric types, including integer and floating point values. Users may now specify %v and have the format specifier deduced. Integer values will print according to %d specifications, unsigned values will use %u, and floating point values will use %g. Note that %v does not work for `char` due to ambiguity regarding the intended output. Please continue to use %c for `char`. > Implement correct move constructor and assignment for absl::strings_internal::OStringStream, and mark that class final. > Add more options for `BM_iteration` in order to see better picture for choosing trade off for iteration optimizations. > Change `EndComparison` benchmark to not measure iteration. Also added `BM_Iteration` separately. > Implement Eisel-Lemire for from_chars<double> > Add `-llog` to linker options when building log_sink_set in logging internals. > Apply clang-format to btree.h. > Improve failure message: tell the values we don't like. > Increase the number of per-ObjFile program headers we can expect. > Fix "unsafe narrowing" warnings in absl, 8/n. > Fix format string error with an explicit cast > Add a case to detect when the Bazel compiler string is explicitly set to "gcc", instead of just detecting Bazel's default "compiler" string. > Fix "unsafe narrowing" warnings in absl, 10/n. > Fix "unsafe narrowing" warnings in absl, 9/n. > Fix stacktrace header includes > Add a missing dependency on :raw_logging_internal > CMake: Require at least CMake 3.10 > CMake: install artifacts reflect the compiled ABI > Fixes bug so that `%v` with modifiers doesn't compile. `%v` is not intended to work with modifiers because the meaning of modifiers is type-dependent and `%v` is intended to be used in situations where the type is not important. Please continue using if `%s` if you require format modifiers. 
> Convert algorithm and container benchmarks to cc_binary > Merge pull request #1269 from isuruf:patch-1 > InlinedVector: Small improvement to the max_size() calculation > CMake: Mark hash_testing as a public testonly library, as it is with Bazel > Remove the ABSL_HAVE_INTRINSIC_INT128 test from pcg_engine.h > Fix ClangTidy warnings in btree.h and btree_test.cc. > Fix log StrippingTest on windows when TCHAR = WCHAR > Refactors checker.h and replaces recursive functions with iterative functions for readability purposes. > Refactors checker.h to use if statements instead of ternary operators for better readability. > Import of CCTZ from GitHub. > Workaround for ASAN stack safety analysis problem with FixedArray container annotations. > Rollback of fix "unsafe narrowing" warnings in absl, 8/n. > Fix "unsafe narrowing" warnings in absl, 8/n. > Changes mutex profiling > InlinedVector: Correct the computation of max_size() > Adds support for "%v" in absl::StrFormat and related functions for string-like types (support for other builtin types will follow in future changes). Rather than specifying %s for strings, users may specify %v and have the format specifier deduced. Notably, %v does not work for `const char` because we cannot be certain if %s or %p was intended (nor can we be certain if the `const char` was properly null-terminated). If you have a `const char` you know is null-terminated and would like to work with %v, please wrap it in a `string_view` before using it. > Fixed header guards to match style guide conventions. > Typo fix > Added some more no_test.. tags to build targets for controlling testing. > Remove includes which are not used directly. > CMake: Add an option to build the libraries that are used for writing tests without requiring Abseil's tests be built (default=OFF) > Fix "unsafe narrowing" warnings in absl, 7/n. > Fix "unsafe narrowing" warnings in absl, 6/n. 
> Release the Abseil Logging library > Switch time_state to explicit default initialization instead of value initialization. > spinlock.h: Clean up includes > Fix minor typo in absl/time/time.h comment: "ToDoubleNanoSeconds" -> "ToDoubleNanoseconds" > Support compilers that are unknown to CMake > Import of CCTZ from GitHub. > Change bit_width(T) to return int rather than T. > Import of CCTZ from GitHub. > Merge pull request #1252 from jwest591:conan-fix > Don't try to enable use of ARM NEON intrinsics when compiling in CUDA device mode. They are not available in that configuration, even if the host supports them. > Fix "unsafe narrowing" warnings in absl, 5/n. > Fix "unsafe narrowing" warnings in absl, 4/n. > Import of CCTZ from GitHub. > Update Abseil platform support policy to point to the Foundational C++ Support Policy > Import of CCTZ from GitHub. > Add --features=external_include_paths to Bazel CI to ignore warnings from dependencies > Merge pull request #1250 from jonathan-conder-sm:gcc_72 > Merge pull request #1249 from evanacox:master > Import of CCTZ from GitHub. > Merge pull request #1246 from wxilas21:master > remove unused includes and add missing std includes for absl/status/status.h > Sort INTERNAL_DLL_TARGETS for easier maintenance. > Disable ABSL_HAVE_STD_IS_TRIVIALLY_ASSIGNABLE for clang-cl. > Map the absl::is_trivially_ functions to their std impl > Add more SimpleAtod / SimpleAtof test coverage > debugging: handle alternate signal stacks better on RISCV > Revert change "Fix "unsafe narrowing" warnings in absl, 4/n.". > Fix "unsafe narrowing" warnings in absl, 3/n. > Fix "unsafe narrowing" warnings in absl, 4/n. > Fix "unsafe narrowing" warnings in absl, 2/n. > debugging: honour `STRICT_UNWINDING` in RISCV path > Fix "unsafe narrowing" warnings in absl, 1/n. > Add ABSL_IS_TRIVIALLY_RELOCATABLE and ABSL_ATTRIBUTE_TRIVIAL_ABI macros for use with clang's __is_trivially_relocatable and [[clang::trivial_abi]]. 
> Merge pull request #1223 from ElijahPepe:fix/implement-snprintf-safely > Fix frame pointer alignment check. > Fixed sign-conversion warning in code. > Import of CCTZ from GitHub. > Add missing include for std::unique_ptr > Do not re-close files on EINTR > Renamespace absl::raw_logging_internal to absl::raw_log_internal to match (upcoming) non-raw logging namespace. > Check for negative return values from ReadFromOffset > Use HTTPS RFC URLs, which work regardless of the browser's locale. > Avoid signedness change when casting off_t > Internal Cleanup: removing unused internal function declaration. > Make Span complain if constructed with a parameter that won't outlive it, except if that parameter is also a span or appears to be a view type. > any_invocable_test: Re-enable the two conversion tests that used to fail under MSVC > Add GetCustomAppendBuffer method to absl::Cord > debugging: add hooks for checking stack ranges > Minor clang-tidy cleanups > Support [[gnu::abi_tag("xyz")]] demangling. > Fix -Warray-parameter warning > Merge pull request #1217 from anpol:macos-sigaltstack > Undo documentation change on erase. > Improve documentation on erase. > Merge pull request #1216 from brjsp:master > string_view: conditional constexpr is no longer needed for C++14 > Make exponential_distribution_test a bigger test (timeout small -> moderate). > Move Abseil to C++14 minimum > Revert commit f4988f5bd4176345aad2a525e24d5fd11b3c97ea > Disable C++11 testing, enable C++14 and C++20 in some configurations where it wasn't enabled > debugging: account for differences in alternate signal stacks > Import of CCTZ from GitHub. > Run flaky test in fewer configurations > AnyInvocable: Move credits to the top of the file > Extend visibility of :examine_stack to an upcoming Abseil Log. > Merge contiguous mappings from the same file. 
> Update versions of WORKSPACE dependencies > Use ABSL_INTERNAL_HAS_SSE2 instead of __SSE2__ > PR #1200: absl/debugging/CMakeLists.txt: link with libexecinfo if needed > Update GCC floor container to use Bazel 5.2.0 > Update GoogleTest version used by Abseil > Release absl::AnyInvocable > PR #1197: absl/base/internal/direct_mmap.h: fix musl build on mips > absl/base/internal/invoke: Ignore bogus warnings on GCC >= 11 > Revert GoogleTest version used by Abseil to commit 28e1da21d8d677bc98f12ccc7fc159ff19e8e817 > Update GoogleTest version used by Abseil > explicit_seed_seq_test: work around/disable bogus warnings in GCC 12 > any_test: expand the any emplace bug suppression, since it has gotten worse in GCC 12 > absl::Time: work around bogus GCC 12 -Wrestrict warning > Make absl::StdSeedSeq an alias for std::seed_seq > absl::Optional: suppress bogus -Wmaybe-uninitialized GCC 12 warning > algorithm_test: suppress bogus -Wnonnull warning in GCC 12 > flags/marshalling_test: work around bogus GCC 12 -Wmaybe-uninitialized warning > counting_allocator: suppress bogus -Wuse-after-free warning in GCC 12 > Prefer to fallback to UTC when the embedded zoneinfo data does not contain the requested zone. > Minor wording fix in the comment for ConsumeSuffix() > Tweak the signature of status_internal::MakeCheckFailString as part of an upcoming change > Fix several typos in comments. > Reformulate documentation of ABSL_LOCKS_EXCLUDED. > absl/base/internal/invoke.h: Use ABSL_INTERNAL_CPLUSPLUS_LANG for language version guard > Fix C++17 constexpr storage deprecation warnings > Optimize SwissMap iteration by another 5-10% for ARM > Add documentation on optional flags to the flags library overview. 
> absl: correct the stack trace path on RISCV > Merge pull request #1194 from jwnimmer-tri:default-linkopts > Remove unintended defines from config.h > Ignore invalid TZ settings in tests > Add ABSL_HARDENING_ASSERTs to CordBuffer::SetLength() and CordBuffer::IncreaseLengthBy() > Fix comment typo about absl::Status<T*> > In b-tree, support unassignable value types. > Optimize SwissMap for ARM by 3-8% for all operations > Release absl::CordBuffer > InlinedVector: Limit the scope of the maybe-uninitialized warning suppression > Improve the compiler error by removing some noise from it. The "deleted" overload error is useless to users. By passing some dummy string to the base class constructor we use a valid constructor and remove the unintended use of the deleted default constructor. > Merge pull request #714 from kgotlinux:patch-2 > Include proper #includes for POSIX thread identity implementation when using that implementation on MinGW. > Rework NonsecureURBGBase seed sequence. > Disable tests on some platforms where they currently fail. > Fixed typo in a comment. > Rollforward of commit ea78ded7a5f999f19a12b71f5a4988f6f819f64f. > Add an internal helper for logging (upcoming). > Merge pull request #1187 from trofi:fix-gcc-13-build > Merge pull request #1189 from renau:master > Allow for using b-tree with `value_type`s that can only be constructed by the allocator (ignoring copy/move constructors). > Stop using sleep timeouts for Linux futex-based SpinLock > Automated rollback of commit f2463433d6c073381df2d9ca8c3d8f53e5ae1362. > time.h: Use uint32_t literals for calls to overloaded MakeDuration > Fix typos. > Clarify the behaviour of `AssertHeld` and `AssertReaderHeld` when the calling thread doesn't hold the mutex. 
> Enable __thread on Asylo > Add implementation of is_invocable_r to absl::base_internal for C++ < 17, define it as alias of std::is_invocable_r when C++ >= 17 > Optimize SwissMap iteration for aarch64 by 5-6% > Fix detection of ABSL_HAVE_ELF_MEM_IMAGE on Haiku > Don’t use generator expression to build .pc Libs lines > Update Bazel used on MacOS CI > Import of CCTZ from GitHub. Closes #11687	2022-10-03 11:06:37 +03:00
Botond Dénes	f4540ef0d6	Merge 'Upgrade nix devenv' from Michael Livshin To recap: the Nix devenv ({default,shell,flake}.nix and friends) in Scylla is a nicer (for those who consider it so, that is) alternative to dbuild: a completely deterministic build environment without Docker. In theory we could support much more (creating installable packages, container images, various deployment affordances, etc. -- Nix is, among other things, a kind of parallel-to-everything-else devops realm) but there is clearly no demand and besides duplicating the work the release team is already doing (and doing just fine, needless to say) would be pointless and wasteful. This PR reflects the accumulated changes that I have been carrying locally for the past year or so. The version currently in master _probably_ can still build Scylla, but that Scylla certainly would not pass unit tests. What the previous paragraph seems to mean is, apparently I'm the only active user of Nix devenv for Scylla. Which, in turn, presents some obvious questions for the maintainers: - Does this need to live in the Scylla source at all? (The changes to non-Nix-specific parts are minimal and unobtrusive, but they are still changes) - If it's left in, who is going to maintain it going forward, should more users somehow appear? (I'm perfectly willing to fix things up when alerted, but no timeliness guarantees) Closes #9557 * github.com:scylladb/scylladb: nix: add README.md build: improvements & upgrades to Nix dev environment build: allow setting SCYLLA_RELEASE from outside	2022-10-03 09:40:09 +03:00
Botond Dénes	2041744132	Merge 'readers/multishard: don't mix coroutines and continuations in the do_fill_buffer()' from Avi Kivity The combination is hard to read and modify. Closes #11665 * github.com:scylladb/scylladb: readers/multishard: restore shard_reader_v2::do_fill_buffer() indentation readers/multishard: convert shard_reader_v2::do_fill_buffer() to a pure coroutine	2022-10-03 06:51:20 +03:00
Nadav Har'El	b8f8eb8710	Merge 'Improve test.py logging' from Kamil Braun Include the unique test name (the unique name distinguishes between different test repeats) and the test case name where possible. Improve printing of clusters: include the cluster name and stopped servers. Fix some logging calls and add new ones. Examples: ``` ------ Starting test test_topology ------ ``` became this: ``` ------ Starting test test_topology.1::test_add_server_add_column ------ ``` This: ``` INFO> Leasing Scylla cluster {127.191.142.1, 127.191.142.2, 127.191.142.3} for test test_add_server_add_column ``` became this: ``` INFO> Leasing Scylla cluster ScyllaCluster(name: 02cdd180-40d1-11ed-8803-3c2c30d32d96, running: {127.144.164.1, 127.144.164.2, 127.144.164.3}, stopped: {}) for test test_topology.1::test_add_server_add_column ``` Closes #11677 * github.com:scylladb/scylladb: test/pylib: scylla_cluster: improve cluster printing test/pylib: don't pass test_case_name to after-test endpoint test/pylib: scylla_cluster: track current test case name and print it test.py: pass the unique test name (e.g. `test_topology.1`) to cluster manager test/pylib: scylla_cluster: pass the test case name to `before_test` test/pylib: use "test_case_name" variable name when talking about test cases	2022-10-02 20:48:50 +03:00
Pavel Emelyanov	2b8636a2a9	storage_proxy.hh: Remove unused headers Add needed forward declarations and fix indirect inclusions in some .ccs Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11679	2022-10-02 20:48:50 +03:00
Michał Chojnowski	4563cbe595	logalloc: prevent false positives in reclaim_timer reclaim_timer uses a coarse clock, but does not account for the measurement error introduced by that -- it can falsely report reclaims as stalls, even if they are shorter by a full coarse clock tick from the requested threshold (blocked-reactor-notify-ms). Notably, if the stall threshold happens to be smaller than or equal to the coarse clock resolution, Scylla's log gets spammed with false stall reports. The resolution of coarse clocks in Linux is 1/CONFIG_HZ. This is typically equal to 1 ms or 4 ms, and stall thresholds of this order can occur in practice. Eliminate false positives by requiring the measured reclaim duration to be at least 1 clock tick longer than the configured threshold for it to be considered a stall. Fixes #10981 Closes #11680	2022-10-02 13:41:40 +03:00
Avi Kivity	372eadf542	Merge "perftune related improvements in scylla_* scripts" from Vlad Zolotarov " This series adds a long-awaited transition of our auto-generation code to irq_cpu_mask instead of 'mode' in perftune.yaml. And then it fixes a regression in scylla_prepare perftune.yaml auto-generation logic. " * 'scylla_prepare_fix_regression-v1' of https://github.com/vladzcloudius/scylla: scylla_prepare + scylla_cpuset_setup: make scylla_cpuset_setup idempotent without introducing regressions scylla_prepare: stop generating 'mode' value in perftune.yaml	2022-10-02 13:25:13 +03:00
Michael Livshin	d178ac17dc	nix: add README.md Signed-off-by: Michael Livshin <repo@cmm.kakpryg.net>	2022-10-02 12:26:02 +03:00
Michael Livshin	7bd13be3f2	build: improvements & upgrades to Nix dev environment * Add some more useful stuff to the shell environment, so it actually works for debugging & post-mortem analysis. * Wrap ccache & distcc transparently (distcc will be used unless NODISTCC is set to a non-empty value in the environment; ccache will be used if CCACHE_DIR is not empty). * Package the Scylla Python driver (instead of the C* one). * Catch up to misc build/test requirements (including optional) by requiring or custom-packaging: wasmtime 0.29.0, cxxbridge, pytest-asyncio, liburing. * Build statically-linked zstd in a saner and more idiomatic fashion. * In pure builds (where sources lack Git metadata), derive SCYLLA_RELEASE from source hash. * Refactor things for more parameterization. * Explicitly stub out installPhase (seeing that "nix build" succeeds up to installPhase means we didn't miss any dependencies). * Add flake support. * Add copious comments. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-10-02 11:47:16 +03:00
Michael Livshin	839d8f40e6	build: allow setting SCYLLA_RELEASE from outside The extant logic for deriving the value of SCYLLA_RELEASE from the source tree has those assumptions: * The tree being built includes Git metadata. * The value of `date` is trustworthy and interesting. * There are no uncommitted changes (those relevant to building, anyway). The above assumptions are either irrelevant or problematic in pure build environments (such as the sandbox set up by `nix-build`): * Pure builds use cleaned-up sources with all timestamps reset to Unix time 0. Those cleaned-up sources are saved (in the Nix store, for example) and content-hashed, so leaving the (possibly huge) Git metadata increases the time to copy the sources and wastes disk space (in fact, Nix in flake mode strips `.git` unconditionally). * Pure builds run in a sandbox where time is, likewise, reset to Unix time 0, so the output of `date` is neither informative nor useful. Now, the only build step that uses Git metadata in the first place is the SCYLLA_RELEASE value derivation logic. So, essentially, answering the question "is the Git metadata needed to build Scylla" is a matter of definition, and is up to us. If we elect to ignore Git metadata and current time, we can derive SCYLLA_RELEASE value from the content hash of the cleaned-up tree, regardless of the way that tree was arrived at. This change makes it possible to skip the derivation of SCYLLA_RELEASE value from Git metadata and current time by way of setting SCYLLA_RELEASE in the environment. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-10-02 11:47:16 +03:00
Avi Kivity	17b1cb4434	dirty_memory_manager: move third memory threshold parameter of region_group constructor to reclaim_config Place it along the other parameters.	2022-09-30 22:17:37 +03:00
Avi Kivity	ecf30ee469	dirty_memory_manager: simplify region_group::update() We notice there are two separate conditions controlling a call to a single outcome, notify_pressure_relief(). Merge them into a single boolean variable.	2022-09-30 22:15:45 +03:00
Avi Kivity	230fff299a	dirty_memory_manager: fold region_group::notify_hard_pressure_relieved into its callers It is trivial.	2022-09-30 22:11:01 +03:00
Avi Kivity	12b81173b9	dirty_memory_manager: clean up region_group::do_update_hard_and_check_relief() Remove synthetic "rg" local.	2022-09-30 22:09:09 +03:00
Avi Kivity	e1bad8e883	dirty_memory_manager: make do_update_hard_and_check_relief() a member of region_group It started life as something shared between memory_hard_limit and region_group, but now that they are back being the same thing, we can make it a member again.	2022-09-30 22:04:26 +03:00
Avi Kivity	6b21c10e9e	dirty_memory_manager: remove accessors around region_group::_under_hard_pressure It is now only accessed from within the class, so the accessors don't help anything.	2022-09-30 21:59:46 +03:00
Avi Kivity	6a02bb7c2b	dirty_memory_manager: merge memory_hard_limit into region_group The two classes always have a 1:1 or 0:1 relationship, and so we can just move all the members of memory_hard_limit into region_group, with the functions that track the relationship (memory_hard_limit::{add,del}()) removed. The 0:1 relationship is maintained by initializing the hard limit parameter with std::numeric_limits<size_t>::max(). The _hard_total_memory variable is always checked if it is greater than this parameter in order to do anything, and with this default it can never be.	2022-09-30 21:59:38 +03:00
Avi Kivity	45ab24e43d	dirty_memory_manager: rename members in memory_hard_limit In preparation for merging memory_hard_limit into region_group, disambiguate similarly named members by adding the word "hard" in random places. memory_hard_limit and region_group are candidates for merging because they constantly reference each other, and memory_hard_limit does very little by itself.	2022-09-30 21:47:33 +03:00
Avi Kivity	aca96c4103	readers/multishard: restore shard_reader_v2::do_fill_buffer() indentation	2022-09-30 19:19:51 +03:00
Avi Kivity	b08196f3b3	readers/multishard: convert shard_reader_v2::do_fill_buffer() to a pure coroutine do_fill_buffer() is an eclectic mix of coroutines and continuations. That makes it hard to follow what is running sequentially and concurrently. Convert it into a pure coroutine by changing internal continuations to lambda coroutines. These lambda coroutines are guarded with seastar::coroutine::lambda. Furthermore, a future that is co_awaited is converted to immediate co_await (without an intermediate future), since seastar::coroutine::lambda only works if the coroutine is awaited in the same statement it is defined on.	2022-09-30 19:19:48 +03:00
Kamil Braun	b2cf610567	test/pylib: scylla_cluster: improve cluster printing Print the cluster name and stopped servers in addition to the running servers. Fix a logging call which tried to print a server in place of a cluster and even at that it failed (the server didn't have a hostname yet so it printed as an empty string). Add another logging call.	2022-09-30 17:00:05 +02:00
Kamil Braun	05ed3769dd	test/pylib: don't pass test_case_name to after-test endpoint It's redundant now, the manager tracks the current test case using before-test endpoint calls.	2022-09-30 16:41:45 +02:00
Kamil Braun	dc6f37b7f7	test/pylib: scylla_cluster: track current test case name and print it Use `_before_test` calls to track the current test case name. Concatenate it with the unique test name like this: `test_topology.1::test_add_server_add_column`, and print it instead of the test case name.	2022-09-30 16:38:35 +02:00
Kamil Braun	5be818d73b	test.py: pass the unique test name (e.g. `test_topology.1`) to cluster manager This helps us distinguish the different repeats of a test in logs. Rename the variable accordingly in `ScyllaClusterManager`.	2022-09-30 16:24:10 +02:00
Kamil Braun	fde4642472	test/pylib: scylla_cluster: pass the test case name to `before_test` We pass the test case name to `after_test` - so make it consistent. Arguably, the test case name is more useful (as it's more precise) than the test name.	2022-09-30 16:17:59 +02:00
Kamil Braun	43d8b4a214	test/pylib: use "test_case_name" variable name when talking about test cases Distinguish "test name" (e.g. `test_topology`) from "test case name" (e.g. `test_add_server_add_column` - a test case inside `test_topology`).	2022-09-30 16:15:48 +02:00
Botond Dénes	060dda8e00	Merge 'Reduce dependencies on large data handler header' from Benny Halevy Reduce the false dependencies on db/large_data_handler.hh by not including it from commonly used header files, and rather including it only in the source files that actually need it. This is in preparation for https://github.com/scylladb/scylladb/issues/11449 Closes #11654 * github.com:scylladb/scylladb: test: lib: do not include db/large_data_handler.hh in test_service.hh test: lib: move sstable test_env::impl ctor out of line sstables: do not include db/large_data_handler.hh in sstables.hh api/column_family: add include db/system_keyspace.hh	2022-09-30 13:27:38 +03:00
Tomasz Grabiec	5268f0f837	test: lib: random_mutation_generator: Don't generate mutations with marker uncompacted with shadowable tombstone The generator was first setting the marker then applied tombstones. The marker was set like this: row.marker() = random_row_marker(); Later, when shadowable tombstones were applied, they were compacted with the marker as expected. However, the key for the row was chosen randomly in each iteration and there are multiple keys set, so there was a possibility of a key clash with an earlier row. This could override the marker without applying any tombstones, which is conditional on random choice. This could generate rows with markers uncompacted with shadowable tombstones. This broke row_cache_test::test_concurrent_reads_and_eviction on comparison between expected and read mutations. The latter was compacted because it went through an extra merge path, which compacts the row. Fix by making sure there are no key clashes. Closes #11663	2022-09-30 11:27:01 +03:00
Kamil Braun	1793d43b15	test/pylib: scylla_cluster: mark `server_remove` as not implemented The `server_remove` function did a very weird thing: it shut down a server and made the framework 'forget' about it. From the point of view of the Scylla cluster and the driver the server was still there. Replace the function's body with `raise NotImplementedError`. In the future it can be replaced with an implementation that calls `removenode` on the Scylla cluster. Remove `test_remove_server_add_column` from `test_topology`. It effectively does the same thing as `test_stop_server_add_column`, except that the framework also 'forgets' about the stopped server. This could lead to weird situations because the forgotten server's IP could be reused in another test that was running concurrently with this test. Closes #11657	2022-09-29 21:03:18 +03:00
Pavel Emelyanov	6a5b0d6c70	table: Handle storage_io_error's ENOSPC when flushing Commit `a9805106` (table: seal_active_memtable: handle ENOSPC error) made the memtable flushing code withstand ENOSPC and retry flushing in the hope that the node administrator would provide some free space. However, it looks like the IO code may report back ENOSPC with some exception type this code doesn't expect. This patch tries to fix it. refs: #11245 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-29 19:16:30 +03:00
Pavel Emelyanov	826244084e	table: Rewrap retry loop The existing loop is very branchy in its attempts to find out whether or not to abort. The "allowed_retries" count can be a good indicator of the decision taken. This makes the code notably shorter and easier to extend Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-29 19:14:46 +03:00
Benny Halevy	776b009c0f	test: lib: do not include db/large_data_handler.hh in test_service.hh It was needed for defining and referencing nop_lp_handler and in sstable_3_x_test for testing the large_data_handler. Remove the include from the commonly used header file to reduce the false dependencies on large_data_handler.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-29 18:36:16 +03:00
Benny Halevy	678d88576b	test: lib: move sstable test_env::impl ctor out of line To prepare for removing the include of db/large_data_handler.hh from test/lib/test_services.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-29 18:35:12 +03:00
Botond Dénes	ad04f200d3	Merge 'database: automatically take snapshot of base table views' from Benny Halevy The logic to reject explicit snapshot of views/indexes was improved in `aa127a2dbb`. However, we never implemented auto-snapshot of view/indexes when taking a snapshot of the base table. This is implemented in this patch. The implementation is built on top of `ba42852b0e` so it would be hard to backport to 5.1 or earlier releases. Fixes #11612 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11616 * github.com:scylladb/scylladb: database: automatically take snapshot of base table views api: storage_service: reject snapshot of views in api layer	2022-09-29 13:33:31 +03:00
Benny Halevy	ae7fd1c7b2	sstables: do not include db/large_data_handler.hh in sstables.hh Reduce dependencies by only forward-declaring class db::large_data_handler in sstables.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-29 12:42:58 +03:00
Benny Halevy	fb7e55b0a8	api/column_family: add include db/system_keyspace.hh For db::system_keyspace::load_view_build_progress that currently indirectly satisfied via sstables/sstables.hh -> db/large_data_handler.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-29 12:42:54 +03:00
Nadav Har'El	4a7794fb64	alternator: better error message when adding a GSI to an existing table Due to issue #11567, Alternator does not yet support adding a GSI to an existing table via UpdateTable with the GlobalSecondaryIndexUpdates parameter. However, currently, we print a misleading error message in this case, complaining about the AttributeDefinitions parameter. This parameter is also required with GlobalSecondaryIndexUpdates, but it's not the main problem, and the user is likely to be confused why the error message points to that specific parameter and what it means that this parameter is claimed to be "not supported" (while it is supported, in CreateTable). With this patch, we report that GlobalSecondaryIndexUpdates is not supported. This patch does not fix the unsupported feature - it just improves the error message saying that it's not supported. Refs #11567 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11650	2022-09-29 09:00:31 +03:00
Asias He	c194c811df	repair: Yield in repair_service::do_decommission_removenode_with_repair When walking through the ranges, we should yield to prevent stalls. We do a similar yield in other node operations. Fix a stall in 5.1.dev.20220724.f46b207472a3 with build-id d947aaccafa94647f71c1c79326eb88840c5b6d2 ``` !INFO | scylla[6551]: Reactor stalled for 10 ms on shard 0. Backtrace: 0x4bbb9d2 0x4bba630 0x4bbb8e0 0x7fd365262a1f 0x2face49 0x2f5caff 0x36ca29f 0x36c89c3 0x4e3a0e1 ``` Fixes #11146 Closes #11160	2022-09-28 18:21:35 +03:00
Avi Kivity	cf3830a249	Merge 'Add support for TRUNCATE USING TIMEOUT' from Benny Halevy Extend the cql3 truncate statement to accept attributes, similar to modification statements. To achieve that we define cql3::statements::raw::truncate_statement derived from raw::cf_statement, and implement its pure virtual prepare() method to make a prepared truncate_statement. The latter is no longer derived from raw::cf_statement, and just stores a schema_ptr to get to the keyspace and column_family. `test_truncate_using_timeout` cql-pytest was added to test the new USING TIMEOUT feature. Fixes #11408 Also, update docs/cql/ddl.rst truncate-statement section accordingly. Closes #11409 * github.com:scylladb/scylladb: docs: cql-extensions: add TRUNCATE to USING TIMEOUT section. docs: cql: ddl: add support for TRUNCATE USING TIMEOUT cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT cql3: selectStatement: restrict to USING TIMEOUT in grammar cql3: deleteStatement: restrict to USING TIMEOUT|TIMESTAMP in grammar	2022-09-28 18:19:03 +03:00
Avi Kivity	19374779bb	Merge 'Fix large data warning and docs' from Benny Halevy The series contains fixes for system.large_* log warning and respective documentation. This prepares the way for adding a new system.large_collections table (See #11449): Fixes #11620 Fixes #11621 Fixes #11622 the respective fixes should be backported to different release branches, based on the respective patches they depend on (mentioned in each issue). Closes #11623 * github.com:scylladb/scylladb: docs: adjust to sstable base name docs: large-partition-table: adjust for additional rows column docs: debugging-large-partition: update log warning example db/large_data_handler: print static cell/collection description in log warning db/large_data_handler: separate pk and ck strings in log warning with delimiter	2022-09-28 17:52:23 +03:00
Nadav Har'El	de1bc147bc	Merge 'test.py: cleanups in topology test suites' from Kamil Braun Fix the type of `create_server`, rename `topology_for_class` to `get_cluster_factory`, simplify the suite definitions and parameters passed to `get_cluster_factory` Closes #11590 * github.com:scylladb/scylladb: test.py: replace `topology` with `cluster_size` in Topology tests test.py: rename `topology_for_class` to `get_cluster_factory` test/pylib: ScyllaCluster: fix create_server parameter type	2022-09-28 15:19:54 +03:00
Kamil Braun	1bcc28b48b	test/topology_raft_disabled: reenable `test_raft_upgrade` The test was disabled due to a bug in the Python driver which caused the driver not to reconnect after a node was restarted (see scylladb/python-driver#170). Introduce a workaround for that bug: we simply create a new driver session after restarting the nodes. Reenable the test. Closes #11641	2022-09-28 15:13:42 +03:00
Mikołaj Grzebieluch	be8fcba8c1	raft: broadcast_tables: add support for bind variables Extended the query language to support bind variables which are bound in the execution stage, before creating a raft command. Adjusted `test_broadcast_tables.py` to prepare statements at the beginning of the test. Fixed a small bug in `strongly_consistent_modification_statement::check_access`. Closes #11525	2022-09-28 09:54:59 +03:00
Alejo Sanchez	02933c9b82	test.py: close aiohttp session for topology tests Close the aiohttp ClientSession after pytest session finishes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11648	2022-09-27 18:09:08 +02:00
Kamil Braun	82481ae31b	Merge 'raft server, log size limit in bytes' from Gusev Petr Before this patch we could get an OOM if we received several big commands. The number of commands was small, but their total size in bytes was large. snapshot_trailing_size is needed to guarantee progress. Without this limit the fsm could get stuck if the size of the next item is greater than max_log_size - (size of trailing entries). Closes #11397 * github.com:scylladb/scylladb: raft replication_test, make backpressure test to do actual backpressure raft server, shrink_to_fit on log truncation raft server, release memory if add_entry throws raft server, log size limit in bytes	2022-09-27 14:25:08 +02:00
Kamil Braun	ed67f0e267	Merge 'test.py: fix topology init error handling' from Alecco When there are errors starting the first cluster(s) the server logs are needed. So move `.start()` to the `try` block in `test.py` (out of `asynccontextmanager`). While there, make `ScyllaClusterManager.start()` idempotent. Closes #11594 * github.com:scylladb/scylladb: test.py: fix ScyllaClusterManager start/stop test.py: fix topology init error handling	2022-09-27 11:36:07 +02:00
Petr Gusev	bc50b7407f	raft replication_test, make backpressure test to do actual backpressure Before this patch this test didn't actually experience any backpressure since all the commands were executed sequentially.	2022-09-27 12:04:14 +04:00
Petr Gusev	cbfe033786	raft server, shrink_to_fit on log truncation We don't want to keep memory we don't use, shrink_to_fit guarantees that. In fact, boost::deque frees up memory when items are deleted, so this change has little effect at the moment, but it may pay off if we change the container in the future.	2022-09-27 12:02:36 +04:00
Petr Gusev	b34dfed307	raft server, release memory if add_entry throws We consume memory from semaphore in add_entry_on_leader, but never release it if add_entry throws.	2022-09-27 12:02:34 +04:00
Benny Halevy	b178813cba	docs: cql-extensions: add TRUNCATE to USING TIMEOUT section. List the queries that support the TIMEOUT parameter. Mention the newly added support for TRUNCATE USING TIMEOUT. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 18:30:39 +03:00
Benny Halevy	b0bad0b153	docs: cql: ddl: add support for TRUNCATE USING TIMEOUT Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 18:30:39 +03:00
Benny Halevy	64140ccf05	cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT Extend the cql3 truncate statement to accept attributes, similar to modification statements. To achieve that we define cql3::statements::raw::truncate_statement derived from raw::cf_statement, and implement its pure virtual prepare() method to make a prepared truncate_statement. The latter, statements::truncate_statement, is no longer derived from raw::cf_statement, and just stores a schema_ptr to get to the keyspace and column_family names. `test_truncate_using_timeout` cql-pytest was added to test the new USING TIMEOUT feature. Fixes #11408 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 18:30:39 +03:00
Benny Halevy	27d3e48005	cql3: selectStatement: restrict to USING TIMEOUT in grammar It is preferred to reject USING TTL / TIMESTAMP at the grammar level rather than functionally validating the USING attributes. test_using_timeout was adjusted respectively to expect the `SyntaxException` error rather than `InvalidRequest`. Note that cql3::statements::raw::select_statement validate_attrs now asserts that the ttl or the timestamp attributes aren't set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 18:30:39 +03:00
Benny Halevy	0728d33d5f	cql3: deleteStatement: restrict to USING TIMEOUT\|TIMESTAMP in grammar It is preferred to reject USING TTL / TIMESTAMP at the grammar level rather than functionally validating the USING attributes. test_using_timeout was adjusted respectively to expect the `SyntaxException` error rather than `InvalidRequest`. Note that now delete_statement ctor asserts that the ttl attribute is not set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 18:30:39 +03:00
Kamil Braun	696bdb2de7	test.py: replace `topology` with `cluster_size` in Topology tests First, a reminder of a few basic concepts in Scylla: - "topology" is a mapping: for each node, its DC and Rack. - "replication strategy" is a method of calculating replica sets in a cluster. It is not a cluster-global property; each keyspace can have a different replication strategy. A cluster may have multiple keyspaces. - "cluster size" is the number of nodes in a cluster. Replication strategy is orthogonal to topology. Cluster size can be derived from topology and is also orthogonal to replication strategy. test.py was confusing the three concepts together. For some reason, Topology suites were specifying a "topology" parameter which contained replication strategy details - having nothing to do with topology. Also it's unclear why a test suite would specify anything to do with replication strategies - after all, a test may create keyspaces with different replication strategies, and a suite may contain multiple different tests. Get rid of the "topology" parameter, replace it with a simple "cluster_size". In the future we may re-introduce it when we actually implement the possibility to start clusters with custom topologies (which involves configuring the snitch etc.) Simplify the test.py code.	2022-09-26 15:17:50 +02:00
Botond Dénes	895522db23	mutation_fragment_stream_validator: make interface more robust The validator has several API families with increasing amount of detail. E.g. there is an `operator()(mutation_fragment_v2::kind)` and an overload also taking a position. These different API families currently cannot be mixed. If one uses one overload-set, one has to stick with it, not doing so will generate false-positive failures. This is hard to explain in documentation to users (provided they even read it). Instead, just make the validator robust enough such that the different API subsets can be mixed in any order. The validator will try to make the most of the situation and validate as much as possible. Behind the scenes all the different validation methods are consolidated into just two: one for the partition level, the other for the intra-partition level. All the different overloads just call these methods passing as much information as they have. A test is also added to make sure this works.	2022-09-26 13:26:26 +03:00
Kamil Braun	0725ab3a3e	test.py: rename `topology_for_class` to `get_cluster_factory` The previous name had nothing to do with what the function calculated and returned (it returned a `create_cluster` function; the standard name for a function that constructs objects would be 'factory', so `get_cluster_factory` is an appropriate name for a function that returns cluster factories).	2022-09-26 11:45:44 +02:00
Kamil Braun	06cc4f9259	test/pylib: ScyllaCluster: fix create_server parameter type The only usage of `ScyllaCluster` constructor passed a `create_server` function which expected a `List[str]` for the second parameter, while the constructor specified that the function should expect an `Optional[List[str]]`. There was no reason for the latter, we can easily fix this type error. Also give a type hint for `create_cluster` function in `PythonTestSuite.topology_for_class`. This is actually what caught the type error.	2022-09-26 11:45:44 +02:00
Petr Gusev	27e60ecbf4	raft server, log size limit in bytes Before this patch we could get an OOM if we received several big commands. The number of commands was small, but their total size in bytes was large. snapshot_trailing_size is needed to guarantee progress. Without this limit the fsm could get stuck if the size of the next item is greater than max_log_size - (size of trailing entries).	2022-09-26 13:10:10 +04:00
Benny Halevy	d32c497cd9	database: automatically take snapshot of base table views The logic to reject explicit snapshot of views/indexes was improved in `aa127a2dbb`. However, we never implemented auto-snapshot of view/indexes when taking a snapshot of the base table. This is implemented in this patch. The implementation is built on top of `ba42852b0e` so it would be hard to backport to 5.1 or earlier releases. Fixes #11612 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 11:02:54 +03:00
Benny Halevy	55b0b8fe2c	api: storage_service: reject snapshot of views in api layer Rather than pushing the check to `snapshot_ctl::take_column_family_snapshot`, just check that explicitly when taking a snapshot of a particular table by name over the api. Other paths that call snapshot_ctl::take_column_family_snapshot are internal and use it to snap views already. With that, we can get rid of the allow_view_snapshots flag that was introduced in `aab4cd850c`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 10:44:56 +03:00
Botond Dénes	4d017b6d7e	mutation_fragment_stream_validator: add reset() to validating filter Allow the high level filtering validator to be reset() to a certain position, so it can be used in situations where the consumption is not continuous (fast-forwarding or paging).	2022-09-26 10:17:28 +03:00
Botond Dénes	a8cbf66573	mutation_fragment_stream_validator: move active tombstone validation into low level validator Currently the active range tombstone change is validated in the high level `mutation_fragment_stream_validating_stream`, meaning that users of the low-level `mutation_fragment_stream_validator` don't benefit from checking that tombstones are properly closed. This patch moves the validation down to the low-level validator (which is what the high-level one uses under the hood too), and requires all users to pass information about changes to the active tombstone for each fragment.	2022-09-26 10:17:27 +03:00
Nadav Har'El	868a884b79	test/cql-pytest: add reproducer for ignored IS NOT NULL This test reproduces issue #10365: It shows that although "IS NOT NULL" is not allowed in regular SELECT filters, in a materialized view it is allowed, even for non-key columns - but then outright ignored and does not actually filter out anything - a fact which already surprised several users. The test also fails on Cassandra - it also wrongly allows IS NOT NULL on the non-key columns but then ignores this in the filter. So the test is marked with both xfail (known to fail on Scylla) and cassandra_bug (fails on Cassandra because of what we consider to be a Cassandra bug). Refs #10365 Refs #11606 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11615	2022-09-26 09:02:08 +03:00
Anna Stuchlik	c5285bcb14	doc: remove the section about updating OS packages during upgrade from upgrade guides for Ubuntu and Debian (from 4.5 to 4.6) Closes #11629	2022-09-26 08:04:02 +03:00
Avi Kivity	ad2f1dc704	Merge 'Avoid default initialization of token_metadata and topology when not needed' from Pavel Emelyanov The goal is not to default initialize an object when its fields are about to be immediately overwritten by the consecutive code. Closes #11619 * github.com:scylladb/scylladb: replication_strategy: Construct temp tokens in place topology: Define copy-constructor with init-lists	2022-09-25 18:08:42 +03:00
Jan Ciolek	ac152af88c	expression: Add for_each_boolean_factor boolean_factors is a function that takes an expression and extracts all children of the top level conjunction. The problem is that it returns a vector<expression>, which is inefficient. Sometimes we would like to iterate over all boolean factors without allocations. for_each_boolean_factor is implemented for this purpose. boolean_factors() can be implemented using for_each_boolean_factor, so it's done to reduce code duplication. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-09-25 16:34:22 +03:00
Avi Kivity	2a538f5543	Merge 'Cut one of snitch->gossiper links' from Pavel Emelyanov Snitch uses gossiper for several reasons, one of which is to re-gossip the topology-related app states when property-file snitch config changes. This set cuts this link by moving re-gossiping into the existing storage_service::snitch_reconfigured() subscription. Since initial snitch state gossiping happens in storage service as well, this change is not unreasonable. Closes #11630 * github.com:scylladb/scylladb: storage_service: Re-gossiping snitch data in reconfiguration callback storage_service: Coroutinize snitch_reconfigured() storage_service: Indentation fix after previous patch storage_service: Reshard to shard-0 earlier storage_service: Refactor snitch reconfigured kick	2022-09-25 16:08:48 +03:00
Benny Halevy	a1adbf1f59	docs: adjust to sstable base name Since `244df07771` (scylla 5.1), only the sstable basename is kept in the large_* system tables. The base path can be determined from the keyspace and table name. Fixes #11621 Adjust the examples in documentation respectively. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-25 14:38:13 +03:00
Benny Halevy	33924201cc	docs: large-partition-table: adjust for additional rows column Since `a7511cf600` (scylla 5.0), sstables containing partitions with too many rows are recorded in system.large_partitions. Adjust the doc respectively. Fixes #11622 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-25 14:38:13 +03:00
Benny Halevy	92ff17c6e3	docs: debugging-large-partition: update log warning example The log warning format has changed since `f3089bf3d1` and was fixed in the previous patch to include a delimiter between the partition key, clustering key, and column name. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-25 14:38:13 +03:00
Benny Halevy	fcbbc3eb9c	db/large_data_handler: print static cell/collection description in log warning When warning about a large cell/collection in a static row, print that fact in the log warning to make it clearer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-25 14:37:42 +03:00
Benny Halevy	4670829502	db/large_data_handler: separate pk and ck strings in log warning with delimiter Currently (since `f3089bf3d1`), when printing a warning to the log about large rows and/or cells the clustering key string is concatenated to the partition key string, rendering the warning confusing and much less useful. This patch adds a '/' delimiter to separate the fields, and also uses one to separate the clustering key from the column name for large cells. In case of a static cell, the clustering key is null hence the warning will look like: `pk//column`. This patch does NOT change anything in the large_* system table schema or contents. It changes only the log warning format that need not be backward compatible. Fixes #11620 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-25 14:36:41 +03:00
Pavel Emelyanov	47958a4b37	storage_service: Re-gossiping snitch data in reconfiguration callback Nowadays it's done inside snitch, and snitch needs to carry gossiper reference for that. There's an ongoing effort in de-globalizing snitch and fixing its dependencies. This patch cuts this snitch->gossiper link to facilitate the mentioned effort. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-23 14:31:55 +03:00
Pavel Emelyanov	932566d448	storage_service: Coroutinize snitch_reconfigured() Next patch will add more sleeping code to it and it's simpler if the new call is co_await-ed rather than .then()-ed Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-23 14:31:55 +03:00
Pavel Emelyanov	7fee98cad0	storage_service: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-23 14:31:55 +03:00
Pavel Emelyanov	3d4ea2c628	storage_service: Reshard to shard-0 earlier It makes next patch shorter and nicer Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-23 14:31:55 +03:00
Pavel Emelyanov	11b79f9f80	storage_service: Refactor snitch reconfigured kick The snitch_reconfigured calls update_topology with local node bcast address argument. Things get simpler if the callee gets the address itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-23 14:31:43 +03:00
Anna Stuchlik	46f0e99884	doc: add the link to the new Troubleshooting section and replace Scylla with ScyllaDB	2022-09-23 11:46:15 +02:00
Anna Stuchlik	af2a85b191	doc: add the new page to the toctree	2022-09-23 11:37:38 +02:00
Anna Stuchlik	b034e2856e	doc: add a troubleshooting article about the missing configuration files	2022-09-23 11:17:18 +02:00
Botond Dénes	b9d55ee02f	Merge 'Add cassandra functional - show warn/err when tombstone threshold reached.' from Taras Borodin Add Cassandra functionality: show a warning/error when tombstone_warn_threshold/tombstone_failure_threshold is reached on select, by partitions. Propagate raw query_string from coordinator to replicas. Closes #11356 * github.com:scylladb/scylladb: add utf8:validate to operator<< partition_key with_schema. Show warn message if `tombstone_warn_threshold` reached on querier.	2022-09-23 05:53:47 +03:00
Pavel Emelyanov	9e7407ff91	replication_strategy: Construct temp tokens in place Otherwise, the token_metadata object is default-initialized, then it's move-assigned from another object. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-22 19:19:32 +03:00
Pavel Emelyanov	d540af2cb0	topology: Define copy-constructor with init-lists Otherwise the topology is default-constructed, then its fields are copy-assigned with the data from the copy-from reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-22 19:18:58 +03:00
Tomasz Grabiec	ccbfe2ef0d	Merge 'Fix invalid mutation fragment stream issues' from Botond Dénes Found by a fragment stream validator added to the mutation-compactor (https://github.com/scylladb/scylladb/pull/11532). As that PR moves very slowly, the fixes for the issues found are split out into a PR of their own. The first two of these issues seem benign, but it is important to remember that how benign an invalid fragment stream is depends entirely on the consumer of said stream. The present consumer of said streams may swallow the invalid stream without problem now but any future change may cause it to enter into a corrupt state. The last one is a non-benign problem (again because the consumer reacts badly already) causing problems when building query results for range scans. Closes #11604 * github.com:scylladb/scylladb: shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied readers/mutation_readers: compacting_reader: remember injected partition-end db/view: view_builder::execute(): only inject partition-start if needed	2022-09-22 17:57:27 +02:00
Taras Borodin	c155ae1182	add utf8:validate to operator<< partition_key with_schema.	2022-09-22 16:42:31 +03:00
TarasBor	1f4a93da78	Show warn message if `tombstone_warn_threshold` reached on querier. When the querier reads a page with more tombstones than the `tombstone_warn_threshold` limit, a warning message appears in the logs. If `tombstone_warn_threshold` is 0, the feature is disabled. Refs scylladb#11410	2022-09-22 16:42:31 +03:00
Avi Kivity	0cbaef31c1	dirty_memory_manager: fold do_update() into region_group::update() There is just one caller, and folding the two functions enables simplification.	2022-09-22 15:51:19 +03:00
Avi Kivity	8672f2248c	dirty_memory_manager: simplify memory_hard_limit's do_update do_update() has an output parameter (top_relief) which can either be set to an input parameter or left alone. Simplify it by returning bool and letting the caller reuse the parameter's value instead.	2022-09-22 15:50:48 +03:00
Avi Kivity	1858268377	dirty_memory_manager: drop soft limit / soft pressure members in memory_hard_limit They are write-only. This corresponds to the fact that memory_hard_limit does not do flushing (which is initiated by crossing the soft limit), it only blocks new allocations.	2022-09-22 14:59:38 +03:00
Asias He	9ed401c4b2	streaming: Add finished percentage metrics for node ops using streaming We have added the finished percentage for repair based node operations. This patch adds the finished percentage for node ops using the old streaming. Example output: scylla_streaming_finished_percentage{ops="bootstrap",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="decommission",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="rebuild",shard="0"} 0.561945 scylla_streaming_finished_percentage{ops="removenode",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="repair",shard="0"} 1.000000 scylla_streaming_finished_percentage{ops="replace",shard="0"} 1.000000 In addition to the metrics, log shows the percentage is added. [shard 0] range_streamer - Finished 2698 out of 2817 ranges for rebuild, finished percentage=0.95775646 Fixes #11600 Closes #11601	2022-09-22 14:19:34 +03:00
Avi Kivity	8369741063	dirty_memory_manager: de-template do_update(region_group_or_memory_hard_limit) We made this function a template to prevent code duplication, but now memory_hard_limit was sufficiently simplified so that the implementations can start to diverge.	2022-09-22 14:16:43 +03:00
Avi Kivity	76ced5a60c	dirty_memory_manager: adjust soft_limit threshold check Use `>` rather than `>=` to match the hard limit check. This will aid simplification, since for memory_hard_limit the soft and hard limits are identical. This should not cause any material behavior change, we're not sensitive to single byte accounting. Typical limits are on the order of gigabytes.	2022-09-22 14:06:01 +03:00
Avi Kivity	b9eb26cd77	dirty_memory_manager: drop memory_hard_limit::_name It's write-only.	2022-09-22 14:01:57 +03:00
Avi Kivity	c64fb66cc3	dirty_memory_manager: simplify memory_hard_limit configuration We observe that memory_hard_limit's reclaim_config is only ever initialized as default, or with just the hard_limit parameter. Since soft_limit defaults to hard_limit, we can collapse the two into a limit. The reclaim callbacks are always left as the default no-op functions, so we can eliminate them too. This fits with memory_hard_limit only being responsible for the hard limit, and for it not having any memtables to reclaim on its own.	2022-09-22 13:56:59 +03:00
Avi Kivity	2f907dc47d	dirty_memory_manager: fold region_group_reclaimer into {memory_hard_limit,region_group} region_group_reclaimer is used to initialize (by reference) instances of memory_hard_limit and region_group. Now that it is a final class, we can fold it into its users by pasting its contents into those users, and using the initializer (reclaim_config) to initialize the users. Note there is a 1:1 relationship between a region_group_reclaimer instance and a {memory_hard_limit,region_group} instance. It may seem like code duplication to paste the contents of one class into two, but the two classes use region_group_reclaimer differently, and most of the code is just used to glue different classes together, so the next patches will be able to get rid of much of it. Some notes: - no_reclaimer was replaced by a default reclaim_config, as that's how no_reclaimer was initialized - all members were added as private, except when a caller required one to be public - an under_pressure() member already existed, forwarding to the reclaimer; this was just removed.	2022-09-22 13:56:59 +03:00
Avi Kivity	d8f857e74b	dirty_memory_manager: stop inheriting from region_group_reclaimer This inheritance makes it harder to get rid of the class. Since there are no longer any virtual functions in the class (apart from the destructor), we can just convert it to a data member. In a few places, we need forwarding functions to make formerly-inherited functions visible to outside callers. The virtual destructor is removed and the class is marked final to verify it is no longer a base class anywhere.	2022-09-22 13:56:59 +03:00
Avi Kivity	26f3a123a5	dirty_memory_manager: test: unwrap region_group_reclaimer In one test, region_group_reclaimer is wrapped in another class just to toggle a bool, but with the new callbacks it's easy to just use a bool instead.	2022-09-22 13:56:59 +03:00
Avi Kivity	1d3508e02c	dirty_memory_manager: change region_group_reclaimer configuration to a struct It's just so much nicer. The "threshold" limit was renamed to "hard_limit" to contrast it with "soft_limit" (in fact threshold is a good name for soft_limit, since it's a point where the behavior begins to change, but that's too much of a change).	2022-09-22 13:56:59 +03:00
Avi Kivity	2c54c7d51e	dirty_memory_manager: convert region_group_reclaimer to callbacks region_group_reclaimer is partially policy (deciding when to reclaim) and partially mechanism (implementing reclaim via virtual functions). Move the mechanism to callbacks. This will make it easy to fold the policy part into region_group and memory_hard_limit. This folding is expected to simplify things since most of region_group_reclaimer is cross-class communication.	2022-09-22 13:56:59 +03:00
Avi Kivity	8fa0652e68	dirty_memory_manager: consolidate region_group_reclaimer constructors Delegate to other constructors rather than repeating the code. Doesn't help much here, but simplifies the next patch.	2022-09-22 13:56:59 +03:00
Avi Kivity	5efbfa4cab	dirty_memory_manager: rename {memory_hard_limit,region_group}::notify_relief It clashes with region_group_reclaimer::notify_relief, which does something different. Since we plan to merge region_group_reclaimer into memory_hard_limit and region_group (this can simplify the code), we need to avoid duplicate function names.	2022-09-22 13:56:59 +03:00
Avi Kivity	a72ac14154	dirty_memory_manager: drop unused parameter to memory_hard_limit constructor	2022-09-22 13:56:59 +03:00
Avi Kivity	fca5689052	dirty_memory_manager: drop memory_hard_limit::shutdown() It is empty.	2022-09-22 13:56:59 +03:00
Avi Kivity	152136630c	dirty_memory_manager: split region_group hierarchy into separate classes Currently, region_group forms a hierarchy. Originally it was a tree, but previous work whittled it down to a parent-child relationship (with a single, possibly optional parent, and a single child). The actual behavior of the parent and child are very different, so it makes sense to split them. The main difference is that the parent does not contain any regions (memtables), but the child does. This patch mechanically splits the class. The parent is named memory_hard_limit (reflecting its role to prevent lsa allocation above the memtable configured hard limit). The child is still named region_group. Details of the transformation: - each function or data member in region_group is either moved to memory_hard_limit, duplicated in memory_hard_limit, or left in region_group. - the _regions and _blocked_requests members, which were always empty in the parent, were not duplicated. Any member that only accessed them was similarly left alone. - the "no_reclaimer" static member which was only used in the parent was moved there. Similarly the constructor which accepted it was moved. - _child was moved to the parent, and _parent was kept in the child (more or less the defining change of the split) Similarly add(region_group) and del(region_group) (which manage _child) were moved. - do_for_each_parent(), which iterated to the top of the tree, was removed and its callers manually unroll the loop. For the parent, this is just a single iteration (since we're iterating towards the root), for the child, this can be two iterations, but the second one is usually simpler since the parent has many members removed. - do_update(), introduced in the previous patch, was made a template that can act on either the parent or the child. It will be further simplified later. - some tests that check now-impossible topologies were removed. - the parent's shutdown() is trivial since it has no _blocked_requests, but it was kept to reduce churn in the callers.	2022-09-22 13:56:59 +03:00
Avi Kivity	009bd63217	dirty_memory_manager: extract code block from region_group::update A mechanical transformation intended to allow reuse later. The function doesn't really deserve to exist on its own, so it will be swallowed back by its callers later.	2022-09-22 13:56:59 +03:00
Avi Kivity	34d5322368	dirty_memory_manager: move more allocation_queue functions out of region_group More mechanical changes, reducing churn for later patches.	2022-09-22 13:56:59 +03:00
Avi Kivity	4bc2638cf9	dirty_memory_manager: move some allocation queue related function definitions outside class scope It's easier to move them to a new owner (allocation_queue) if they are not defined in the class.	2022-09-22 13:56:59 +03:00
Avi Kivity	71493c2539	dirty_memory_manager: move region_group::allocating_function and related classes to new class allocation_queue region_group currently fulfills two roles: in one role, when instantiated as dirty_memory_manager::_virtual_region_group, it is responsible for holding functions that allocate memtable memory (writes) and only allowing them to run when enough dirty memory has been flushed from other memtables. The other role, when instantiated as dirty_memory_manager::_real_region_group, is to provide a hard stop when the total amount of dirty memory exceeds the limit, since the other limit is only estimated. We want to simplify the whole thing, which means not using the same class for two different roles (or rather, we can use it for both roles if we simplify the internals significantly). As a first step towards clarifying what functionality is used in what role, move some classes related to holding allocating functions to a new class allocation_queue. We will gradually move more content there, reducing the amount of role confusion in region_group. Type aliases are added to reduce churn.	2022-09-22 13:56:59 +03:00
Avi Kivity	d21d2cdb3e	dirty_memory_manager: remove support for multiple subgroups We only have one parent/child relationship in the region group hierarchy, so support for more is unneeded complexity. Replace the subgroup vector with a single pointer, and delete a test for the removed functionality.	2022-09-22 13:56:59 +03:00
Botond Dénes	0ccb23d02b	shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied Commit `8ab57aa` added a yield to the buffer-copy loop, which means that the copy can yield before it is done and the multishard reader might see the half-copied buffer and consider the reader done (because `_end_of_stream` is already set), resulting in dropping the remaining part of the buffer and in an invalid stream if the last copied fragment wasn't a partition-end. Fixes: #11561	2022-09-22 13:54:36 +03:00
Botond Dénes	16a0025dc3	readers/mutation_readers: compacting_reader: remember injected partition-end Currently injecting a partition-end doesn't update `_last_uncompacted_kind`, which will allow for a subsequent `next_partition()` call to trigger injecting a partition-end, leading to an invalid mutation fragment stream (partition-end after partition-end). Fix by changing `_last_uncompacted_kind` to `partition_end` when injecting a partition-end, making subsequent injection attempts noop. Fixes: #11608	2022-09-22 13:54:36 +03:00
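A minimal sketch of the fix described in 16a0025dc3, written in Python rather than the reader's C++. Only `_last_uncompacted_kind` and `next_partition()` come from the commit message; the class, the `Kind` enum, and `consume()` are hypothetical names used for illustration.

```python
from enum import Enum, auto


class Kind(Enum):
    PARTITION_START = auto()
    CLUSTERING_ROW = auto()
    PARTITION_END = auto()


class CompactingReaderSketch:
    """Hypothetical stand-in for the reader; it only tracks fragment kinds."""

    def __init__(self) -> None:
        self.emitted = []
        # No partition is open yet, so behave as if one has just ended.
        self._last_uncompacted_kind = Kind.PARTITION_END

    def consume(self, kind: Kind) -> None:
        # Record a fragment coming from the underlying reader.
        self._last_uncompacted_kind = kind
        self.emitted.append(kind)

    def next_partition(self) -> None:
        # Inject a partition-end only if the current partition is still open.
        if self._last_uncompacted_kind is not Kind.PARTITION_END:
            self.emitted.append(Kind.PARTITION_END)
            # The fix: remember the injected partition-end, so that a
            # subsequent next_partition() call becomes a no-op.
            self._last_uncompacted_kind = Kind.PARTITION_END
```

Without the assignment inside `next_partition()`, calling it twice in a row would append two consecutive partition-ends, which is the invalid stream the commit describes.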
Botond Dénes	681e6ae77f	db/view: view_builder::execute(): only inject partition-start if needed When resuming a build-step, the view builder injects the partition-start fragment of the last processed partition, to bring the consumer (compactor) into the correct state before it starts to consume the remainder of the partition content. This results in an invalid fragment stream when the partition was actually over or there is nothing left for the build step. Make the inject conditional on when the reader contains more data for the partition. Fixes: #11607	2022-09-22 13:54:36 +03:00
Nadav Har'El	517c1529aa	docs: update docs/alternator/getting-started.md Update several aspects of the alternator/getting-started.md which were not up-to-date: * When the document was written, Alternator was moving quickly so we recommended running a nightly version. This is no longer the case, so we should recommend running the latest stable build. * The download link is no longer helpful for getting Docker instructions (it shows some generic download options). Instead point to our dockerhub page. * Replace mentions of "Scylla" by the new official name, "ScyllaDB". * Miscellaneous copy-edits. Fixes #11218 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11605	2022-09-22 11:08:05 +03:00
Piotr Sarna	481240b8b4	Merge 'Alternator: Run more TTL tests by default (and add a test for metrics)' from Nadav Har'El We had quite a few tests for Alternator TTL in test/alternator, but most of them did not run as part of the usual Jenkins test suite, because they were considered "very slow" (and require a special "--runveryslow" flag to run). In this series we enable six tests which run quickly enough to run by default, without an additional flag. We also make them even quicker - the six tests now take around 2.5 seconds. I also noticed that we don't have a test for the Alternator TTL metrics - and added one. Fixes #11374. Refs https://github.com/scylladb/scylla-monitoring/issues/1783 Closes #11384 * github.com:scylladb/scylladb: test/alternator: insert test names into Scylla logs rest api: add a new /system/log operation alternator ttl: log warning if scan took too long. alternator,ttl: allow sub-second TTL scanning period, for tests test/alternator: skip fewer Alternator TTL tests test/alternator: test Alternator TTL metrics	2022-09-22 09:47:50 +02:00
Botond Dénes	ef7471c460	readers/mutation_reader: stream validator: fix log level detection logic The mutation fragment stream validator filter has a detailed debug log in its constructor. To avoid putting together this message when the log level is above debug, it is enclosed in an if, activated when log level is debug or trace... at least that was intended. Actually the if is activated when the log level is debug or above (info, warn or error) but is only actually logged if the log level is exactly debug. Fix the logic to work as intended. Closes #11603	2022-09-22 09:41:45 +03:00
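The guard bug described in ef7471c460 can be illustrated with a small Python sketch. The level ordering below (higher value means more verbose) and both function names are assumptions made for illustration; this is not the validator's actual code.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    # Assumed ordering: a higher value means a more verbose level.
    ERROR = 0
    WARN = 1
    INFO = 2
    DEBUG = 3
    TRACE = 4


def guard_before_fix(configured: LogLevel) -> bool:
    # Inverted comparison: true for debug and every level above it
    # (info, warn, error), so the message is built needlessly.
    return configured <= LogLevel.DEBUG


def guard_after_fix(configured: LogLevel) -> bool:
    # Intended behavior: build the message only for debug or trace.
    return configured >= LogLevel.DEBUG
```

With the corrected comparison, the expensive message is assembled only when the configured level would actually let a debug message through.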
Pavel Emelyanov	7ae73c665b	gossiper: Remove some dead code Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11599	2022-09-22 06:58:29 +03:00
Pavel Emelyanov	5edeecf39b	token_metadata: Provide dc/rack for bootstrapping nodes The token_metadata::calculate_pending_ranges_for_bootstrap() makes a clone of itself and adds bootstrapping nodes to the clone to calculate ranges. Currently added nodes lack the dc/rack which affects the calculations the bad way. Unfortunately, the dc/rack for those nodes is not available on topology (yet) and needs pretty heavy patching to have. Fortunately, the only caller of this method has gossiper at hand to provide the dc/rack from. fixes: #11531 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11596	2022-09-22 06:55:52 +03:00
Petr Gusev	210d9dd026	raft: fix snapshots leak applier_fiber could create multiple snapshots between io_fiber run. The fsm_output.snp variable was overwritten by applier_fiber and io_fiber didn't drop the previous snapshot. In this patch we introduce the variable fsm_output.snps_to_drop, store in it the current snapshot id before applying a new one, and then sequentially drop them in io_fiber after storing the last snapshot_descriptor. _sm_events.signal() is added to fsm::apply_snapshot, since this method mutates the _output and thus gives a reason to run io_fiber. The new test test_frequent_snapshotting demonstrates the problem by causing frequent snapshots and setting the applier queue size to one. Closes #11530	2022-09-21 12:46:26 +02:00
Kamil Braun	3b096b71c1	test/topology_raft_disabled: disable `test_raft_upgrade` For some reason, the test is currently flaky on Jenkins. Apparently the Python driver does not reconnect to the cluster after the cluster restarts (well it does, but then it disconnects from one of the nodes and never reconnects again). This causes the test to hang on "waiting until driver reconnects to every server" until it times out. Disable it for now so it doesn't block next promotion.	2022-09-21 12:32:40 +02:00
Nadav Har'El	22bb35e2cb	Merge 'doc: update the "Counting all rows in a table is slow" page' from Anna Stuchlik Fix https://github.com/scylladb/scylladb/issues/11373 - Updated the information on the "Counting all rows in a table is slow" page. - Added COUNT to the list of selectors of the SELECT statement (somehow it was missing). - Added the note to the description of the COUNT() function with a link to the KB page for troubleshooting if necessary. This will allow the users to easily find the KB page. Closes #11417 * github.com:scylladb/scylladb: doc: add a comment to remove the note in version 5.1 doc: update the information on the Countng all rows page and add the recommendation to upgrade ScyllaDB doc: add a note to the description of COUNT with a reference to the KB article doc: add COUNT to the list of acceptable selectors of the SELECT statement	2022-09-21 12:32:40 +02:00
Alejo Sanchez	510215d79a	test.py: fix ScyllaClusterManager start/stop Check existing is_running member to avoid re-starting. While there, set it to false after stopping. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-21 11:42:02 +02:00
Alejo Sanchez	933d93d052	test.py: fix topology init error handling Start ScyllaClusterManager within error handling so the ScyllaCluster logs are available in case of error starting up. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-21 09:15:25 +02:00
Nadav Har'El	91bccee9be	Update tools/java submodule * tools/java b004da9d1b...5f2b91d774 (1): > install.sh is using wrong permissions for install cqlsh files Fixes #11584	2022-09-20 14:42:34 +03:00
Avi Kivity	2cec417426	Merge 'tools: use the standard allocator' from Botond Dénes Tools want to be as little disrupting to the environment they run in as possible, because they might be run in a production environment, next to a running scylladb production server. As such, the usual behavior of seastar applications w.r.t. memory is an anti-pattern for tools: they don't want to reserve most of the system memory, in fact they don't want to reserve any amount, instead consuming as much as needed on-demand. To achieve this, tools want to use the standard allocator. To achieve this they need a seastar option to instruct seastar to not configure and use the seastar allocator and they need LSA to cooperate with the standard allocator. The former is provided by https://github.com/scylladb/seastar/pull/1211. The latter is solved by introducing the concept of a `segment_store_backend`, which abstracts away how the memory arena for segments is acquired and managed. We then refactor the existing segment store so that the seastar allocator specific parts are moved to an implementation of this backend concept, then we introduce another backend implementation appropriate to the standard allocator. Finally, tools configure seastar with the newly introduced option to use the standard allocator and similarly configure LSA to use the standard allocator appropriate backend. Refs: https://github.com/scylladb/scylladb/issues/9882 This is the last major code piece in scylla for making tools production ready. 
Closes #11510 * github.com:scylladb/scylladb: test/boost: add alternative variant of logalloc test tools: use standard allocator utils/logalloc: add use_standard_allocator_segment_pool_backend() utils/logalloc: introduce segment store backend for standard allocator utils/logalloc: rebase release segment-store on segment-store-backend utils/logalloc: introduce segment_store_backend utils/logalloc: push segment alloc/dealloc to segment_store test/boost/logalloc_test: make test_compaction_with_multiple_regions exception-safe	2022-09-20 12:59:34 +03:00
Nadav Har'El	4a453c411d	Merge 'doc: add the upgrade guide from 5.0 to 5.1' from Anna Stuchlik Fix https://github.com/scylladb/scylladb/issues/11376 This PR adds the upgrade guide from version 5.0 to 5.1. It involves adding new files (5.0-to-5.1) and language/formatting improvements to the existing content (shared by several upgrade guides). Closes #11577 * github.com:scylladb/scylladb: doc: upgrade the command to upgrade the ScyllaDB image from 5.0 to 5.1 doc: add the guide to upgrade ScyllaDB from 5.0 to 5.1	2022-09-20 11:52:59 +03:00
Nadav Har'El	d81bedd3be	Merge 'doc: add ScyllaDB image upgrade guides for patch releases' from Anna Stuchlik This PR adds the missing upgrade guides for upgrading the ScyllaDB image to a patch release: - ScyllaDB 5.0: /upgrade/upgrade-opensource/upgrade-guide-from-5.x.y-to-5.x.z/upgrade-guide-from-5.x.y-to-5.x.z-image/ - ScyllaDB Enterprise: /upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2022.1-to-2022.1-image/ (the file name is wrong and will be fixed with another PR) In addition, the section regarding the recommended upgrade procedure has been improved. Fixes https://github.com/scylladb/scylladb/issues/11450 Fixes https://github.com/scylladb/scylladb/issues/11452 Closes #11460 * github.com:scylladb/scylladb: doc: update the commands to upgrade the ScyllaDB image doc: fix the filename in the index to resolve the warnings and fix the link doc: apply feedback by adding she step fo load the new repo and fixing the links doc: fix the version name in file upgrade-guide-from-2021.1-to-2022.1-image.rst doc: rename the upgrade-image file to upgrade-image-opensource and update all the links to that file doc: update the Enterprise guide to include the Enterprise-onlyimage file doc: update the image files doc: split the upgrade-image file to separate files for Open Source and Enterprise doc: clarify the alternative upgrade procedures for the ScyllaDB image doc: add the upgrade guide for ScyllaDB Image from 2022.x.y. to 2022.x.z doc: add the upgrade guide for ScyllaDB Image from 5.x.y. to 5.x.z	2022-09-20 11:51:26 +03:00
Botond Dénes	4ef7b080e3	docs/using-scylla/migrate-scylla.rst: remove link to unirestore It points to a private scylladb repo, which has no place in user-facing documentation. For now there is no public replacement, but a similar functionality is in the works for Scylla Manager. Fixes: #11573 Closes #11580	2022-09-20 11:46:28 +03:00
Anna Stuchlik	7b2209f291	doc: upgrade the command to upgrade the ScyllaDB image from 5.0 to 5.1	2022-09-20 10:42:47 +02:00
Anna Stuchlik	db75adaf9a	doc: update the commands to upgrade the ScyllaDB image	2022-09-20 10:36:18 +02:00
Nadav Har'El	4c93a694b7	cql: validate bloom_filter_fp_chance up-front Scylla's Bloom filter implementation has a minimal false-positive rate that it can support (6.71e-5). When setting bloom_filter_fp_chance any lower than that, the compute_bloom_spec() function, which writes the bloom filter, throws an exception. However, this is too late - it only happens while flushing the memtable to disk, and a failure at that point causes Scylla to crash. Instead, we should refuse the table creation with the unsupported bloom_filter_fp_chance. This is also what Cassandra did six years ago - see CASSANDRA-11920. This patch also includes a regression test, which crashes Scylla before this patch but passes after the patch (and also passes on Cassandra). Fixes #11524. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11576	2022-09-20 06:18:51 +03:00
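The idea in 4c93a694b7, rejecting an unsupported value when the table is created instead of failing later during a memtable flush, can be sketched in Python as follows. The function name and the exception type are hypothetical; only the 6.71e-5 minimum comes from the commit message.

```python
# Minimal false-positive rate quoted in the commit message.
MIN_SUPPORTED_FP_CHANCE = 6.71e-5


def validate_bloom_filter_fp_chance(fp_chance: float) -> float:
    """Hypothetical up-front check, meant to run at table-creation time."""
    if fp_chance < MIN_SUPPORTED_FP_CHANCE:
        raise ValueError(
            f"bloom_filter_fp_chance must be at least "
            f"{MIN_SUPPORTED_FP_CHANCE}, got {fp_chance}"
        )
    return fp_chance
```

Validating at schema-definition time turns what used to be a crash at flush time into an ordinary error returned to the client.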
Botond Dénes	60991358e8	Merge 'Improvements to test/lib/sstable_utils.hh' from Raphael "Raph" Carvalho Changes done to avoid pitfalls and fix issues of sstable-related unit tests Closes #11578 * github.com:scylladb/scylladb: test: Make fake sstables implicitly belong to current shard test: Make it clearer that sstables::test::set_values() modify data size	2022-09-20 06:14:07 +03:00
Nadav Har'El	a1ff865c77	Merge 'test/topology_raft_disabled: write basic raft upgrade test' from Kamil Braun The test changes the servers' configuration to include `raft` in the `experimental-features` list, then restarts them. It waits until driver reconnects to every server after restarting. Then it checks that upgrade eventually finishes on every server by querying `group0_upgrade_state` key in `system.scylla_local`. Finally, it performs a schema change and verifies that a corresponding entry has appeared in `system.group0_history`. The commit also increases the number of clusters in the suite cluster pool. Since the suite contains only one test at this time this only has an effect if we run the test multiple times (using `--repeat`). Closes #11563 * github.com:scylladb/scylladb: test/topology_raft_disabled: write basic raft upgrade test test: setup logging in topology suites	2022-09-19 20:27:08 +03:00
Alejo Sanchez	087ae521c5	test.py: make client fail if before test check fails Check if request to server side (test.py) failed and raise if so. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11575	2022-09-19 18:04:07 +02:00
Raphael S. Carvalho	2f52698a26	test: Make fake sstables implicitly belong to current shard Fake SSTables will be implicitly owned by the shard that created them, allowing them to be called on procedures that assert the SSTables are owned by the current shard, like the table's one that rebuilds the sstable set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-19 12:05:24 -03:00
Raphael S. Carvalho	697f200319	test: Make it clearer that sstables::test::set_values() modify data size By adding a param with default value, we make it clear in the interface that the procedure modifies sstable data size. It can happen one calls this function without noticing it overrides the data size previously set using a different function. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-19 12:01:24 -03:00
Anna Stuchlik	2513497f9a	doc: add the guide to upgrade ScyllaDB from 5.0 to 5.1	2022-09-19 16:06:24 +02:00
Kamil Braun	b770443300	test/topology_raft_disabled: write basic raft upgrade test The test changes the servers' configuration to include `raft` in the `experimental-features` list, then restarts them. It waits until driver reconnects to every server after restarting. Then it checks that upgrade eventually finishes on every server by querying `group0_upgrade_state` key in `system.scylla_local`. Finally, it performs a schema change and verifies that a corresponding entry has appeared in `system.group0_history`. The commit also increases the number of clusters in the suite cluster pool. Since the suite contains only one test at this time this only has an effect if we run the test multiple times (using `--repeat`).	2022-09-19 13:29:35 +02:00
Kamil Braun	fd986bfed1	test: setup logging in topology suites Make it possible to use logging from within tests in the topology suites. The tests are executed using `pytest`, which uses a `pytest.ini` file for logging configuration. Also cleanup the `pytest.ini` files a bit.	2022-09-19 12:23:11 +02:00
Nadav Har'El	711dcd56b6	docs/alternator: refer to the right issue In compatibility.md where we refer to the missing ability to add a GSI to an existing table - let's refer to a new issue specifically about this feature, instead of the old bigger issue about UpdateItem. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11568	2022-09-19 11:05:07 +03:00
Piotr Sarna	5597bc8573	Merge 'Alternator: test and fix crashes and errors... when using ":attrs" attribute' from Nadav Har'El This PR improves the testing for issue #5009 and fixes most of it (but not all - see below). Issue #5009 is about what happens when a user tries to use the name `:attrs` for an attribute - while Alternator uses a map column with that name to hold all the schema-less attributes of an item. The tests we had for this issue were partial, and missed the worst cases which could result in Scylla crashing on specially-crafted PutItem or UpdateItem requests. What the tests missed were the cases that `:attrs` is used as a non-key. So in this PR we add additional tests for this case, several of them fail or even crash Scylla, and then we fix all these cases. Issue #5009 remains open because using `:attrs` as the name of a key is still not allowed. But because it results in a clean error message when attempting to create a table with such a key, I consider this remaining problem very minor. Refs #5009. Closes #11572 * github.com:scylladb/scylladb: alternator: fix crashes an errors when using ":attrs" attribute alternator: improve tests for reserved attribute name ":attrs"	2022-09-19 09:48:06 +02:00
Nadav Har'El	999ca2d588	alternator: fix crashes an errors when using ":attrs" attribute Alternator uses a single column, a map, with the deliberately strange name ":attrs", to hold all the schema-less attributes of an item. The existing code is buggy when the user tries to write to an attribute with this strange name ":attrs". Although it is extremely unlikely that any user would happen to choose such a name, it is nevertheless a legal attribute name in DynamoDB, and should definitely not cause Scylla to crash as it does in some cases today. The bug was caused by the code assuming that to check whether an attribute is stored in its own column in the schema, we just need to check whether a column with that name exists. This is almost true, except for the name ":attrs" - a column with this name exists, but it is a map - the attribute with that name should be stored in the map, not as the map. The fix is to modify that check to special-case ":attrs". This fix makes the relevant tests, which used to crash or fail, now pass. This fix solves most of #5009, but one point is not yet solved (and perhaps we don't need to solve): It is still not allowed to use the name ":attrs" for a key attribute. But trying to do that fails cleanly (during the table creation) with an appropriate error message, so is only a very minor compatibility issue. Refs #5009 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-19 10:30:11 +03:00
Nadav Har'El	6f8dca3760	alternator: improve tests for reserved attribute name ":attrs" As explained in issue #5009, Alternator currently forbids the special attribute name ":attrs", whereas DynamoDB allows any string of appropriate length (including the specific string ":attrs") to be used. We had only a partial test for this incompatibility, and this patch improves the testing of this issue. In particular, we were missing a test for the case that the name ":attrs" was used for a non-key attribute (we only tested the case it was used as a sort key). It turns out that Alternator crashes on the new test, when the test tries to write to a non-key attribute called ":attrs", so we needed to mark the new test with "skip". Moreover, it turns out that different code paths handle the attribute name ":attrs" differently, and also crash or fail in other ways - so we added more than one xfailing and skipped tests that each fails in a different place (and also a few tests that do pass). As usual, the new tests were checked to pass on DynamoDB. Refs #5009 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-19 10:30:06 +03:00
Botond Dénes	3003f1d747	Merge 'alternator: small documentation and comment fixes' from Nadav Har'El This tiny series fixes some small error and out-of-date information in Alternator documentation and code comments. Closes #11547 * github.com:scylladb/scylladb: alternator ttl: comment fixes docs/alternator: fix mention of old alternator-test directory	2022-09-19 09:27:53 +03:00
Kamil Braun	348582c4c8	test/pylib: pool: make it possible to free up space Some tests mark clusters as 'dirty', which makes them non-reusable by later tests; we don't want to return them to the pool of clusters. This use-case was covered by the `add_one` function in the `Pool` class. However, it had the unintended side effect of creating extra clusters even if there were no more tests that were waiting for new clusters. Rewrite the implementation of `Pool` so it provides 3 interface functions: - `get` borrows an object, building it first if necessary - `put` returns a borrowed object - `steal` is called by a borrower to free up space in the pool; the borrower is then responsible for cleaning up the object. Both `put` and `steal` wake up any outstanding `get` calls. Objects are built only in `get`, so no objects are built if none are needed. Closes #11558	2022-09-18 12:05:57 +03:00
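The three-function interface described in 348582c4c8 can be sketched with `asyncio` as below. This is an illustrative sketch under stated assumptions, not the code in test/pylib: the constructor signature, the attribute names, and the use of `asyncio.Condition` are all hypothetical choices.

```python
import asyncio
from typing import Awaitable, Callable, Generic, List, TypeVar

T = TypeVar("T")


class Pool(Generic[T]):
    """At most `max_size` objects exist (idle plus borrowed) at any time."""

    def __init__(self, max_size: int, build: Callable[[], Awaitable[T]]) -> None:
        self.max_size = max_size
        self.build = build
        self.idle: List[T] = []
        self.total = 0  # idle + borrowed + currently being built
        self.cond = asyncio.Condition()

    async def get(self) -> T:
        """Borrow an object, building it first if necessary."""
        async with self.cond:
            await self.cond.wait_for(
                lambda: self.idle or self.total < self.max_size
            )
            if self.idle:
                return self.idle.pop()
            self.total += 1  # reserve a slot before building
        try:
            # Objects are built only here, so none are built unless needed.
            return await self.build()
        except BaseException:
            async with self.cond:
                self.total -= 1  # give the reserved slot back
                self.cond.notify()
            raise

    async def put(self, obj: T) -> None:
        """Return a borrowed object and wake up a waiting get()."""
        async with self.cond:
            self.idle.append(obj)
            self.cond.notify()

    async def steal(self) -> None:
        """Free the borrower's slot; the borrower cleans up the object."""
        async with self.cond:
            self.total -= 1
            self.cond.notify()
```

In this sketch a test that marks its cluster as dirty would call `steal()` instead of `put()`, which frees a slot without building a replacement until some later `get()` actually asks for one.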
Botond Dénes	22128977e4	test/boost: add alternative variant of logalloc test Which initializes LSA with use_standard_allocator_segment_pool_backend() running the logalloc_test suite on the standard allocator segment pool backend. To avoid duplicating the test code, the new test-file pulls in the test code via #include. I'm not proud of it, but it works and we test LSA with both the debug and standard memory segment stores without duplicating code.	2022-09-16 14:57:23 +03:00
Botond Dénes	13ace7a05e	Merge "Fix RPC sockets configuration wrt topology" from Pavel Emelyanov " Messaging service checks dc/rack of the target node when creating a socket. However, this information is not available for all verbs, in particular gossiper uses RPC to get topology from other nodes. This generates a chicken-and-egg problem -- to create a socket messaging service needs topology information, but in order to get one gossiper needs to create a socket. Other than gossiper, raft starts sending its APPEND_ENTRY messages early enough so that topology info is not available either. The situation is extra-complicated with the fact that sockets are not created for individual verbs. Instead, verbs are grouped into several "indices" and socket is created for it. Thus, the "gossiping" index that includes non-gossiper verbs will create topology-less socket for all verbs in it. Worse -- raft sends messages w/o solicited topology, the corresponding socket is created with the assumption that the peer lives in default dc and rack which doesn't match the local nodes' dc/rack and the whole index group gets the "randomly" configured socket. Also, the tcp-nodelay tries to implement similar check, but uses wrong index of 1, so it's also fixed here. " * 'br-messaging-topology-ignoring-clients' of https://github.com/xemul/scylla: messaging_service: Fix gossiper verb group messaging_service: Mind the absence of topology data when creating sockets messaging_service: Templatize and rename remove_rpc_client_one	2022-09-16 13:27:56 +03:00
Botond Dénes	6a0db84706	tools: use standard allocator Use the new seastar option to instruct seastar to not initialize and use the seastar allocator, relying on the standard allocator instead. Configure LSA with the standard allocator based segment store backend: * scylla-types reserves 1MB for LSA -- in theory nothing here should use LSA, but just in case... * scylla-sstable reserves 100MB for LSA, to avoid excessive thrashing in the sstable index caches. With this, tools now should allocate memory on demand, without reserving a large chunk of (or all of) the available memory, as regular seastar apps do.	2022-09-16 13:07:01 +03:00
Botond Dénes	a55903c839	utils/logalloc: add use_standard_allocator_segment_pool_backend() Creating a standard-memory-allocator backend for the segment store. This is targeted towards tools, which want to configure LSA with a segment store backend that is appropriate for the standard allocator (which they want to use). We want to be able to use this in both release and debug mode. The former will be used by tools and the latter will be used to run the logalloc tests with this new backend, making sure it works and doesn't regress. For this latter, we have to allow the release and debug stores to coexist in the same build and for the debug store to be able to delegate to the release store when the standard allocator backend is used.	2022-09-16 13:02:40 +03:00
Kamil Braun	595472ac59	Merge 'Don't use qctx in CDC tables quering' from Pavel Emelyanov There's a bunch of helpers for CDC gen service in db/system_keyspace.cc. All are static and use global qctx to make queries. Fortunately, both callers -- storage_service and cdc_generation_service -- already have local system_keyspace references and can call the methods via it, thus reducing the global qctx usage. Closes #11557 * github.com:scylladb/scylladb: system_keyspace: De-static get_cdc_generation_id() system_keyspace: De-static cdc_is_rewritten() system_keyspace: De-static cdc_set_rewritten() system_keyspace: De-static update_cdc_generation_id()	2022-09-16 11:52:01 +02:00
Kamil Braun	0a6f601996	Merge 'Raft test topology fix request paths and API response handling' from Alecco - Raise on response not HTTP 200 for `.get_text()` helper - Fix API paths - Close and start a fresh driver when restarting a server and it's the only server in the cluster - Fix stop/restart response as text instead of inspecting (errors are status 500 and raise exceptions) Closes #11496 * github.com:scylladb/scylladb: test.py: handle duplicate result from driver test.py: log server restarts for topology tests test.py: log actions for topology tests Revert "test.py: restart stopped servers before... test.py: ManagerClient API fix return text test.py: ManagerClient raise on HTTP != 200 test.py: ManagerClient fix paths to updated resource	2022-09-16 11:29:10 +02:00
Botond Dénes	c1c74005b7	utils/logalloc: introduce segment store backend for standard allocator To be used by tools, this store backend is compatible with the standard allocator as it acquires the memory arena for segments via mmap().	2022-09-16 12:16:57 +03:00
Botond Dénes	d2a7ebbe66	utils/logalloc: rebase release segment-store on segment-store-backend Rebase the seastar allocator based segment store implementation on the recently introduced segment store backend which now abstracts away how memory for segments is obtained. This patch also introduces an explicit `segment_npos` to be used for cases when a segment -> index mapping fails (segment doesn't belong to the store). Currently the seastar allocator based store simply doesn't handle this case, while the standard allocator based store uses 0 as the implicit invalid index.	2022-09-16 12:16:57 +03:00
Botond Dénes	3717f7740d	utils/logalloc: introduce segment_store_backend We want to make it possible to select the segment-store to be used for LSA -- the seastar allocator based one or the standard allocator based one -- at runtime. Currently this choice is made at compile time via preprocessor switches. The current standard memory based store is specialized for debug builds, we want something more similar to the seastar standard memory allocator based one. So we introduce a segment store backend for the current seastar allocator based store, which abstracts how the backing memory for all segments is allocated/freed, while keeping the segment <-> index mapping common. In the next patches we will rebase the current seastar allocator based segment store on this backend and later introduce another backend for standard allocator, targeted for release builds.	2022-09-16 12:16:57 +03:00
Botond Dénes	5ea4d7fb39	utils/logalloc: push segment alloc/dealloc to segment_store Currently the actual alloc/dealloc of memory for segments is located outside the segment stores. We want to abstract away how segments are allocated, so we move this logic too into the segment store. For now this results in duplicate code in the two segment store implementations, but this will soon be gone.	2022-09-16 12:16:57 +03:00
Botond Dénes	e82ea2f3ad	test/boost/logalloc_test: make test_compaction_with_multiple_regions exception-safe Said test creates two vectors, the vector storage being allocated with the default allocator, while its content being allocated on LSA. If an exception is thrown however, both are freed via the default allocator, triggering an assert in LSA code. Move the cleanup into a `defer()` so the correct cleanup sequence is executed even on exceptions.	2022-09-16 12:16:57 +03:00
Pavel Emelyanov	e221bb0112	system_keyspace: De-static get_cdc_generation_id() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-16 08:34:15 +03:00
Pavel Emelyanov	fe48b66c0a	cross-shard-barrier: Capture shared barrier in complete When cross-shard barrier is abort()-ed it spawns a background fiber that will wake-up other shards (if they are sleeping) with exception. This fiber is implicitly waited by the owning sharded service .stop, because barrier usage is like this: sharded<service> s; co_await s.invoke_on_all([] { ... barrier.abort(); }); ... co_await s.stop(); If abort happens, the invoke_on_all() will only resolve _after_ it queues up the waking lambdas into smp queues, thus the subsequent stop will queue its stopping lambdas after barrier's ones. However, in debug mode the queue can be shuffled, so the owning service can suddenly be freed from under the barrier's feet causing use after free. Fortunately, this can be easily fixed by capturing the shared pointer on the shared barrier instead of a regular pointer on the shard-local barrier. fixes: #11303 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11553	2022-09-16 08:21:02 +03:00
Michał Chojnowski	78850884d2	test: perf: perf_fast_forward: fix an error message The test is supposed to give a helpful error message when the user forgets to run --populate before the benchmark. But this must have become broken at some point, because execute_cql() terminates the program with an unhelpful ("unconfigured table config") message, which doesn't mention --populate. Fix that by catching the exception and adding the helpful tip. Closes #11533	2022-09-15 19:30:10 +02:00
Avi Kivity	d3b8c0c8a6	logalloc: don't crash while reporting reclaim stalls if --abort-on-seastar-bad-alloc is specified The logger is proof against allocation failures, except if --abort-on-seastar-bad-alloc is specified. If it is, it will crash. The reclaim stall report is likely to be called in low memory conditions (reclaim's job is to alleviate these conditions after all), so we're likely to crash here if we're reclaiming a very low memory condition and have a large stall simultaneously (AND we're running in a debug environment). Prevent all this by disabling --abort-on-seastar-bad-alloc temporarily. Fixes #11549 Closes #11555	2022-09-15 19:24:39 +02:00
Pavel Emelyanov	4f67898e7b	system_keyspace: De-static cdc_is_rewritten() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-15 18:44:59 +03:00
Pavel Emelyanov	736021ee98	system_keyspace: De-static cdc_set_rewritten() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-15 18:44:53 +03:00
Pavel Emelyanov	b3d139bbdb	system_keyspace: De-static update_cdc_generation_id() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-15 18:44:40 +03:00
Michał Chojnowski	cdb3e71045	sstables: add a flag for disabling long-term index caching Long-term index caching in the global cache, as introduced in 4.6, is a major pessimization for workloads where accesses to the index are (spatially) sparse. We want to have a way to disable it for the affected workloads. There is already infrastructure in place for disabling it for BYPASS CACHE queries. One way of solving the issue is hijacking that infrastructure. This patch adds a global flag (and a corresponding CLI option) which controls index caching. Setting the flag to `false` causes all index reads to behave like they would in BYPASS CACHE queries. Consequences of this choice: - The per-SSTable partition_index_cache is unused. Every index_reader has its own, and they die together. Independent reads can no longer reuse the work of other reads which hit the same index pages. This is not crucial, since partition accesses have no (natural) spatial locality. Note that the original reason for partition_index_cache -- the ability to share reads for the lower and upper bound of the query -- is unaffected. - The per-SSTable cached_file is unused. Every index_reader has its own (uncached) input stream from the index file, and every bsearch_clustered_cursor has its own cached_file, which dies together with the cursor. Note that the cursor still can perform its binary search with caching. However, it won't be able to reuse the file pages read by index_reader. In particular, if the promoted index is small, and fits inside the same file page as its index_entry, that page will be re-read. It can also happen that index_reader will read the same index file page multiple times. When the summary is so dense that multiple index pages fit in one index file page, advancing the upper bound, which reads the next index page, will read the same index file page. Since summary:disk ratio is 1:2000, this is expected to happen for partitions with size greater than 2000 partition keys. 
Fixes #11202	2022-09-15 17:16:26 +03:00
David Garcia	3cc80da6af	docs: update theme 1.3 Update conf.py Closes #11330	2022-09-15 16:56:41 +03:00
Anna Stuchlik	e5c9f3c8a2	doc: fix the filename in the index to resolve the warnings and fix the link	2022-09-15 15:53:23 +02:00
Anna Stuchlik	338b45303a	doc: apply feedback by adding the step to load the new repo and fixing the links	2022-09-15 15:40:20 +02:00
Alejo Sanchez	92129f1d47	test.py: handle duplicate result from driver Sometimes the driver calls twice the callback on ready done future with a None result. Log it and avoid setting the local future twice. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-15 15:12:50 +02:00
Alejo Sanchez	2da7304696	test.py: log server restarts for topology tests Add missing logging for server restart. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-15 15:10:29 +02:00
Alejo Sanchez	61a92afa2d	test.py: log actions for topology tests For debugging, log driver connection, before and after checks, and topology changes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-15 15:10:29 +02:00
Botond Dénes	05ef13a627	Merge 'Add support to split large partitions across SSTables' from Raphael "Raph" Carvalho Introduces support to split large partitions during compaction. Today, compaction can only split input data at partition boundary, so a large partition is stored in a single file. But that can cause many problems, like memory pressure (e.g.: https://github.com/scylladb/scylladb/issues/4217), and incremental compaction can also not fulfill its promise as the file storing the large partition can only be released once exhausted. The first step was to add clustering range metadata for first and last partition keys (retrieved from promoted index), which is crucial to determine disjointness at clustering level, and also the order at which the disjoint files should be opened for incremental reading. The second step was to extend sstable_run to look at clustering dimension, so a set of files storing disjoint ranges for the same partition can live in the same sstable run. The final step was to introduce the option for compaction to split large partition being written if it has exceeded the size threshold. What's next? Following this series, a reader will be implemented for sstable_run that will incrementally open the readers. It can be safely built on the assumption of the disjoint invariant after the second step aforementioned. 
Closes #11233 * github.com:scylladb/scylladb: test: Add test for large partition splitting on compaction compaction: Add support to split large partitions sstable: Extend sstable_run to allow disjointness on the clustering level sstables: simplify will_introduce_overlapping() test: move sstable_run_disjoint_invariant_test into sstable_datafile_test test: lib: Fix inefficient merging of mutations in make_sstable_containing() sstables: Keep track of first partition's first pos and last partition's last pos sstables: Rename min/max position_range to a descriptive name sstables_manager: Add sstable metadata reader concurrency semaphore sstables: Add ability to find first or last position in a partition	2022-09-15 16:08:56 +03:00
Alejo Sanchez	604f7353ef	Revert "test.py: restart stopped servers before... teardown..." This reverts commit `df1ca57fda`. In order to prevent timeouts on teardown queries, the previous commit added functionality to restart servers that were down. This issue is fixed in fc0263fc9b so there's no longer need to restart stopped servers on test teardown.	2022-09-15 14:47:01 +02:00
Alejo Sanchez	ed81f1a85c	test.py: ManagerClient API fix return text For ManagerClient request API, don't return status, raise an exception. Server side errors are signaled by status 500, not text body. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-15 14:47:01 +02:00
Alejo Sanchez	4a5f2418ec	test.py: ManagerClient raise on HTTP != 200 Raise an exception if the request result is not HTTP 200 for .get() helper. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-15 14:47:01 +02:00
Alejo Sanchez	a84bde38c0	test.py: ManagerClient fix paths to updated resource Fix missing path renames for server-side rename "node" -> "server" API. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-15 14:47:01 +02:00
Kamil Braun	728161003a	Merge 'raft server, abort on background errors' from Gusev Petr Halted background fibers render raft server effectively unusable, so report this explicitly to the clients. Fix: #11352 Closes #11370 * github.com:scylladb/scylladb: raft server, status metric raft server, abort group0 server on background errors raft server, provide a callback to handle background errors raft server, check aborted state on public server public api's	2022-09-15 14:12:11 +02:00
Alejo Sanchez	b8f68729b0	test.py: Pool add fresh when item not returned Pool.get() might have waiting callers, so if an item is not returned to the pool after use, tell the pool to add a new one and tell the pool an entry was taken (used for total running entries, i.e. clusters). Use it when a ScyllaCluster is dirty and not returned. While there improve logging and docstrings. Issue reported by @kbr-. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11546	2022-09-15 13:56:44 +03:00
Pavel Emelyanov	82162be1f1	messaging_service: Remove init/uninit helpers These two are just getting in the way when touching inter-components dependencies around messaging service. Without it m.-s. start/stop just looks like any other service out there Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11535	2022-09-15 11:54:46 +03:00
Raphael S. Carvalho	0a8afe18ca	cql: Reject create and alter table with DateTieredCompactionStrategy It's been ~1 year (`2bf47c902e`) since we set restrict_dtcs config option to WARN, meaning users have been warned about the deprecation process of DTCS. Let's set the config to TRUE, meaning that create and alter statements specifying DTCS will be rejected at the CQL level. Existing tables will still be supported. But the next step will be about throwing DTCS code into the shadow realm, and after that, Scylla will automatically fallback to STCS (or ICS) for users which ignored the deprecation process. Refs #8914. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11458	2022-09-15 11:46:18 +03:00
Alejo Sanchez	7e3389ee43	test.py: schema timeout less than request timeout When a server is down, the driver expects multiple schema timeouts within the same request to handle it properly. Found by @kbr- Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11544	2022-09-15 11:43:52 +03:00
Raphael S. Carvalho	a04047f390	compaction: Properly handle stop request for off-strategy If user stops off-strategy via API, compaction manager can decide to give up on it completely, so data will sit unreshaped in maintenance set, preventing it from being compacted with data in the main set. That's problematic because it will probably lead to a significant increase in read and space amplification until off-strategy is triggered again, which cannot happen anytime soon. Let's handle it by moving data in maintenance set into main one, even if unreshaped. Then regular compaction will be able to continue from where off-strategy left off. Fixes #11543. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11545	2022-09-15 09:21:22 +03:00
Nadav Har'El	33e6a88d9a	alternator ttl: comment fixes This patch fixes a few errors and out-of-date descriptions in comments in alternator/ttl.cc. No functional changes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-15 00:03:43 +03:00
Nadav Har'El	8af9437508	docs/alternator: fix mention of old alternator-test directory The directory that used to be called alternator-test is now (and has been for a long time) really test/alternator. So let's fix the references to it in docs/alternator/alternator.md. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-15 00:03:43 +03:00
Pavel Emelyanov	2c74062962	messaging_service: Fix gossiper verb group When configuring tcp-nodelay unconditionally, messaging service thinks gossiper uses group index 1, though it had changed some time ago and now those verbs belong to group 0. fixes: #11465 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-14 20:40:47 +03:00
Pavel Emelyanov	7bdad47de2	messaging_service: Mind the absence of topology data when creating sockets When a socket is created to serve a verb there may be no topology information regarding the target node. In this case current code configures socket as if the peer node lived in "default" dc and rack of the same name. If topology information appears later, the client is not re-connected, even though it could provide a more relevant configuration (e.g. -- w/o encryption). This patch checks if the topology info is needed (sometimes it's not) and if missing it configures the socket in the most restrictive manner, but notes that the socket ignored the topology on creation. When topology info appears -- and this happens when a node joins the cluster -- the messaging service is kicked to drop all sockets that ignored the topology, so that they reconnect later. The mentioned "kick" comes from storage service on-join notification. A more correct fix would be if topology had on-change notification and messaging service subscribed on it, but there are two cons: - currently dc/rack do not change on the fly (though they can, e.g. if gossiping property file snitch is updated without restart) and topology update effectively comes from a single place - updating topology on token-metadata is not like topology.update() call. Instead, a clone of token metadata is created, then update happens on the clone, then the clone is committed into t.m. It's possible to find out at commit time which nodes changed their topology, but since it only happens on join this complexity is likely not worth the effort (yet) fixes: #11514 fixes: #11492 fixes: #11483 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-14 20:30:51 +03:00
Pavel Emelyanov	5ffc9d66ec	messaging_service: Templatize and rename remove_rpc_client_one It actually finds and removes a client and in its new form it also applies a filtering function to it, so a better name is called for Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-14 20:30:07 +03:00
Raphael S. Carvalho	20a6483678	test: Add test for large partition splitting on compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:23:19 -03:00
Raphael S. Carvalho	e2ccafbe38	compaction: Add support to split large partitions Adds support for splitting large partitions during compaction. Large partitions introduce many problems, like memory overhead and breaking the promise of incremental compaction. We want to split large partitions across fixed-size fragments. We'll allow a partition to exceed size limit by 10%, as we don't want to unnecessarily split partitions that just crossed the limit boundary. To avoid having to open a minimum of 2 fragments in a read, partition tombstone will be replicated to every fragment storing the partition. The splitting isn't enabled by default, and can be used by strategies that are run aware like ICS. LCS still cannot support it as it's still using physical level metadata, not run id. An incremental reader for sstable runs will follow soon. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:23:16 -03:00
Raphael S. Carvalho	4bc24acf81	sstable: Extend sstable_run to allow disjointness on the clustering level After commit `0796b8c97a`, sstable_run won't accept a fragment that introduces key overlapping. But once we split large partitions, fragments in the same run may store disjoint clustering ranges of the same partition. So we're extending sstable_run to look at clustering dimension, so fragments storing disjoint clustering ranges of the same large partition can co-exist in the same run. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Raphael S. Carvalho	574e656793	sstables: simplify will_introduce_overlapping() An element S1 is completely ordered before S2, if S1's last key is lower than S2's first key. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
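The ordering rule in the `will_introduce_overlapping()` commit above can be illustrated with a small model. This is only a sketch under stated assumptions: the real function is C++ code operating on sstable keys, whereas the `Fragment` class, the integer keys, and the helper `ordered_before` below are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical stand-in for one sstable in a run (integer keys for brevity)."""
    first_key: int
    last_key: int


def ordered_before(s1: Fragment, s2: Fragment) -> bool:
    # S1 is completely ordered before S2 if S1's last key is lower
    # than S2's first key.
    return s1.last_key < s2.first_key


def will_introduce_overlapping(run: list, candidate: Fragment) -> bool:
    # The candidate keeps the run disjoint only if, for every existing
    # fragment, one of the two is completely ordered before the other.
    return not all(
        ordered_before(f, candidate) or ordered_before(candidate, f)
        for f in run
    )


run = [Fragment(0, 9), Fragment(20, 29)]
fits_in_gap = will_introduce_overlapping(run, Fragment(10, 19))
straddles = will_introduce_overlapping(run, Fragment(5, 12))
```

Here `fits_in_gap` is `False` because the candidate sits strictly between the two existing fragments, while `straddles` is `True` because the candidate's range intersects the first fragment.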
Raphael S. Carvalho	13942ec947	test: move sstable_run_disjoint_invariant_test into sstable_datafile_test That's where it belongs. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Raphael S. Carvalho	e1560c6b7f	test: lib: Fix inefficient merging of mutations in make_sstable_containing() make_sstable_containing() was absurdly slow when merging thousands of mutations belonging to the same key, as it was unnecessarily copying the mutation for every merge, producing bad complexity. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Raphael S. Carvalho	5937765009	sstables: Keep track of first partition's first pos and last partition's last pos With first partition's first position and last partition's last position, we'll be able to determine which fragments composing a sstable run store a large partition that was split. Then sstable run will be able to detect if all fragments storing a given large partition are disjoint in the clustering level. Fixes #10637. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Raphael S. Carvalho	a4bbdfcc58	sstables: Rename min/max position_range to a descriptive name The new descriptive name is important to make a distinction when sstable stores position range for first and last rows instead of min and max. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Raphael S. Carvalho	e099a9bf3b	sstables_manager: Add sstable metadata reader concurrency semaphore Let's introduce a reader_concurrency_semaphore for reading sstable metadata, to avoid an OOM due to unlimited concurrency. The concurrency on startup is not controlled, so it's important to enforce a limit on the amount of memory used by the parallel readers. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:51 -03:00
Raphael S. Carvalho	9bcad9ffa8	sstables: Add ability to find first or last position in a partition This new method allows sstable to load the first row of the first partition and last row of last partition. That's useful for incremental reading of sstable run which will be split at clustering boundary. To get the first row, it consumes the first row (which can be either a clustering row or range tombstone change) and returns its position_in_partition. To get the last row, it does the same as above but in reverse mode instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-14 13:09:48 -03:00
Nadav Har'El	77467bcbcd	Merge 'test/pylib: APIs to read and modify configuration from tests' from Kamil Braun We introduce `server_get_config` to fetch the entire configuration dict and `update_config` to update a value under the given key. Closes #11493 * github.com:scylladb/scylladb: test/pylib: APIs to read and modify configuration from tests test/pylib: ScyllaServer: extract _write_config_file function test/pylib: ScyllaCluster: extend ActionReturn with dict data test/pylib: ManagerClient: introduce _put_json test/pylib: ManagerClient: replace `_request` with `_get`, `_get_text` test: pylib: store server configuration in `ScyllaServer`	2022-09-14 18:49:55 +03:00
Kefu Chai	2a74a0086f	docs: fix typos * s/udpates/updates/ * s/opetarional/operational/ Signed-off-by: Kefu Chai <tchaikov@gmail.com> Closes #11541	2022-09-14 17:04:05 +03:00
Kamil Braun	73bf781e17	test/pylib: APIs to read and modify configuration from tests We introduce `server_get_config` to fetch the entire configuration dict and `update_config` to update a value under the given key.	2022-09-14 12:46:41 +02:00
Kamil Braun	1f550428a9	test/pylib: ScyllaServer: extract _write_config_file function For refreshing the on-disk config file with the config stored in dict form in the `self.config` field.	2022-09-14 12:46:41 +02:00
Kamil Braun	52e52e8503	test/pylib: ScyllaCluster: extend ActionReturn with dict data For returning types more complex than text. Also specify a default empty string value for the `msg` field for non-text return values.	2022-09-14 12:46:41 +02:00
Kamil Braun	c9348ae8ea	test/pylib: ManagerClient: introduce _put_json For sending PUT requests to the Manager (such as updating configuration).	2022-09-14 12:46:41 +02:00
Kamil Braun	d81c722476	test/pylib: ManagerClient: replace `_request` with `_get`, `_get_text` `_request` performed a GET request and extracted a text body out of the response. Split it into `_get`, which only performs the request, and `_get_text`, which calls `_get` and extracts the body as text. Also extract a `_resource_uri` function which will be used for other request types.	2022-09-14 12:46:41 +02:00
Kamil Braun	9d39e14518	test: pylib: store server configuration in `ScyllaServer` In following commits we will make this configuration accessible from tests through the Manager (for fetching and updating).	2022-09-14 12:46:41 +02:00
Nadav Har'El	cf30432715	Merge 'test: add a topology suite with Raft disabled' from Kamil Braun Add a suite which is basically equivalent to `topology` except that it doesn't start servers with Raft enabled. The suite will be used to test the Raft upgrade procedure. The suite contains a basic test just to check the suite itself can run; the test will be removed when 'real' tests are added. Closes #11487 * github.com:scylladb/scylladb: test.py: PythonTestSuite: sum default config params with user-provided ones test: add a topology suite with Raft disabled test: pylib: use Python dicts to manipulate `ScyllaServer` configuration test: pylib: store `config_options` in `ScyllaServer`	2022-09-14 13:37:44 +03:00
Pavel Emelyanov	43131976e9	updateable_value: Update comment about cross-shard copying refs: #7316 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11538	2022-09-14 12:35:56 +02:00
Michał Chojnowski	9b6fc553b4	db: commitlog: don't print INFO logs on shutdown The intention was for these logs to be printed during the database shutdown sequence, but it was overlooked that it's not the only place where commitlog::shutdown is called. Commitlogs are started and shut down periodically by hinted handoff. When that happens, these messages spam the log. Fix that by adding INFO commitlog shutdown logs to database::stop, and change the level of the commitlog::shutdown log call to DEBUG. Fixes #11508 Closes #11536	2022-09-14 11:30:53 +03:00
Avi Kivity	a24a8fd595	Update seastar submodule * seastar cbb0e888d8...601e0776c0 (1): > coroutine: explain and mitigate the lambda coroutine fiasco Closes #11537	2022-09-13 22:37:29 +03:00
Petr Gusev	4ff0807cd0	raft server, status metric	2022-09-13 19:34:22 +04:00
Alejo Sanchez	6799e766ca	test.py: topology increment timeouts even more Due to slow debug machines timing out, bump up all timeouts significantly. The cause was ExecutionProfile request_timeout. Also set a high heartbeat timeout and bump already set timeouts to be safe, too. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11516	2022-09-13 11:57:31 +02:00
Piotr Dulikowski	e69b44a60f	exception: fix the error code used for rate_limit_exception Per-partition rate limiting added a new error type which should be returned when Scylla decides to reject an operation due to per-partition rate limit being exceeded. The new error code requires drivers to negotiate support for it, otherwise Scylla will report the error as `Config_error`. The existing error code override logic works properly, however due to a mistake Scylla will report the `Config_error` code even if the driver correctly negotiated support for it. This commit fixes the problem by specifying the correct error code in `rate_limit_exception`'s constructor. Tested manually with a modified version of the Rust driver which negotiates support for the new error. Additionally, tested what happens when the driver doesn't negotiate support (Scylla properly falls back to `Config_error`). Branches: 5.1 Fixes: #11517 Closes #11518	2022-09-13 11:46:15 +02:00
Nadav Har'El	8ece63c433	Merge 'Safemode - Introduce TimeWindowCompactionStrategy Guardrails' This series introduces two configurable options when working with TWCS tables: - `restrict_twcs_default_ttl` - a LiveUpdate-able tri_mode_restriction which defaults to WARN and will notify the user whenever a TWCS table is created without a `default_time_to_live` setting - `twcs_max_window_count` - Which forbids the user from creating TWCS tables whose window count (buckets) are past a certain threshold. We default to 50, which should be enough for most use cases, and a setting of 0 effectively disables the check. Refs: #6923 Fixes: #9029 Closes #11445 * github.com:scylladb/scylladb: tests: cql_query_test: add mixed tests for verifying TWCS guard rails tests: cql_query_test: add test for TWCS window size tests: cql_query_test: add test for TWCS tables with no TTL defined cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables cql: add max window restriction for TimeWindowCompactionStrategy time_window_compaction_strategy: reject invalid window_sizes cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr	2022-09-12 23:55:51 +03:00
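The `twcs_max_window_count` guardrail described in the merge above can be sketched as follows. The option name, the default of 50, and the meaning of 0 come from the commit message; everything else is an assumption made for illustration, in particular the function names and the idea that the window count is the table's default TTL divided by its window size.

```python
import math


def window_count(default_ttl_s: int, window_size_s: int) -> int:
    # Assumption: the number of windows (buckets) a table can hold is the
    # default TTL divided by the window size, rounded up.
    return math.ceil(default_ttl_s / window_size_s)


def exceeds_max_window_count(default_ttl_s: int, window_size_s: int,
                             twcs_max_window_count: int = 50) -> bool:
    # A setting of 0 effectively disables the check.
    if twcs_max_window_count == 0:
        return False
    return window_count(default_ttl_s, window_size_s) > twcs_max_window_count


DAY = 24 * 3600
HOUR = 3600

daily_windows_ok = exceeds_max_window_count(30 * DAY, DAY)     # 30 windows
hourly_windows_bad = exceeds_max_window_count(30 * DAY, HOUR)  # 720 windows
check_disabled = exceeds_max_window_count(30 * DAY, HOUR, twcs_max_window_count=0)
```

Under these assumptions, a 30-day TTL with daily windows stays under the default limit, the same TTL with hourly windows would be rejected, and setting the limit to 0 lets it through.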
Botond Dénes	045b053228	Update seastar submodule * seastar 2b2f6c08...cbb0e888 (10): > memory: allow user to select allocator to be used at runtime > perftune.py: correct typos > Merge 'seastar-addr2line: support more flexible syslog-style backtraces' from Benny Halevy > Fix instruction count for start_measuring_time > build: s/c-ares::c-ares/c-ares::cares/ > Merge 'shared_ptr_debug_helper: turn assert into on_internal_error_abort' from Benny Halevy > test: fix use after free in the loopback socket > doc/tutorial.md: fix docker command for starting hello-world_demo > httpd: add a ctor without addr parameter > dns: dns_resolver: sock_entry: move-construct tcp/udp entries in place Closes #11526	2022-09-12 18:34:22 +03:00
Avi Kivity	62ac3432c9	Merge "Always notify dropped RPC connections" from Pavel E " This set makes messaging service notify connection drop listeners when connection is dropped for _any_ reason and cleans things up around it afterwards " * 'br-messaging-notify-connection-drop' of https://github.com/xemul/scylla: messaging_service: Relax connection drop on re-caching messaging_service: Simplify remove_rpc_client_one() messaging_service: Notify connection drop when connection is removed	2022-09-12 17:02:51 +03:00
Yaron Kaikov	27e326652b	build_docker.sh:fix python2 dependency Following the revert of `b004da9d1b` which solved https://github.com/scylladb/scylla-pkg/issues/3094 updating docker dependency to match `scylla-tools-java` requirements Closes #11522	2022-09-12 13:33:06 +03:00
Kamil Braun	2fe3e67a47	gms: feature_service: don't distinguish between 'known' and 'supported' features `feature_service` provided two sets of features: `known_feature_set` and `supported_feature_set`. The purpose of both and the distinction between them was unclear and undocumented. The 'supported' features were gossiped by every node. Once a feature is supported by every node in the cluster, it becomes 'enabled'. This means that whatever piece of functionality is covered by the feature, it can be used by the cluster from now on. The 'known' set was used to perform feature checks on node start; if the node saw that a feature is enabled in the cluster, but the node does not 'know' the feature, it would refuse to start. However, if the feature was 'known', but wasn't 'supported', the node would not complain. This means that we could in theory allow the following scenario: 1. all nodes support feature X. 2. X becomes enabled in the cluster. 3. the user changes the configuration of some node so feature X will become unsupported but still known. 4. The node restarts without error. So now we have a feature X which is enabled in the cluster, but not every node supports it. That does not make sense. It is not clear whether it was accidental or purposeful that we used the 'known' set instead of the 'supported' set to perform the feature check. What I think is clear, is that having two sets makes the entire thing unnecessarily complicated and hard to think about. Fortunately, at the base to which this patch is applied, the sets are always the same. So we can easily get rid of one of them. I decided that the name which should stay is 'supported', I think it's more specific than 'known' and it matches the name of the corresponding gossiper application state. Closes #11512	2022-09-12 13:09:12 +03:00
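The feature rules described in the commit above can be modelled with plain sets. This is a simplified sketch, not the `feature_service` implementation: the function names and the node and feature names are hypothetical, and gossip is replaced by a dictionary that maps each node to the features it supports.

```python
def enabled_features(supported_by_node: dict) -> set:
    """A feature is enabled once every node in the cluster supports it."""
    sets = list(supported_by_node.values())
    return set.intersection(*sets) if sets else set()


def missing_on_start(node_supported: set, cluster_enabled: set) -> set:
    """Enabled features the starting node does not support.

    With a single 'supported' set, a non-empty result means the node
    should refuse to start.
    """
    return cluster_enabled - node_supported


cluster = {
    "node1": {"X", "Y"},
    "node2": {"X", "Y"},
    "node3": {"X"},
}
enabled = enabled_features(cluster)

# Scenario from the commit message: a node is reconfigured so that it no
# longer supports X, then restarts. Checking against the supported set
# catches this.
problem = missing_on_start({"Y"}, enabled)
```

In this model `enabled` is `{"X"}`, since only X is supported everywhere, and `problem` is `{"X"}`, which is the case the old 'known' set would have let through.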
Takuya ASADA	cd5320fe60	install.sh: add --without-systemd option Since we fail to write files to $USER/.config on Jenkins jobs, we need an option to skip installing systemd units. Let's add --without-systemd to do that. Also, to detect the option availability, we need to increment relocatable package version. See scylladb/scylla-dtest#2819 Closes #11345	2022-09-12 13:04:00 +03:00
Avi Kivity	521127a253	Update tools/jmx submodule * tools/jmx 06f2735...88d9bdc (1): > install.sh: add --without-systemd option	2022-09-12 13:02:16 +03:00
Kamil Braun	ce7bb8b6d0	test.py: PythonTestSuite: sum default config params with user-provided ones Previously, if the suite.yaml file provided `extra_scylla_config_options` but didn't provide values for `authorizer` or `authenticator` inside the config options, the harness wouldn't give any defaults for these keys. It would only provide defaults for these keys if suite.yaml didn't specify `extra_scylla_config_options` at all. It makes sense to give the user the ability to provide extra options while relying on harness defaults for `authenticator` and `authorizer` if the user doesn't care about them.	2022-09-12 11:58:05 +02:00
Kamil Braun	1661fe9f37	test: add a topology suite with Raft disabled Add a suite which is basically equivalent to `topology` except that it doesn't start servers with Raft enabled. The suite will be used to test the Raft upgrade procedure. The suite contains a basic test just to check the suite itself can run; the test will be removed when 'real' tests are added.	2022-09-12 11:58:05 +02:00
Kamil Braun	311806244d	test: pylib: use Python dicts to manipulate `ScyllaServer` configuration Previously we used a formattable string to represent the configuration; values in the string were substituted by Python's formatting mechanism and the resulting string was stored to obtain the config file. This approach had some downsides, e.g. it required boilerplate work to extend: to add a new config option, you would have to modify this template string. Instead we can represent the configuration as a Python dictionary. Dicts are easy to manipulate, for example you can sum two dicts; if a key appears in both, the second dict 'wins': ``` {1:1} | {1:2} == {1:2} ``` This makes the configuration easy to extend without having to write boilerplate: if the user of `ScyllaServer` wants to add or override a config option, they can simply add it to the `config_options` dict and that's it - no need to modify any internal template strings in `ScyllaServer` implementation like before. The `config_options` dict is simply summed with the 'base' config dict of `ScyllaServer` (`config_options` is the right summand so anything in there overrides anything in the base dict). An example of this extensibility is the `authenticator` and `authorizer` options which no longer appear in `scylla_cluster.py` module after this change, they only appear in the suite.yaml file. Also, use "workdir" option instead of specifying data dir, commitlog dir etc. separately.	2022-09-12 11:57:58 +02:00
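The dict "sum" that the commit above relies on is the `|` merge operator, available since Python 3.9. A minimal sketch follows; `BASE_CONFIG`, its keys and values, and `merged_config` are hypothetical and are not taken from `scylla_cluster.py`.

```python
# Hypothetical base configuration; keys and values are illustrative only.
BASE_CONFIG = {
    "workdir": "/tmp/scylla-1",
    "developer_mode": True,
}


def merged_config(config_options: dict) -> dict:
    """Return the base config with user-provided options layered on top.

    The right-hand operand of `|` wins on key conflicts, so anything in
    `config_options` overrides the base value. A new dict is returned and
    neither input is modified.
    """
    return BASE_CONFIG | config_options


cfg = merged_config({
    "developer_mode": False,                    # overrides the base value
    "authenticator": "PasswordAuthenticator",   # adds a new key
})
```

On Python versions before 3.9 the same merge can be written as `{**BASE_CONFIG, **config_options}`.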
Kamil Braun	fd19825eaa	test: pylib: store `config_options` in `ScyllaServer` Previously the code extracted `authenticator` and `authorizer` keys from the config options and stored them. Store the entire dict instead. The new code is easier to extend if we want to make more options configurable.	2022-09-12 11:57:18 +02:00
Pavel Emelyanov	5663b3eda5	messaging_service: Relax connection drop on re-caching When messaging_service::get_rpc_client() picks up cached socket and notices error on it, it drops the connection and creates a new one. The method used to drop the connection is the one that re-lookups the verb index again, which is excessive. Tune this up while at it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-12 12:05:02 +03:00
Nadav Har'El	b0371b6bf8	test/alternator: insert test names into Scylla logs The output of test/alternator/run ends in Scylla's full log file, where it is hard to understand which log messages are related to which test. In this patch, we add a log message (using the new /system/log REST API) every time a test is started and ends. The messages look like this: INFO 2022-08-29 18:07:15,926 [shard 0] api - /system/log: test/alternator: Starting test_ttl.py::test_describe_ttl_without_ttl ... INFO 2022-08-29 18:07:15,930 [shard 0] api - /system/log: test/alternator: Ended test_ttl.py::test_describe_ttl_without_ttl Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Nadav Har'El	a81310e23d	rest api: add a new /system/log operation Add a new REST API operation, taking a log level and a message, and printing it into the Scylla log. This can be useful when a test wants to mark certain positions in the log (e.g., to see which other log messages we get between the two positions). An alternative way to achieve this could have been for the test to write directly into the log file - but an on-disk log file is only one of the logging options that Scylla supports, and the approach in this patch allows adding a log message regardless of how Scylla keeps the logs. The motivation for this feature is that in the following patch the test/alternator framework will add log messages when starting and ending tests, which can help debug test failures. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Nadav Har'El	b9792ffb06	alternator ttl: log warning if scan took too long. Currently, we log at "info" level how much time remained at the end of a full TTL scan until the next scanning period (we sleep for that time). If the scan was slower than the period, we didn't print anything. Let's print a warning in this case - it can be useful for debugging, and also users should know when their desired scan period is not being honored because the full scan is taking longer than the desired scan period. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Nadav Har'El	e7e9adc519	alternator,ttl: allow sub-second TTL scanning period, for tests Alternator has the "alternator_ttl_period_in_seconds" parameter for controlling how often the expiration thread looks for expired items to delete. It is usually a very large number of seconds, but for tests to finish quickly, we set it to 1 second. With 1 second expiration latency, test/alternator/test_ttl.py took 5 seconds to run. In this patch, we change the parameter to allow a floating-point number of seconds instead of just an integer. Then, this allows us to halve the TTL period used by tests to 0.5 seconds, and as a result, the run time of test_ttl.py halves to 2.5 seconds. I think this is fast enough for now. I verified that even if I change the period to 0.1, there is no noticeable slowdown to other Alternator tests, so 0.5 is definitely safe. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Nadav Har'El	746c4bd9eb	test/alternator: skip fewer Alternator TTL tests Most of the Alternator TTL tests are extremely slow on DynamoDB because item expiration may be delayed up to 24 hours (!), and in practice for 10 to 30 minutes. Because of this, we marked most of these tests with the "veryslow" mark, causing them to be skipped by default - unless pytest is given the "--runveryslow" option. The result was that the TTL tests were not run in the normal test runs, which can allow regressions to be introduced (luckily, this hasn't happened). However, this "veryslow" mark was excessive. Many of the tests are very slow only on DynamoDB, but aren't very slow on Scylla. In particular, many of the tests involve waiting for an item to expire, something that happens after the configurable alternator_ttl_period_in_seconds, which is just one second in our tests. So in this patch, we remove the "veryslow" mark from 6 tests of Alternator TTL tests, and instead use two new fixtures - waits_for_expiration and veryslow_on_aws - to only skip the test when running on DynamoDB or when alternator_ttl_period_in_seconds is high - but in our usual test environment they will not get skipped. Because 5 of these 6 tests wait for an item to expire, they take one second each and this patch adds 5 seconds to the Alternator test runtime. This is unfortunate (it's more than 25% of the total Alternator test runtime!) but not a disaster, and we plan to reduce this 5 second time further in the following patch, by decreasing the TTL scanning period even further. This patch also increases the timeout of several of these tests, to 120 seconds from the previous 10 seconds. As mentioned above, normally, these tests should always finish in alternator_ttl_period_in_seconds (1 second) with a single scan taking less than 0.2 seconds, but in extreme cases of debug builds on overloaded test machines, we saw even 60 seconds being passed, so let's increase the maximum. I also needed to make the sleep time between retries smaller, not a function of the new (unrealistic) timeout. 4 more tests remain "veryslow" (and won't run by default) because they take 5-10 seconds each (e.g., a test which waits to see that an item does not get expired, and a test involving writing a lot of data). We should reconsider this in the future - to perhaps run these tests in our normal test runs - but even for now, the 6 extra tests that we start running are a much better protection against regressions than what we had until now. Fixes #11374 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Nadav Har'El	297109f6ee	test/alternator: test Alternator TTL metrics This patch adds a test for the metrics generated by the background expiration thread run for Alternator's TTL feature. We test three of the four metrics: scylla_expiration_scan_passes, scylla_expiration_scan_table and scylla_expiration_items_deleted. The fourth metric, scylla_expiration_secondary_ranges_scanned, counts the number of times that this node took over another node's expiration duty, so it requires a multi-node cluster to test, and we can't test it in the single-node cluster test framework. To see TTL expiration in action this test may need to wait up to the setting of alternator_ttl_period_in_seconds. For a setting of 1 second (the default set by test/alternator/run), this means this test can take up to 1 second to run. If alternator_ttl_period_in_seconds is set higher, the test is skipped unless --runveryslow is requested. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Botond Dénes	a0392bc1eb	Merge 'doc: update the default SStable format' from Anna Stuchlik The purpose of this PR is to update the information about the default SStable format. It Closes #11431 * github.com:scylladb/scylladb: doc: simplify the information about default formats in different versions doc: update the SSTables 3.0 Statistics File Format to add the UUID host_id option of the ME format doc: add the information regarding the ME format to the SSTables 3.0 Data File Format page doc: fix additional information regarding the ME format on the SStable 3.x page doc: add the ME format to the table add a comment to remove the information when the documentation is versioned (in 5.1) doc: replace Scylla with ScyllaDB doc: fix the formatting and language in the updated section doc: fix the default SStable format	2022-09-12 09:50:01 +03:00
Pavel Emelyanov	f3dfc9dbd4	system_keyspace: Don't load preferred IPs if not asked for If snitch->prefer_local() is false, advertised (via gossiper) INTERNAL_IPs are not suggested to messaging service to use. The same should apply to boot-time when messaging service is loaded with those IPs taken from the system.peers table. fixes: #11353 tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/2172/ Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220909144800.23122-1-xemul@scylladb.com>	2022-09-12 09:48:23 +03:00
Botond Dénes	9db940ff1b	Merge "Make network_topology_strategy_test use topology" from Pavel Emelyanov " The test in question plays with snitches to simulate the topology over which tokens are spread. This set replaces explicit snitch usage with temporary topology object. Some snitch traces are still left, but those are for token_metadata internal which still call global snitch for DC/RACK. " * 'br-tests-use-topology-not-snitch' of https://github.com/xemul/scylla: network_topology_strategy_test: Use topology instead of snitch network_topology_strategy_test: Populate explicit topology	2022-09-12 09:40:17 +03:00
Avi Kivity	6c797587c7	dirty_memory_manager: region_group: remove sorting of subgroups dirty_memory_manager tracks lsa regions (memtables) under region_group:s, in order to be able to pick up the largest memtable as a candidate for flushing. Just as region_group:s contain regions, they can also contain other region_group:s in a nested structure. It also tracks the nested region_group that contains the largest region in a binomial heap. This latter facility is no longer used. It saw use when we had the system dirty_memory_manager nested under the user dirty_memory_manager, but that proved too complicated so it was undone. We still nest a virtual region_group under the real region_group, and in fact it is the virtual region_group that holds the memtables, but it is accessed directly to find the largest memtable (region_group::get_largest_region) and so all the mechanism that sorts region_group:s is bypassed. Start to dismantle this house of cards by removing the subgroup sorting. Since the hierarchy has exactly one parent and one child, it's clearly useless. This is seen by the fact that we can just remove everything related. We still need the _subgroups member to hold the virtual region_group; it's replaced by a vector. I verified that the non-intrusive vector is exception safe since push_back() happens at the very end; in any case this is early during setup where we aren't under memory pressure. A few tests that check the removed functionality are deleted. Closes #11515	2022-09-12 09:29:08 +03:00
Botond Dénes	0e2d6cfd61	Merge 'Introduce Compaction Groups' from Raphael "Raph" Carvalho Compaction group can be defined as a set of files that can be compacted together. Today, all sstables belonging to a table in a given shard belong to the same group. So we can say there's one group per table per shard. As we want to eventually allow isolation of data that shouldn't be mixed, e.g. data from different vnodes, then we want to have more than one group per table per shard. That's why compaction groups are being introduced here. Today, all memtables and sstables are stored in a single structure per table. After compaction groups, there will be memtables and sstables for each group in the table. As we're taking an incremental approach, table still supports a single group. But work was done on preparing table for supporting multiple groups. Completing that work is actually the next step. Also, a procedure for deriving the group from token is introduced, but today it always returns the single group owned by the table. Once multiple groups are supported, then that procedure should be implemented to map a token to a group. No semantics were changed by this series. Closes #11261 * github.com:scylladb/scylladb: replica: Move memtables to compaction_group replica: move compound SSTable set to compaction group replica: move maintenance SSTable set to compaction_group replica: move main SSTable set to compaction_group replica: Introduce compaction_group replica: convert table::stop() into coroutine compaction_manager: restore indentation compaction_manager: Make remove() and stop_ongoing_compactions() noexcept test: sstable_compaction_test: Don't reference main sstable set directly test: sstable_utils: Set data size fields for fake SSTable test: sstable_compaction_test: remove needless usage of column_family_test::add_sstable	2022-09-12 09:28:44 +03:00
Botond Dénes	5374f0edbf	Merge 'Task manager' from Aleksandra Martyniuk Task manager for observing and managing long-running, asynchronous tasks in Scylla with the interface for the user. It will allow listing of tasks, getting detailed task status and progression, waiting for their completion, and aborting them. The task manager will be configured with a “task ttl” that determines how long the task status is kept in memory after the task completes. At first it will support repair and compaction tasks, and possibly more in the future. Currently: Sharded `task_manager` is started in `main.cc` where it is further passed to `http_context` for the purpose of user interface. Task manager's tasks are implemented in two layers: the abstract and the implementation one. The latter is a pure virtual class which needs to be overridden by each module. Abstract layer provides the methods that are shared by all modules and the access to module-specific methods. Each module can access task manager, create and manage its tasks through `task_manager::module` object. This way data specific to a module can be separated from the other modules. User can access task manager rest api interface to track asynchronous tasks. The available options consist of: - getting a list of modules - getting a list of basic stats of all tasks in the requested module - getting the detailed status of the requested task - aborting the requested task - waiting for the requested task to finish To enable testing of the provided api, test specific task implementation and module are provided. Their lifetime can be simulated with the standalone test api. These components are compiled and the tests are run in all but release build modes. Fixes: #9809 Closes #11216 * github.com:scylladb/scylladb: test: task manager api test task_manager: test api layer implementation task_manager: add test specific classes task_manager: test api layer task_manager: api layer implementation task_manager: api layer task_manager: keep task_manager reference in http_context start sharded task manager task_manager: create task manager object	2022-09-12 09:26:46 +03:00
Petr Gusev	1b5fa4088e	raft server, abort group0 server on background errors	2022-09-12 10:16:43 +04:00
Petr Gusev	e92dc9c15b	raft server, provide a callback to handle background errors Fix: #11352	2022-09-12 10:16:43 +04:00
Petr Gusev	c57238d3d6	raft server, check aborted state on public server public api's Fix: #11352	2022-09-12 10:16:40 +04:00
Felipe Mendes	6a3d8607b4	tests: cql_query_test: add mixed tests for verifying TWCS guard rails This patch adds a set of 10 scenarios that have been unveiled during additional testing. In particular, most of the scenarios cover ALTER TABLE statements, which - if not handled - may break the guardrails safe-mode. The situations covered are: - STCS->TWCS with no TTL defined - STCS->TWCS with small TTL - STCS->TWCS with large TTL value - TWCS table with small to large TTL - No TTL TWCS to large TTL and then small TTL - twcs_max_window_count LiveUpdate - Decrease TTL - twcs_max_window_count LiveUpdate - Switch CompactionStrategy - No TTL TWCS table to STCS - Large TTL TWCS table, modify attribute other than compaction and default_time_to_live - Large TTL STCS table, fail to switch to TWCS with no TTL explicitly defined	2022-09-11 17:57:14 -03:00
Felipe Mendes	a7a91e3216	tests: cql_query_test: add test for TWCS window size This patch adds a test for checking the validity of tables using TimeWindowCompactionStrategy with an incorrect number of compaction windows. The twcs_max_window_count LiveUpdate-able parameter is also disabled during the execution of the test in order to ensure that users can effectively disable the enforcement, should they want.	2022-09-11 17:38:25 -03:00
Felipe Mendes	1c5d46877e	tests: cql_query_test: add test for TWCS tables with no TTL defined This patch adds a testcase for TimeWindowCompactionStrategy tables created with no default_time_to_live defined. It makes use of the LiveUpdate-able restrict_twcs_without_default_ttl parameter in order to determine whether TWCS tables without TTL should be forbidden or not. The test replays all 3 possible variations of the tri_mode_restriction and verifies tables are correctly created/altered according to the current setting on the replica which receives the request.	2022-09-11 16:55:46 -03:00
Felipe Mendes	7fec4fcaa6	cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables TimeWindowCompactionStrategy (TWCS) tables are known for being used explicitly for time-series workloads. In particular, most of the time users should specify a default_time_to_live during table creation to ensure data is expired such as in a sliding window. Failure to do so may create unbounded windows - which - depending on the compaction window chosen, may introduce severe latency and operational problems, due to unbounded window growth. However, there may be some use cases which explicitly ingest data by using the `USING TTL` keyword, which effectively has the same effect. Therefore, we cannot simply forbid table creations without a default_time_to_live explicitly set to any value other than 0. The new restrict_twcs_without_default_ttl option has three values: "true", "false", and "warn": We default to "warn", which will notify the user of the consequences when creating a TWCS table without a default_time_to_live value set. However, users are encouraged to switch it to "true", as - ideally - a default_time_to_live value should always be expected to prevent applications failing to ingest data against the database omitting the `USING TTL` keyword.	2022-09-11 16:50:42 -03:00
Felipe Mendes	a3356e866b	cql: add max window restriction for TimeWindowCompactionStrategy The number of potential compaction windows (or buckets) is defined by the default_time_to_live / sstable_window_size ratio. Every now and then we end up in a situation where users of TWCS underestimate their window buckets. Unfortunately, scenarios in which one employs a default_time_to_live setting of 1 year but a window size of 30 minutes are not rare enough. Such a configuration is known to only harm a workload: As more and more windows are created, the number of SSTables will grow at the same pace, and the situation will only get worse as the number of shards increases. This commit introduces the twcs_max_window_count option, which defaults to 50, and will forbid the Creation or Alter of tables which get past this threshold. A value of 0 will explicitly skip this check. Note: this option does not forbid the creation of tables with a default_time_to_live=0 as - even though not recommended - it is perfectly possible for a TWCS table with default TTL=0 to have a bound window, provided any ingestion statements make use of 'USING TTL' within the CQL statement, in addition to it.	2022-09-11 16:50:22 -03:00
Felipe Mendes	f1ffb501f0	time_window_compaction_strategy: reject invalid window_sizes Scylla mistakenly allows a user to configure an invalid TWCS window_size <= 0, which effectively breaks the notion of compaction windows. Interestingly enough, a <= 0 window size should be considered an undefined behavior as either we would create a new window every 0 duration (?) or the table would behave as STCS, the reader is encouraged to figure out which one of these is true. :-) Cassandra, on the other hand, will properly throw a ConfigurationException when receiving such invalid window sizes and we now match the behavior to the same as Cassandra's. Refs: #2336	2022-09-11 16:40:03 -03:00
Raphael S. Carvalho	f5715d3f0b	replica: Move memtables to compaction_group Now memtables live in compaction_group. Also introduced a function that selects the group based on token, but today table always returns the single group managed by it. Once multiple groups are supported, then the function should interpret token content to select the group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	f4579795e6	replica: move compound SSTable set to compaction group The group is now responsible for providing the compound set. table still has one compound set, which will span all groups for the cases we want to ignore the group isolation. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	6717d96684	replica: move maintenance SSTable set to compaction_group This commit is restricted to moving maintenance set into compaction_group. Next, we'll introduce compound set into it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	ce8e5f354c	replica: move main SSTable set to compaction_group This commit is restricted to moving main set into compaction_group. Next, we'll move maintenance set into it and finally the memtable. A method is introduced to figure out which group a sstable belongs to, but it's still unimplemented as table is still limited to a single group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	4871f1c97c	replica: Introduce compaction_group Compaction group is a new abstraction used to group SSTables that are eligible to be compacted together. By this definition, a table in a given shard has a single compaction group. The problem with this approach is that data from different vnodes is intermixed in the same sstable, making it hard to move data in a given sstable around. Therefore, we'll want to have multiple groups per table. A group can be thought of as an isolated LSM tree where its memtable and sstable files are isolated from other groups. As for the implementation, the idea is to take a very incremental approach. In this commit, we're introducing a single compaction group to table. Next, we'll migrate sstable and maintenance set from table into that single compaction group. And finally, the memtable. Cache will be shared among the groups, for simplicity. It works due to its ability to invalidate a subset of the token range. There will be 1:1 relationship between compaction_group and table_state. We can later rename table_state to compaction_group_state. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	a6ecadf3de	replica: convert table::stop() into coroutine await_pending_ops() is today marked noexcept, so doesn't have to be implemented with finally() semantics. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	44913ebbd0	compaction_manager: restore indentation Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	888660fa44	compaction_manager: Make remove() and stop_ongoing_compactions() noexcept stop_ongoing_compactions() is made noexcept too as it's called from remove() and we want to make the latter noexcept, to allow compaction group to qualify its stop function as noexcept too. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	65414e6756	test: sstable_compaction_test: Don't reference main sstable set directly Preparatory change for main sstable set to be moved into compaction group. After that, tests can no longer directly access the main set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	dfa7273127	test: sstable_utils: Set data size fields for fake SSTable So methods that look at data size and require it to be higher than 0 will work on fake SSTables created using set_values(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Raphael S. Carvalho	4fa8159a13	test: sstable_compaction_test: remove needless usage of column_family_test::add_sstable column_family_test::add_sstable will soon be changed to run in a thread, and it's not needed in this procedure, so let's remove its usage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-09-11 14:26:59 -03:00
Jadw1	ba461aca8b	cql-pytest: more neutral command in cql_test_connection fixture I found `use system` to not be neutral enough (e.g. in case of testing describe statement). `BEGIN BATCH APPLY BATCH` sounds better. Closes #11504	2022-09-11 18:49:06 +03:00
Nadav Har'El	d71098a3b8	Update tools/java submodule * tools/java b7a0c5bd31...b004da9d1b (1): > Revert "dist/debian:add python3 as dependency"	2022-09-11 17:45:43 +03:00
Pavel Emelyanov	bbad3eac63	pylib: Cast port number config to int explicitly Otherwise it crashes some python versions. The cast was there before `a2dd64f68f` explicitly dropped one while moving the code between files. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11511	2022-09-09 18:08:08 +02:00
Kamil Braun	be1ef9d2a7	gms: feature_service: remove the USES_RAFT feature It was not and won't be used for anything. Note that the feature was always disabled or masked so no node ever announced it, thus it's safe to get rid of. Closes #11505	2022-09-09 18:05:46 +02:00
Michał Chojnowski	47844689d8	token_metadata: make local_dc_filter a lambda, not a std::function This std::function causes allocations, both on construction and in other operations. This costs ~2200 instructions for a DC-local query. Fix that. Closes #11494	2022-09-09 18:05:46 +02:00
Michał Chojnowski	af7ace3926	utils: config_file: fix handling of workdir,W in the YAML file Option names given in db/config.cc are handled for the command line by passing them to boost::program_options, and by YAML by comparing them with YAML keys. boost::program_options has logic for understanding the long_name,short_name syntax, so for a "workdir,W" option both --workdir and -W worked, as intended. But our YAML config parsing doesn't have this logic and expected "workdir,W" verbatim, which is obviously not intended. Fix that. Fixes #7478 Fixes #9500 Fixes #11503 Closes #11506	2022-09-09 18:05:46 +02:00
Kamil Braun	dba595d347	Merge 'Minimal implementation of Broadcast Tables' from Mikołaj Grzebieluch Broadcast tables are tables for which all statements are strongly consistent (linearizable), replicated to every node in the cluster and available as long as a majority of the cluster is available. If a user wants to store a “small” volume of metadata that is not modified “too often” but provides high resiliency against failures and strong consistency of operations, they can use broadcast tables. The main goal of the broadcast tables project is to solve problems which need to be solved when we eventually implement general-purpose strongly consistent tables: designing the data structure for the Raft command, ensuring that the commands are idempotent, handling snapshots correctly, and so on. In this MVP (Minimum Viable Product), statements are limited to simple SELECT and UPDATE operations on the built-in table. In the future, other statements and data types will be available but with this PR we can already work on features like idempotent commands or snapshotting. Snapshotting is not handled yet which means that restarting a node or performing too many operations (which would cause a snapshot to be created) will give incorrect results. In a follow-up, we plan to add end-to-end Jepsen tests (https://jepsen.io/). With this PR we can already simulate operations on lists and test linearizability in linear complexity. This can also test Scylla's implementation of persistent storage, failure detector, RPC, etc. Design doc: https://docs.google.com/document/d/1m1IW320hXtsGulzSTSHXkfcBKaG5UlsxOpm6LN7vWOc/edit?usp=sharing Closes #11164 * github.com:scylladb/scylladb: raft: broadcast_tables: add broadcast_kv_store test raft: broadcast_tables: add returning query result raft: broadcast_tables: add execution of intermediate language raft: broadcast_tables: add compilation of cql to intermediate language raft: broadcast_tables: add definition of intermediate language db: system_keyspace: add broadcast_kv_store table db: config: add BROADCAST_TABLES feature flag	2022-09-09 18:05:37 +02:00
Aleksandra Martyniuk	55cd8fe3bf	test: task manager api test Test of a task manager api.	2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk	ec86410094	task_manager: test api layer implementation The implementation of a test api that helps testing task manager api. It provides methods to simulate the operations that can happen on modules and their tasks. Through the api user can: register and unregister the test module and the tasks belonging to the module, and finish the tasks with success or custom error.	2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk	b1fa6e49af	task_manager: add test specific classes Add test_module and test_task classes inheriting from respectively task_manager::module and task_manager::task::impl that serve task manager testing.	2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk	42f36db55b	task_manager: test api layer The test api that helps testing task manager api. It can be used to simulate the operations that can happen on modules and their tasks. Through the api user can: register and unregister the test module and the tasks belonging to the module, and finish the tasks with success or custom error.	2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk	c9637705a6	task_manager: api layer implementation The implementation of a task manager api layer. It provides methods to list the modules registered in task_manager, list tasks belonging to the given module, abort, wait for or retrieve a status of the given task.	2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk	07043cee68	task_manager: api layer The task manager api layer. It can be used to list the modules registered in task_manager, list tasks belonging to the given module, abort, wait for or retrieve a status of the given task.	2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk	b87a0a74ab	task_manager: keep task_manager reference in http_context Keep a reference to sharded<task_manager> as a member of http_context so it can be reached from rest api.	2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk	9e68c8d445	start sharded task manager Sharded task manager object is started in main.cc.	2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk	2439e55974	task_manager: create task manager object Implementation of a task manager that allows tracking and managing asynchronous tasks. The tasks are represented by task_manager::task class providing members common to all types of tasks. The methods that differ among tasks of different module can be overridden in a class inheriting from task_manager::task::impl class. Each task stores its status containing parameters like id, sequence number, begin and end time, state etc. After the task finishes, it is kept in memory for configurable time or until it is unregistered. Tasks need to be created with make_task method. Each module is represented by task_manager::module type and should have access to task manager through task_manager::module methods. That makes it easy to separate and collectively manage data belonging to each module.	2022-09-09 14:29:28 +02:00
Pavel Emelyanov	24d68e1995	messaging_service: Simplify remove_rpc_client_one() Make it void as after the previous patch no code is interested in this value Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-09 12:50:41 +03:00
Pavel Emelyanov	ca92ed65e5	messaging_service: Notify connection drop when connection is removed There are two methods to close an RPC socket in m.s. -- one that's called on error path of messaging_service::send_... and the other one that's called upon gossiper down/leave/cql-off notifications. The former one notifies listeners about connection drop, the latter one doesn't. The only listener is the storage-proxy which, in turn, kicks database to release per-table cache hitrate data. Said that, when a node goes down (or when an operator shuts down its transport) the hit-rate stats regarding this node are leaked. This patch moves notification so that any socket drop calls notification and thus releases the hitrates. fixes: #11497 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-09 12:47:38 +03:00
Kamil Braun	0efdc45d59	Merge 'test.py: remove top level conftest and improve logging' from Alecco - To isolate the different pytest suites, remove the top level conftest and move needed contents to existing `test/pylib/cql_repl/conftest.py` and `test/topology/conftest.py`. - Add logging to CQL and Python suites. - Log driver version for CQL and topology tests. Closes #11482 * github.com:scylladb/scylladb: test.py: enable log capture for Python suite test.py: log driver name/version for cql/topology test.py: remove top level conftest.py	2022-09-08 16:25:24 +02:00
Anna Stuchlik	54d6d8b8cc	doc: fix the version name in file upgrade-guide-from-2021.1-to-2022.1-image.rst	2022-09-08 15:38:11 +02:00
Anna Stuchlik	6ccc838740	doc: rename the upgrade-image file to upgrade-image-opensource and update all the links to that file	2022-09-08 15:38:11 +02:00
Anna Stuchlik	22317f8085	doc: update the Enterprise guide to include the Enterprise-only image file	2022-09-08 15:38:11 +02:00
Anna Stuchlik	593f987bb2	doc: update the image files	2022-09-08 15:38:10 +02:00
Anna Stuchlik	42224dd129	doc: split the upgrade-image file to separate files for Open Source and Enterprise	2022-09-08 15:38:10 +02:00
Anna Stuchlik	64a527e1d3	doc: clarify the alternative upgrade procedures for the ScyllaDB image	2022-09-08 15:38:10 +02:00
Anna Stuchlik	5136d7e6d7	doc: add the upgrade guide for ScyllaDB Image from 2022.x.y. to 2022.x.z	2022-09-08 15:38:10 +02:00
Anna Stuchlik	f1ef6a181e	doc: add the upgrade guide for ScyllaDB Image from 5.x.y. to 5.x.z	2022-09-08 15:38:10 +02:00
Mikołaj Grzebieluch	eb610c45fe	raft: broadcast_tables: add broadcast_kv_store test Test queries scylla with following statements: * SELECT value FROM system.broadcast_kv_store WHERE key = CONST; * UPDATE system.broadcast_kv_store SET value = CONST WHERE key = CONST; * UPDATE system.broadcast_kv_store SET value = CONST WHERE key = CONST IF value = CONST; where CONST is string randomly chosen from small set of random strings and half of conditional updates has condition with comparison to last written value.	2022-09-08 15:25:36 +02:00
Mikołaj Grzebieluch	803115d061	raft: broadcast_tables: add returning query result Intermediate language added new layer of abstraction between cql statement and querying mutations, thus this commit adds new layer of abstraction between mutations and returning query result. Result can't be directly returned from `group0_state_machine::apply`, so we decided to hold query results in map inside `raft_group0_client`. It can be safely read after `add_entry_unguarded`, because this method waits for applying raft command. After translating result to `result_message` or in case of exception, map entry is erased.	2022-09-08 15:25:36 +02:00
Mikołaj Grzebieluch	db88525774	raft: broadcast_tables: add execution of intermediate language Extended `group0_command` to enable transmission of `raft::broadcast_tables::query`. Added `add_entry_unguarded` method in `raft_group0_client` for dispatching raft commands without `group0_guard`. Queries on group0_kv_store are executed in `group_0_state_machine::apply`, but for now don't return results. They don't use previous state id, so they will block concurrent schema changes, but these changes won't block queries. In this version snapshots are ignored.	2022-09-08 15:25:36 +02:00
Mikołaj Grzebieluch	82df8a9905	raft: broadcast_tables: add compilation of cql to intermediate language We decided to extend `cql_statement` hierarchy with `strongly_consistent_modification_statement` and `strongly_consistent_select_statement`. Statements operating on system.broadcast_kv_store will be compiled to these new subclasses if BROADCAST_TABLES flag is enabled. If the query is executed on a shard other than 0, it's bounced to shard 0.	2022-09-08 15:25:36 +02:00
Wojciech Mitros	7effd4c53a	wasm: directly handle recycling of invalidated instance An instance may be invalidated before we try to recycle it. We perform this by setting its value to a nullopt. This patch adds a check for it when calculating its size. This behavior didn't cause issues before because the catch clause below caught errors caused by calling value() on a nullopt, even though it was intended for errors from get_instance_size. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #11500	2022-09-08 15:39:28 +03:00
Geoffrey Beausire	f435276d2e	Merge tokens for everywhere_topology With EverywhereStrategy, we know that all tokens will be on the same node and the data is typically sparse like LocalStrategy. Result of testing the feature: Cluster: 2 DC, 2 nodes in each DC, 256 tokens per node, 14 shards per node Before: 154 scanning operations After: 14 scanning operations (~10x improvement) On a bigger cluster, it will probably be even more efficient. Closes #11403	2022-09-08 15:33:23 +03:00
Mikołaj Grzebieluch	c541d5c363	raft: broadcast_tables: add definition of intermediate language In broadcast tables, raft command contains a whole program to be executed. Sending and parsing on each node entire CQL statement is inefficient, thus we decided to compile it to an intermediate language which can be easily serializable. This patch adds a definition of such a language. For now, only the following types of statements can be compiled: * select value where key = CONST from system.broadcast_kv_store; * update system.broadcast_kv_store set value = CONST where key = CONST; * update system.broadcast_kv_store set value = CONST where key = CONST if value = CONST; where CONST is string literal.	2022-09-08 14:03:51 +02:00
Michał Chojnowski	0c54e7c5c7	sstables: index_reader: remove a stray vsprintf call from the hot path sstable::get_filename() constructs the filename from components, which takes some work. It happens to be called on every index_reader::index_reader() call even though it's only used for TRACE logs. That's 1700 instructions (~1% of a full query) wasted on every SSTable read. Fix that. Closes #11485	2022-09-08 14:29:23 +03:00
Michał Chojnowski	c61b901828	utils: logalloc: prefer memory::free_memory() to memory::stats().free_memory The former is a shortcut that does not involve a copy of all stats. This saves some instructions in the hot path. Closes #11495	2022-09-08 14:12:20 +03:00
Botond Dénes	438aaf0b85	Merge 'Deglobalize repair history maps' from Benny Halevy Change `a8ad385ecd` introduced ``` thread_local std::unordered_map<utils::UUID, seastar::lw_shared_ptr<repair_history_map>> repair_history_maps; ``` We're trying to avoid global scoped variables as much as we can so this should probably be embedded in some sharded service. This series moves the thread-local `repair_history_maps` instances to `compaction_manager` and passes a reference to the shard compaction_manager to functions that need it for compact_for_query and compact_for_compaction. Since some paths don't need it and don't have access to the compaction_manager, the series introduced `utils::optional_reference<T>` that allows to pass nullopt. In this case, `get_gc_before_for_key` behaves in `tombstone_gc_mode::repair` as if the table wasn't repaired and tombstones are not garbage-collected. Fixes #11208 Closes #11366 * github.com:scylladb/scylladb: tombstone_gc: deglobalize repair_history_maps mutation_compactor: pass tombstone_gc_state to compact_mutation_state mutation_partition: compact_for_compaction_v2: get tombstone_gc_state mutation_partition: compact_for_compaction: get tombstone_gc_state mutation_readers: pass tombstone_gc_state to compating_reader sstables: get_gc_before_*: get tombstone_gc_state from caller compaction: table_state: add virtual get_tombstone_gc_state method db: view: get_tombstone_gc_state from compaction_manager db: view: pass base table to view_update_builder repair: row_level: repair_update_system_table_handler: get get_tombstone_gc_state for db compaction_manager replica: database: get_tombstone_gc_state from compaction_manager compaction_manager: add tombstone_gc_state replica: table: add get_compaction_manager function tombstone_gc: introduce tombstone_gc_state repair_service: simplify update_repair_time error handling tombstone_gc: update_repair_time: get table_id rather than schema_ptr tombstone_gc: delete unused forward declaration database: do not drop_repair_history_map_for_table in detach_column_family	2022-09-08 14:08:38 +03:00
Botond Dénes	9d1cc5e616	Merge 'doc: update the OS support for versions 2022.1 and 2022.2' from Anna Stuchlik The scope of this PR: - Removing support for Ubuntu 16.04 and Debian 9. - Adding support for Debian 11. Closes #11461 * github.com:scylladb/scylladb: doc: remove support for Debian 9 from versions 2022.1 and 2022.2 doc: remove support for Ubuntu 16.04 from versions 2022.1 and 2022.2 doc: add support for Debian 11 to versions 2022.1 and 2022.2	2022-09-08 13:27:47 +03:00
Anna Stuchlik	0dee507c48	doc: fix the upgrade version in the upgrade guide for RHEL and CentOS Closes #11477	2022-09-08 13:26:49 +03:00
Alejo Sanchez	4190c61dbf	test.py: enable log capture for Python suite Enable pytest log capture for Python suite. This will help debugging issues in remote machines. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-08 11:37:32 +02:00
Alejo Sanchez	c6a048827a	test.py: log driver name/version for cql/topology Log the python driver name and version to help debugging on third party machines. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-08 11:37:32 +02:00
Alejo Sanchez	a2dd64f68f	test.py: remove top level conftest.py Remove top level conftest so different suites have their own (as it was before). Move minimal functionality into existing test/pylib/cql_repl/conftest.py so cql tests can run on their own. Move param setting into test/topology/conftest.py. Use uuid module for unique keyspace name for cql tests. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-09-08 11:37:32 +02:00
Alejo Sanchez	d892d194fb	test.py: remove spurious after test check Before/after test checks are done per test case, there's no longer need to check after pytest finishes. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11489	2022-09-08 11:33:37 +02:00
Felipe Mendes	7ccf8ed221	cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr As check_restricted_table_properties() is invoked both within CREATE TABLE and ALTER TABLE CQL statements, we currently have no way to determine whether the operation was a CREATE or an ALTER. In many situations, it is important to be able to distinguish between the two operations, for example to tell whether a table already has a particular property set or if we are defining it within the statement. This patch simply adds a std::optional<schema_ptr> to check_restricted_table_properties() and updates its caller. Whenever a CREATE TABLE statement is issued, the method is called with std::nullopt, whereas if an ALTER TABLE is issued instead, we call it with a schema_ptr.	2022-09-07 21:27:32 -03:00
Kamil Braun	ff4430d8ea	test: topology: make imports friendlier for tools (such as `mypy`) When importing from `pylib`, don't modify `sys.path` but use the fact that both `test/` and `test/pylib/` directories contain an `__init__.py` file, so `test.pylib` is a valid module if we start with `test/` as the Python package root. Both `pytest` and `mypy` (and I guess other tools) understand this setup. Also add an `__init__.py` to `test/topology/` so other modules under the `test/` directory will be able to import stuff from `test/topology/` (i.e. from `test.topology.X import Y`). Closes #11467	2022-09-07 23:52:50 +03:00
Karol Baryła	1c2eef384d	transport/server.cc: Return correct size of decompressed lz4 buffer An incorrect size is returned from the function, which could lead to crashes or undefined behavior. Fix by erroring out in these cases. Fixes #11476	2022-09-07 10:58:23 +03:00
Nadav Har'El	e5f6adf46c	test/alternator: improve tests for DescribeTable for indexes I created new issues for each missing field in DescribeTable's response for GSIs and LSIs, so in this patch we edit the xfail messages in the test to refer to these issues. Additionally, we only had a test for these fields for GSIs, so this patch also adds a similar test for LSIs. It turns out there is a difference between the two tests - the two fields IndexStatus and ProvisionedThroughput are returned for GSIs, but not for LSIs. Refs #7750 Refs #11466 Refs #11470 Refs #11471 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11473	2022-09-07 09:50:16 +02:00
Benny Halevy	e9cfe9e572	tombstone_gc: deglobalize repair_history_maps Move the thread-local instances of the per-table repair history maps into compaction_manager. Fixes #11208 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:15 +03:00
Benny Halevy	8b38893895	mutation_compactor: pass tombstone_gc_state to compact_mutation_state Used in get_gc_before. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:15 +03:00
Benny Halevy	d86810d22c	mutation_partition: compact_for_compaction_v2: get tombstone_gc_state To be passed down to compact_mutation_state in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:15 +03:00
Benny Halevy	0627667a06	mutation_partition: compact_for_compaction: get tombstone_gc_state And pass down to `do_compact`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:15 +03:00
Benny Halevy	7e4612d3aa	mutation_readers: pass tombstone_gc_state to compating_reader To be passed further down to `compact_mutation_state` in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-07 07:43:14 +03:00
Benny Halevy	572d534d0d	sstables: get_gc_before_: get tombstone_gc_state from caller Pass the tombstone_gc_state from the compaction_strategy to sstables get_gc_before_ functions using the table state to get to the tombstone_gc_state. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:05:39 +03:00
Benny Halevy	2cd3fc2f36	compaction: table_state: add virtual get_tombstone_gc_state method and override it in table::table_state to get the tombstone_gc_state from the table's compaction_manager. It is going to be used in the next patches to pass the gc state from the compaction_strategy down to sstables and compaction. table_state_for_test was modified to just keep a null tombstone_gc_state. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:05:39 +03:00
Benny Halevy	6fb4b5555d	db: view: get_tombstone_gc_state from compaction_manager Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:05:39 +03:00
Benny Halevy	71ede6124a	db: view: pass base table to view_update_builder To be used by generate_update() for getting the tombstone_gc_state via the table's compaction_manager. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:04:23 +03:00
Benny Halevy	6a11c410fd	repair: row_level: repair_update_system_table_handler: get get_tombstone_gc_state for db compaction_manager Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:04:16 +03:00
Benny Halevy	3b0147390b	replica: database: get_tombstone_gc_state from compaction_manager Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:02:54 +03:00
Benny Halevy	8b841e1207	compaction_manager: add tombstone_gc_state Add a tombstone_gc_state member and methods to get it. Currently the tombstone_gc_state is default constructed, but a following patch will move the thread-local repair history maps into the compaction_manager as a member and then the _tombstone_gc_state member will be initialized from that member. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:02:54 +03:00
Benny Halevy	1ce50439af	replica: table: add get_compaction_manager function so to let a view get the tombstone_gc_state via the compaction_manager of the base table. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:02:54 +03:00
Benny Halevy	5dd15aa3c8	tombstone_gc: introduce tombstone_gc_state and use it to access the repair history maps. In this introductory patch, we use default-constructed tombstone_gc_state to access the thread-local maps temporarily and those use sites will be replaced in following patches that will gradually pass the tombstone_gc_state down from the compaction_manager to where it's used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 23:02:54 +03:00
Benny Halevy	b2b211568e	repair_service: simplify update_repair_time error handling There's no need for per-shard try/catch here. Just catch exceptions from the overall sharded operation to update_repair_time. Also, update warning to indicate that only updating the repair history time failed, not "Loading repair history". Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 22:43:08 +03:00
Benny Halevy	7d13811297	tombstone_gc: update_repair_time: get table_id rather than schema_ptr The function doesn't need access to the whole schema. The table_id is just enough to get by. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 22:43:08 +03:00
Benny Halevy	442d43181c	tombstone_gc: delete unused forward declaration Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 22:43:08 +03:00
Benny Halevy	3d88fe9729	database: do not drop_repair_history_map_for_table in detach_column_family drop_repair_history_map_for_table is called on each shard when database::truncate is done, and the table is stopped. dropping it before the table is stopped is too early. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-06 22:43:08 +03:00
Nadav Har'El	ee606a5d52	Merge 'doc: fix the CQL version in the Interfaces table' from Anna Stuchlik Fix https://github.com/scylladb/scylla-doc-issues/issues/816 Fix https://github.com/scylladb/scylla-docs/issues/1613 This PR fixes the CQL version in the Interfaces page, so that it is the same as in other places across the docs and in sync with the version reported by ScyllaDB (see https://github.com/scylladb/scylla-doc-issues/issues/816#issuecomment-1173878487). To make sure the same CQL version is used across the docs, we should use the `\|cql-version\|` variable rather than hardcode the version number on several pages. The variable is specified in the conf.py file: ``` rst_prolog = """ .. \|cql-version\| replace:: 3.3.1 """ ``` Closes #11320 * github.com:scylladb/scylladb: doc: add the Cassandra version on which the tools are based doc: fix the version number doc: update the Enterprise version where the ME format was introduced doc: add the ME format to the Cassandra Compatibility page doc: replace Scylla with ScyllaDB doc: rewrite the Interfaces table to the new format to include more information about CQL support doc: remove the CQL version from pages other than Cassandra compatibility doc: fix the CQL version in the Interfaces table	2022-09-06 19:02:44 +03:00
Asias He	792a91b1fa	storage_service: Drop ignore dead nodes option for bootstrap and decommission in log The ignore dead node options are not really supported at the moment. Drop it in the log to reduce confusion. Closes #11426	2022-09-06 18:21:21 +03:00
Anna Stuchlik	4c7aa5181e	doc: remove support for Debian 9 from versions 2022.1 and 2022.2	2022-09-06 14:04:22 +02:00
Anna Stuchlik	dfc7203139	doc: remove support for Ubuntu 16.04 from versions 2022.1 and 2022.2	2022-09-06 14:01:35 +02:00
Anna Stuchlik	dd4979ffa8	doc: add support for Debian 11 to versions 2022.1 and 2022.2	2022-09-06 13:54:08 +02:00
Pavel Emelyanov	398e9f8593	network_topology_strategy_test: Use topology instead of snitch Most of the test's cases use rack-inferring snitch driver and get DC/RACK from it via the test_dc_rack() helper. The helper was introduced in one of the previous sets to populate token metadata with some DC/RACK as normal tokens manipulations required respective endpoint in topology. This patch removes the usage of global snitch and replaces it with the pre-populated topology. The pre-population is done in rack-inferring snitch like manner, since token_metadata still uses global snitch and the locations from snitch and this temporary topology should match. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-06 12:26:30 +03:00
Pavel Emelyanov	d8b2940cd8	network_topology_strategy_test: Populate explicit topology There's a test case that makes its own snitch driver that generates pre-calculated DC/RACK data for test endpoints. This patch replaces this custom snitch driver with a standalone topology object. Note: to get DC/RACK info from this topo the get_location() is used since the get_rack()/get_datacenter() are still wrappers around global snitch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-06 12:24:39 +03:00
Botond Dénes	b89b84ad3c	compaction: scrub/abort: be more verbose Currently abort-mode scrub exits with a message which basically says "some problem was found", with no details on what problem it found. Add a detailed error report on the found problem before aborting the scrub. Closes #11418	2022-09-06 11:42:34 +03:00
Avi Kivity	3dc39474ec	Merge 'tools/scylla-types: add tokenof and shardof actions' from Botond Dénes `tokenof` calculates and prints the token of a partition-key. `shardof` calculates the token and finds the owner shard of a partition-key. The number of shards has to be provided by the `--shards` parameter. Ignore msb bits param can be tweaked with the `--ignore-msb-bits` parameter, which defaults to 12. Examples: ``` $ scylla types tokenof --full-compound -t UTF8Type -t SimpleDateType -t UUIDType 000d66696c655f696e7374616e63650004800049190010c61a3321045941c38e5675255feb0196 (file_instance, 2021-03-27, c61a3321-0459-41c3-8e56-75255feb0196): -5043005771368701888 $ scylla types shardof --full-compound -t UTF8Type -t SimpleDateType -t UUIDType --shards=7 000d66696c655f696e7374616e63650004800049190010c61a3321045941c38e5675255feb0196 (file_instance, 2021-03-27, c61a3321-0459-41c3-8e56-75255feb0196): token: -5043005771368701888, shard: 1 ``` Closes #11436 * github.com:scylladb/scylladb: tools/scylla-types: add shardof action tools/scylla-types: pass variable_map to action handlers tools/scylla-types: add tokenof action tools/scylla-types: extract printing code into functions	2022-09-06 11:25:54 +03:00
Pavel Emelyanov	42c9f35374	topology: Mark compare_endpoints() arguments as const Continuation to `debfcc0e` (snitch: Move sort_by_proximity() to topology). The passed addresses are not modified by the helper. They are not yet const because the method was copy-n-pasted from snitch where it wasn't such. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220906074708.29574-1-xemul@scylladb.com>	2022-09-06 11:03:13 +03:00
Yaron Kaikov	4459cecfd6	Docs: fix wrong manifest file for enterprise releases In https://docs.scylladb.com/stable/upgrade/upgrade-enterprise/upgrade-guide-from-2021.1-to-2022.1/upgrade-guide-from-2022.1-to-2022.1-image.html, the manifest file location points to the wrong filename for Enterprise. Fix it. Closes #11446	2022-09-06 06:28:16 +03:00
Avi Kivity	ae4b2ee583	locator: token_metadata: drop unused and dangerous accessors The mutable get_datacenter_endpoints() and get_datacenter_racks() are dangerous since they expose internal members without enforcing class invariants. Fortunately they are unused, so delete them. Closes #11454	2022-09-06 06:08:02 +03:00
Avi Kivity	3f8cb608c3	Merge "Move auxiliary topology sorters from snitch" from Pavel E " There are two helpers on snitch that manipulate lists of nodes taking their dc/rack into account. This set moves these methods from snitch to topology and storage proxy. " * 'br-snitch-move-proximity-sorters' of https://github.com/xemul/scylla: snitch: Move sort_by_proximity() to topology topology: Add "enable proximity sorting" bit code: Call sort_endpoints_by_proximity() via topology snitch, code: Remove get_sorted_list_by_proximity() snitch: Move is_worth_merging_for_range_query to proxy	2022-09-05 17:25:08 +03:00
Anna Stuchlik	b0ebf0902c	doc: add the Cassandra version on which the tools are based	2022-09-05 14:45:15 +02:00
Pavel Emelyanov	debfcc0eff	snitch: Move sort_by_proximity() to topology Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-05 15:17:04 +03:00
Pavel Emelyanov	41973c5bf7	topology: Add "enable proximity sorting" bit There's one corner case in nodes sorting by snitch. The simple snitch code overloads the call and doesn't sort anything. The same behavior should be preserved by (future) topology implementation, but it doesn't know the snitch name. To address that the patch adds a boolean switch on topology that's turned off by main code when it sees the snitch is "simple" one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-05 15:15:07 +03:00
Pavel Emelyanov	b6fdea9a79	code: Call sort_endpoints_by_proximity() via topology The method is about to be moved from snitch to topology, this patch prepares the rest of the code to use the latter to call it. The topology's method just calls snitch, but it's going to change in the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-05 15:14:01 +03:00
Pavel Emelyanov	4184091f1c	snitch, code: Remove get_sorted_list_by_proximity() There are two sorting methods in snitch -- one sorts the list of addresses in place, the other one creates a sorted copy of the passed const list (in fact -- the passed reference is not const, but it's not modified by the method). However, both callers of the latter anyway create their own temporary list of address, so they don't really benefit from snitch generating another copy. So this patch leaves just one sorting method -- the in-place one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-05 15:11:37 +03:00
Pavel Emelyanov	642e50f3e3	snitch: Move is_worth_merging_for_range_query to proxy Proxy is the only place that calls this method. Also the method name suggests it's not something "generic", but rather an internal logic of proxy's query processing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-05 15:10:46 +03:00
Anna Stuchlik	7294fce065	doc: simplify the information about default formats in different versions	2022-09-05 11:36:24 +02:00
Avi Kivity	3ef8d616f6	Merge 'Fix wrong commit on scylla_raid_setup: prevent mount failed for /var/lib/scylla(#11399 )' from Takuya ASADA In #11399, I mistakenly committed the bug fix for the first patch (`40134ef`) into the second one (`8835a34`). So the script is broken when only `40134ef` is applied, which is a problem if we backport it to older versions. Let's revert the commits and combine them into a single commit. Closes #11448 * github.com:scylladb/scylladb: scylla_raid_setup: prevent mount failed for /var/lib/scylla Revert "scylla_raid_setup: check uuid and device path are valid" Revert "scylla_raid_setup: prevent mount failed for /var/lib/scylla"	2022-09-05 12:16:10 +03:00
Mikołaj Grzebieluch	726658f073	db: system_keyspace: add broadcast_kv_store table First implementation of strongly consistent everywhere tables operates on simple table representing string to string map. Add hard-coded schema for broadcast_kv_store table (key text primary key, value text). This table is under system keyspace and is created if and only if BROADCAST_TABLES feature is enabled.	2022-09-05 11:11:08 +02:00
Mikołaj Grzebieluch	5b1421cc33	db: config: add BROADCAST_TABLES feature flag Add experimental flag 'broadcast-tables' for enabling BROADCAST_TABLES feature. This feature requires raft group0, thus enabling it without RAFT will cause an error.	2022-09-05 11:11:08 +02:00
Avi Kivity	e3cdc8c4d3	Update tools/java submodule (python3 dependency) * tools/java 6995a83cc1...b7a0c5bd31 (1): > dist/debian:add python3 as dependency	2022-09-05 12:08:24 +03:00
Takuya ASADA	d676c22f09	scylla_raid_setup: prevent mount failed for /var/lib/scylla Just like `4a8ed4c`, we also need to wait for udev event completion to create /dev/disk/by-uuid/$UUID for newly formatted disk, to mount the disk just after formatting. Also added code to make sure the uuid and the uuid-based device path are valid. Fixes #11359 Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2022-09-05 17:52:49 +09:00
Takuya ASADA	ede7da366b	Revert "scylla_raid_setup: check uuid and device path are valid" This reverts commit `40134efee4`.	2022-09-05 17:52:42 +09:00
Takuya ASADA	841c686301	Revert "scylla_raid_setup: prevent mount failed for /var/lib/scylla" This reverts commit `8835a34ab6`.	2022-09-05 17:52:41 +09:00
Piotr Sarna	2379a25ade	alternator: propagate authenticated user in client state From now on, when an alternator user correctly passed an authentication step, their assigned client_state will have that information, which also means proper access to service level configuration. Previously the username was only used in tracing.	2022-09-05 10:43:29 +02:00
Anna Stuchlik	39e6002fc8	doc: fix the version number	2022-09-05 10:04:34 +02:00
Piotr Sarna	66f7ab666f	client_state: add internal constructor with auth_service The constructor can be used as backdoor from frontends other than CQL to create a session with an authenticated user, with access to its attached service level information.	2022-09-05 10:03:00 +02:00
Piotr Sarna	9511c21686	alternator: pass auth_service and sl_controller to server It's going to be needed to recreate a client state for an authenticated user.	2022-09-05 10:03:00 +02:00
Anna Stuchlik	81f32899d0	doc: update the Enterprise version where the ME format was introduced	2022-09-05 10:02:57 +02:00
Botond Dénes	f8b38cbe09	Merge 'doc: add support for Ubuntu 22.04 in ScyllaDB Enterprise' from Anna Stuchlik Fix https://github.com/scylladb/scylladb/issues/11430 @tzach I've added support for Ubuntu 22.04 to the row for version 2022.2. Does that version support Debian 11? That information is also missing (it was only added to OSS 5.0 and 5.1). Closes #11437 * github.com:scylladb/scylladb: doc: add support for Ubuntu 22.04 to the Enterprise table doc: rename the columns in the Enterprise section to be in sync with the OSS section	2022-09-05 06:42:55 +03:00
Anna Stuchlik	41b91e3632	doc: fix the architecture type on the upgrade page Closes #11438	2022-09-05 06:30:51 +03:00
Botond Dénes	21ef0c64f1	tools/scylla-types: add shardof action Decorates a partition key and calculates which shard it belongs to, given the shard count (--shards) and the ignore msb bits (--ignore-msb-bits) parameters. The latter is optional and is defaulted to 12. Example: $ scylla types shardof --full-compound -t UTF8Type -t SimpleDateType -t UUIDType --shards=7 000d66696c655f696e7374616e63650004800049190010c61a3321045941c38e5675255feb0196 (file_instance, 2021-03-27, c61a3321-0459-41c3-8e56-75255feb0196): token: -5043005771368701888, shard: 1	2022-09-05 06:22:57 +03:00
Botond Dénes	4333d33f01	tools/scylla-types: pass variable_map to action handlers Allowing them to get the value of extra command line parameters.	2022-09-05 06:22:55 +03:00
Botond Dénes	58d4f22679	tools/scylla-types: add tokenof action Calculate and print the token of a partition-key. Example: $ scylla types tokenof --full-compound -t UTF8Type -t SimpleDateType -t UUIDType 000d66696c655f696e7374616e63650004800049190010c61a3321045941c38e5675255feb0196 (file_instance, 2021-03-27, c61a3321-0459-41c3-8e56-75255feb0196): -5043005771368701888	2022-09-05 06:20:10 +03:00
Botond Dénes	be9d1c4df4	sstables: crawling mx-reader: make on_out_of_clustering_range() no-op Said method currently emits a partition-end. This method is only called when the last fragment in the stream is a range tombstone change with a position after all clustered rows. The problem is that consume_partition_end() is also called unconditionally, resulting in two partition-end fragments being emitted. The fix is simple: make this method a no-op, there is nothing to do there. Also add two tests: one targeted to this bug and another one testing the crawling reader with random mutations generated for random schema. Fixes: #11421 Closes #11422	2022-09-04 20:02:50 +03:00
Botond Dénes	3e69fe0fe7	scylla-gdb.py: scylla repairs: print only address of repair_meta Instead of the entire object. Repair meta is a large object, its printout floods the output of the command. Print only its address, the user can print the objects it is interested in. Closes #11428	2022-09-04 19:58:42 +03:00
Yaron Kaikov	9f9ee8a812	build_docker.sh: Build docker based on Ubuntu:22.04 Ubuntu 20.04 has less than 3 years of OS support remaining. We should switch to Ubuntu 22.04 to reduce the need for OS upgrades in newly installed clusters. Closes #11440	2022-09-04 14:00:27 +03:00
Avi Kivity	61769d3b21	Merge "Make messaging service use topology for DC/RACK" from Pavel E " Messaging needs to know DC/RACK for nodes to decide whether it needs to do encryption or compression depending on the options. As all the other services did it still uses snitch to get it, but simple switch to use topology needs extra care. The thing is that messaging can use internal IP instead of endpoints. Currently it's snitch who tries har^w somehow to resolve this, in particular -- if the DC/RACK is not found for the given argument it assumes that it might be internal IP and calls back messaging to convert it to the endpoint. However, messaging does know when it uses which address and can do this conversion itself. So this set eliminates few more global snitch usages and drops the knot tying snitch, gossiper and messaging with each-other. " * 'br-messaging-use-topology-1.2' of https://github.com/xemul/scylla: messaging: Get DC/RACK from topology messaging, topology: Keep shared_token_metadata* on messaging messaging: Add is_same_{dc\|rack} helpers snitch, messaging: Dont relookup dc/rack on internal IP	2022-09-04 13:54:34 +03:00
Pavel Emelyanov	6dedc69608	topology: Do not add bootstrapping nodes to topology Recent change in topology (commit `4cbe6ee9` titled "topology: Require entry in the map for update_normal_tokens()") made token_metadata::update_normal_tokens() require the entry presence in the embedded topology object. Respectively, the commit in question equipped most callers of update_normal_tokens() with preceding topology update call to satisfy the requirement. However, tokens are put into token_metadata not only for normal state, but also for bootstrapping, and one place that added bootstrapping tokens erroneously got topology update. This is wrong -- node must not be present in the topology until switching into normal state. As the result several tests with bootstrapping nodes started to fail. The fix removes topology update for bootstrapping nodes, but this change reveals few other places that piggy-backed this mistaken update, so now _they_ need to update topology themselves. tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/2040/ update_cluster_layout_tests.py::test_simple_add_new_node_while_schema_changes_with_repair update_cluster_layout_tests.py::test_simple_kill_new_node_while_bootstrapping_with_parallel_writes_in_multidc repair_based_node_operations_test.py::test_lcs_reshape_efficiency Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220902082753.17827-1-xemul@scylladb.com>	2022-09-04 13:53:38 +03:00
Avi Kivity	16a3e55aa1	Update seastar submodule * seastar f2d70c4a17...2b2f6c080e (4): > perftune.py: special case a former 'MQ' mode in the new auto-detection code > iostream: Generalize flush and batched flush > Merge "Equip sharded<>::invoke_on_all with unwrap_sharded_args" from Pavel E > Merge "perftune.py: cosmetic fixes" from VladZ Closes #11434	2022-09-04 10:19:48 +03:00
Anna Stuchlik	5d09e1a912	doc: add the ME format to the Cassandra Compatibility page	2022-09-02 15:12:30 +02:00
Anna Stuchlik	dfb7a221db	doc: update the SSTables 3.0 Statistics File Format to add the UUID host_id option of the ME format	2022-09-02 14:55:11 +02:00
Anna Stuchlik	f1184e1470	doc: add the information regarding the ME format to the SSTables 3.0 Data File Format page	2022-09-02 14:48:58 +02:00
Anna Stuchlik	177a1d4396	doc: fix additional information regarding the ME format on the SStable 3.x page	2022-09-02 14:41:55 +02:00
Anna Stuchlik	b3eacdca75	doc: add the ME format to the table	2022-09-02 14:26:16 +02:00
Anna Stuchlik	af4d1b80d8	doc: add support for Ubuntu 22.04 to the Enterprise table	2022-09-02 12:43:04 +02:00
Anna Stuchlik	947f8769f4	doc: rename the columns in the Enterprise section to be in sync with the OSS section	2022-09-02 12:31:57 +02:00
Pavel Emelyanov	f0580aedaf	messaging: Get DC/RACK from topology Now everything is prepared for that Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-02 11:34:57 +03:00
Botond Dénes	be70fcf587	tools/scylla-types: extract printing code into functions To make the individual overloads on the exact type usable on their own.	2022-09-02 07:46:18 +03:00
Botond Dénes	2c46c24608	Merge 'doc: change the tool names to "Scylla SStable" and "Scylla Types"' from Anna Stuchlik Fix https://github.com/scylladb/scylladb/issues/11393 - Rename the tool names across the docs. - Update the examples to replace `scylla-sstable` and `scylla-types` with `scylla sstable` and `scylla types`, respectively. Closes #11432 * github.com:scylladb/scylladb: doc: update the tool names in the toctree and reference pages doc: rename the scylla-types tool as Scylla Types doc: rename the scylla-sstable tool as Scylla SStable	2022-09-01 16:32:18 +03:00
Anna Stuchlik	18da200669	doc: update the tool names in the toctree and reference pages	2022-09-01 15:09:12 +02:00
Anna Stuchlik	c255399f27	doc: rename the scylla-types tool as Scylla Types	2022-09-01 15:05:44 +02:00
Anna Stuchlik	d0cb24feaa	doc: rename the scylla-sstable tool as Scylla SStable	2022-09-01 14:45:19 +02:00
Anna Stuchlik	1834d5d121	add a comment to remove the information when the documentation is versioned (in 5.1)	2022-09-01 12:57:15 +02:00
Anna Stuchlik	476107912c	doc: replace Scylla with ScyllaDB	2022-09-01 12:52:58 +02:00
Anna Stuchlik	8aae8a3cef	doc: fix the formatting and language in the updated section	2022-09-01 12:50:04 +02:00
Anna Stuchlik	ff4ae879cb	doc: fix the default SStable format	2022-09-01 12:47:11 +02:00
Pavel Emelyanov	e147681d85	messaging, topology: Keep shared_token_metadata* on messaging Messaging will need to call topology methods to compare DC/RACK of peers with local node. Topology now resides on token metadata, so messaging needs to get the dependency reference. However, messaging only needs the topology when it's up and running, so instead of producing a life-time reference, add a pointer, that's set up on .start_listen(), before any client pops up, and is cleared on .shutdown() after all connections are dropped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-01 11:32:34 +03:00
Pavel Emelyanov	551c51b5bf	messaging: Add is_same_{dc\|rack} helpers For convenience of future patching Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-01 11:32:34 +03:00
Pavel Emelyanov	c08c370c2c	snitch, messaging: Dont relookup dc/rack on internal IP When getting dc/rack snitch may perform two lookups -- first time it does it using the provided IP, if nothing is found snitch assumes that the IP is internal one, gets the corresponding public one and searches again. The thing is that the only code that may come to snitch with internal IP is the messaging service. It does so in two places: when it tries to connect to the given endpoint and when it accepts a connection. In the former case messaging performs public->internal IP conversion itself and goes to snitch with the internal IP value. This place can get simpler by just feeding the public IP to snitch, and converting it to the internal only to initiate the connection. In the latter case the accepted IP can be either, but messaging service has the public<->private map onboard and can do the conversion itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-09-01 11:32:34 +03:00
Avi Kivity	fe401f14de	Merge 'scylla_raid_setup: prevent mount failed for /var/lib/scylla' from Takuya ASADA Just like `4a8ed4cc6f`, we also need to wait for udev event completion to create /dev/disk/by-uuid/$UUID for newly formatted disk, to mount the disk just after formatting. Also added code to check make sure uuid and uuid based device path are valid. Fixes #11359 Closes #11399 * github.com:scylladb/scylladb: scylla_raid_setup: prevent mount failed for /var/lib/scylla scylla_raid_setup: check uuid and device path are valid	2022-08-31 19:59:38 +03:00
Kefu Chai	a5e696fab8	storage_service, test: drop unused storage_service_config this setting was removed back in `dcdd207349`, so despite that we are still passing `storage_service_config` to the ctor of `storage_service`, `storage_service::storage_service()` just drops it on the floor. in this change, `storage_service_config` class is removed, and all places referencing it are updated accordingly. Signed-off-by: Kefu Chai <tchaikov@gmail.com> Closes #11415	2022-08-31 19:49:13 +03:00
Botond Dénes	2a3012db7f	docs/README.md: expand prerequisites list poetry and make were missing from the list. Closes #11391	2022-08-31 17:00:59 +03:00
Botond Dénes	cb98d4f5da	docs: admin-tools: remove defunct sstable-index Said tool was supplanted by scylla-sstable in 4.6. Remove the page as well as all references to it. Closes #11392	2022-08-31 17:00:04 +03:00
Avi Kivity	a9a230afbe	scripts: introduce script to apply email, working around google groups brokenness Google Groups recently started rewriting the From: header, garbling our git log. This script rewrites it back, using the Reply-To header as a still working source. Closes #11416	2022-08-31 14:47:24 +03:00
Botond Dénes	b9fc504fb2	Merge 'doc: cql-extensions.md: improve description of synchronous views' from Nadav Har'El It was pointed out to me that our description of the synchronous_updates materialized-view option does not make it clear enough what is the default setting, or why a user might want to use this option. This patch changes the description to (I hope) better address these issues. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11404 * github.com:scylladb/scylladb: doc: cql-extensions.md: replace "Scylla" by "ScyllaDB" doc: cql-extensions.md: improve description of synchronous views	2022-08-31 14:33:39 +03:00
Avi Kivity	2ab5cbd841	Merge 'Docs: document how scylla-sstable obtains its schema' from Botond Dénes This is a very important aspect of the tool that was completely missing from the document before. Also add a comparison with SStableDump. Fixes: https://github.com/scylladb/scylladb/issues/11363 Closes #11390 * github.com:scylladb/scylladb: docs: scylla-sstable.rst: add comparison with SStableDump docs: scylla-sstable.rst: add section about providing the schema	2022-08-31 14:28:52 +03:00
Anna Stuchlik	72b77b8c78	doc: add a comment to remove the note in version 5.1	2022-08-31 12:49:10 +02:00
Anna Stuchlik	b4bbd1fd53	doc: update the information on the Counting all rows page and add the recommendation to upgrade ScyllaDB	2022-08-31 12:39:05 +02:00
Nadav Har'El	ad0f6158c4	doc: cql-extensions.md: replace "Scylla" by "ScyllaDB" It was recently decided that the database should be referred to as "ScyllaDB", not "Scylla". This patch changes existing references in docs/cql/cql-extensions.md. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-31 13:23:24 +03:00
Nadav Har'El	ec8e98e403	doc: cql-extensions.md: improve description of synchronous views It was pointed out to me that our description of the synchronous_updates materialized-view option does not make it clear enough what is the default setting, or why a user might want to use this option. This patch changes the description to (I hope) better address these issues. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-31 13:22:24 +03:00
Anna Stuchlik	180fc73695	doc: add a note to the description of COUNT with a reference to the KB article	2022-08-31 12:11:12 +02:00
Anna Stuchlik	cff849d845	doc: add COUNT to the list of acceptable selectors of the SELECT statement	2022-08-31 11:59:20 +02:00
Avi Kivity	421557b40a	Merge "Provide DC/RACK when populating topology" from Pavel E " The topology object maintains all sort of node/DC/RACK mappings on board. When new entries are added to it the DC and RACK are taken from the global snitch instance which, in turn, checks gossiper, system keyspace and its local caches. This set makes topology population API require DC and RACK via the call argument. In most of the cases the populating code is the storage service that knows exactly where to get those from. After this set it will be possible to remove the dependency knot consisting of snitch, gossiper, system keyspace and messaging. " * 'br-topology-dc-rack-info' of https://github.com/xemul/scylla: toplogy: Use the provided dc/rack info test: Provide testing dc/rack infos storage_service: Provide dc/rack for snitch reconfiguration storage_service: Provide dc/rack from system ks on start storage_service: Provide dc/rack from gossiper for replacement storage_service: Provide dc/rack from gossiper for remotes storage_service,dht,repair: Provide local dc/rack from system ks system_keyspace: Cache local dc-rack on .start() topology: Some renames after previous patch topology: Require entry in the map for update_normal_tokens() topology: Make update_endpoint() accept dc-rack info replication_strategy: Accept dc-rack as get_pending_address_ranges argument dht: Carry dc-rack over boot_strapper and range_streamer storage_service: Make replacement info a real struct	2022-08-31 12:53:06 +03:00
Benny Halevy	c284c32f74	Update seastar submodule * seastar f9f5228b74...f2d70c4a17 (51): > cmake: attach property to Valgrind not to hwloc > Create the seastar_memory logger in all builds > drop unused parameters > Merge "Unify pollable_fd shutdown and abort_{reader\|writer}" from Pavel E > > pollable_fd: Replace two booleans with a mask > > pollable_fd: Remove abort_reader/_writer > Merge "Improve Rx channels assignment" from Vlad > > perftune.py: fix comments of IRQ ordering functors > > perftune.py: add VIRTIO fast path IRQs ordering functor > > perftune.py: reduce number of Rx channels to the number of IRQ CPUs > > perftune.py: introduce a --num-rx-queues parameter > program_options: enable optional selection_value > .gitignore: ignore the directories generated by VS Code and CLion. > httpd: compare the Connection header value in a case-insensitive manner. > httpd: move the logic of keepalive to a separate method. > register one default priority class for queue > Reset _total_stats before each run > log: add colored logging support > Merge "perftune.py: add NUMA aware auto-detection for big machines" from Vlad > > perftune.py: mention 'irq_cpu_mask' in the description of the script operation > > perftune.py: NetPerfTuner: fix bits counting in self.irqs_cpu_mask wider than 32 bits > > perftune.py: PerfTuneBase.cpu_mask_is_zero(cpu_mask): cosmetics: fix a comment and a variable name > > perftune.py: PerfTuneBase.cpu_mask_is_zero(cpu_mask): take into account omitted zero components of the mask > > perftune.py: PerfTuneBase.compute_cpu_mask_for_mode(): cosmetics: fix a variable name > > perftune.py: stop printing 'mode' in --dump-options-file > > perftune.py: introduce a generic auto_detect_irq_mask(cpu_mask) function > > perftune.py: DiskPerfTuner: use self.irqs_cpu_mask for tuning non-NVME disks > > perftune.py: stop auto-detecting and using 'mode' internally > > perftune.py: introduce --get-irq-cpu-mask command line parameter > > perftune.py: introduce 
--irq-core-auto-detection-ratio parameter > build: add a space after function name > Update HACKING.md > log: do not inherit formatter<seastar::log_level> from formatter<string_view> > Merge "Mark connected_socket::shutdown_...'s internals noexcept" from Pavel E > > native-stack: Mark tcp::in_state() (and its wrappers) const noexcept > > native-stack: Mark tcb::close and tcb::abort_reader noexcept > > native-stack: Mark tcp::connection::close_{read\|write}() noexcept > > native-stack: Mark tcb::clear_delayed_ack() and tcb::stop_retransmit_timer() noexcept > > tls: Mark session::close() noexcept > > file_desc: Add fdinfo() helper > > posix-stack: Mark posix_connected_socket_impl::shutdown_{input\|output}() noexcept > > tests: Mark loopback_buffer::shutdown() noexcept > Merge "Enhance RPC connection error injector" from Pavel E > > loopback_socket: Shuffle error injection > > loopback_socket: Extend error injection > > loopback_socket: Add one-shot errors > > loopback_socket: Add connection error injection > > rpc_test: Extend error injector with kind > > rpc_test: Inject errors on all paths > > rpc_test: Use injected connect error > > rpc_test: De-duplicate test socket creation > Merge 'tls: vec_push: handle async errors rather than throwing on_internal_error' from Benny Halevy > > tls: do_handshake: handle_output_error of gnutls_handshake > > tls: session: vec_push: return output_pending error > > tls: session: vec_push: reindent > log: disambiguate formatter<log_level> from operator<< > tls_test: Fix spurious fail in test_x509_client_with_builder_system_trust_multiple (et al) Fixes scylladb/scylladb#11252 Closes #11401	2022-08-31 12:12:48 +03:00
Botond Dénes	dca351c2a6	Merge 'doc: add the upgrade guide for ScyllaDB image from 2021.1 to 2022.1' from Anna Stuchlik This PR is related to https://github.com/scylladb/scylla-docs/issues/4124 and https://github.com/scylladb/scylla-docs/issues/4123. New Enterprise Upgrade Guide from 2021.1 to 2022.2 I've added the upgrade guide for ScyllaDB Enterprise image. In consists of 3 files: /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p1.rst upgrade/_common/upgrade-image.rst /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p2.rst Modified Enterprise Upgrade Guides 2021.1 to 2022.2 I've modified the existing guides for Ubuntu and Debian to use the same files as above, but exclude the image-related information: /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p1.rst + /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian-p2.rst = /upgrade/_common/upgrade-guide-v2022-ubuntu-and-debian.rst To make things simpler and remove duplication, I've replaced the guides for Ubuntu 18 and 20 with a generic Ubuntu guide. Modified Enterprise Upgrade Guides from 4.6 to 5.0 These guides included a bug: they included the image-related information (about updating OS packages), because a file that includes that information was included by mistake. What's worse, it was duplicated. After the includes were removed, image-related information is no longer included in the Ubuntu and Debian guides (this fixes https://github.com/scylladb/scylla-docs/issues/4123). I've modified the index file to be in sync with the updates. 
Closes #11285 * github.com:scylladb/scylladb: doc: reorganize the content to list the recommended way of upgrading the image first doc: update the image upgrade guide for ScyllaDB image to include the location of the manifest file doc: fix the upgrade guides for Ubuntu and Debian by removing image-related information doc: update the guides for Ubuntu and Debian to remove image information and the OS version number doc: add the upgrade guide for ScyllaDB image from 2021.1 to 2022.1	2022-08-31 07:24:55 +03:00
Gleb Natapov' via ScyllaDB development	0d20830863	direct_failure_detector: reduce severity of ping error logging Having an error while pinging a peer is not a critical error. The code retries and moves on. Let's log the message with less severity since sometimes those errors may happen (for instance during node replace operation some nodes refuse to answer to pings) and dtest complains that there are unexpected errors in the logs. Message-Id: <Ywy5e+8XVwt492Nc@scylladb.com>	2022-08-31 07:11:59 +03:00
Raphael S. Carvalho	631b2d8bdb	replica: rename table::on_compaction_completion and coroutinize it on_compaction_completion() is not very descriptive. let's rename it, following the example of update_sstable_lists_on_off_strategy_completion(). Also let's coroutinize it, so to remove the restriction of running it inside a thread only. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11407	2022-08-31 06:17:20 +03:00
Nadav Har'El	a797512148	Merge 'Raft test topology start stopped servers' from Alecco Test teardown involves dropping the test keyspace. If there are stopped servers occasionally we would see timeouts. Start stopped servers after a test is finished (and passed). Revert previous commit making teardown async again. Closes #11412 * github.com:scylladb/scylladb: test.py: restart stopped servers before teardown... Revert "test.py: random tables make DDL queries async"	2022-08-30 22:48:47 +03:00
Pavel Emelyanov	e5e75ba43c	Merge 'scylla-gdb.py: bring scylla reads-stats up-to-date' from Botond Dénes Said command is broken since 4.6, as the type of `reader_concurrency_semaphore::_permit_list` was changed without an accompanying update to this command. This series updates said command and adds it to the list of tested commands so we notice if it breaks in the future. Closes #11389 * github.com:scylladb/scylladb: test/scylla-gdb: test scylla read-stats scylla-gdb.py: read_stats: update w.r.t. post 4.5 code scylla-gdb.py: improve string_view_printer implementation	2022-08-30 20:24:02 +03:00
Nadav Har'El	56d714b512	Merge 'Docs: Update support OS' from Tzach Livyatan This PR changes the CentOS 8 support to Rocky, and adds 5.1 and 2022.1, 2022.2 rows to the list of Scylla releases Closes #11383 * github.com:scylladb/scylladb: OS support page: use CentOS not Centos OS support page: add 5.1, 2022.1 and 2022.2 OS support page: Update CentOS 8 to Rocky 8	2022-08-30 18:02:44 +03:00
Anna Stuchlik	0d3285dd3c	doc: replace Scylla with ScyllaDB	2022-08-30 15:35:06 +02:00
Anna Stuchlik	ab04ed2fda	doc: rewrite the Interfaces table to the new format to include more information about CQL support	2022-08-30 15:31:41 +02:00
Anna Stuchlik	4ac5574f1d	doc: remove the CQL version from pages other than Cassandra compatibility	2022-08-30 13:58:26 +02:00
Alejo Sanchez	df1ca57fda	test.py: restart stopped servers before teardown... for topology tests Test teardown involves dropping the test keyspace. If there are stopped servers occasionally we would see timeouts. Start stopped servers after a test is finished. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-30 11:40:40 +02:00
Alejo Sanchez	e5eac22a37	Revert "test.py: random tables make DDL queries async" This reverts commit `67c91e8bcd`. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-30 10:54:33 +02:00
Anna Stuchlik	99cee1aceb	doc: reorganize the content to list the recommended way of upgrading the image first	2022-08-30 10:11:02 +02:00
Anna Stuchlik	ffe6f97c06	doc: update the image upgrade guide for ScyllaDB image to include the location of the manifest file	2022-08-30 10:01:56 +02:00
Tzach Livyatan	4e413787d2	doc: Fix nodetool flush example `nodetool flush` has a space between keyspace and table names See https://docs.scylladb.com/stable/operating-scylla/nodetool-commands/flush for the right syntax. Fixes #11314 Closes #11334	2022-08-29 15:06:38 +03:00
Nadav Har'El	eed65dfc2d	Merge 'db: schema_tables: Make table creation shadow earlier concurrent changes' from Tomasz Grabiec Issuing two CREATE TABLE statements with a different name for one of the partition key columns leads to the following assertion failure on all replicas: scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id \|\| def.id == id - column_offset(def.kind)' failed. The reason is that once the create table mutations are merged, the columns table contains two entries for the same position in the partition key tuple. If the schemas were the same, or not conflicting in a way which leads to abort, the current behavior would be to drop the older table as if the last CREATE TABLE was preceded by a DROP TABLE. The proposed fix is to make CREATE TABLE mutation include a tombstone for all older schema changes of this table, effectively overriding them. The behavior will be the same as if the schemas were not different, older table will be dropped. Fixes #11396 Closes #11398 * github.com:scylladb/scylladb: db: schema_tables: Make table creation shadow earlier concurrent changes db: schema_tables: Fix formatting db: schema_mutations: Make operator<<() print all mutations schema_mutations: Make it a monoid by defining appropriate += operator	2022-08-29 14:21:07 +03:00
Tomasz Grabiec	ae8d2a550d	db: schema_tables: Make table creation shadow earlier concurrent changes Issuing two CREATE TABLE statements with a different name for one of the partition key columns leads to the following assertion failure on all replicas: scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id \|\| def.id == id - column_offset(def.kind)' failed. The reason is that once the create table mutations are merged, the columns table contains two entries for the same position in the partition key tuple. If the schemas were the same, or not conflicting in a way which leads to abort, the current behavior would be to drop the older table as if the last CREATE TABLE was preceded by a DROP TABLE. The proposed fix is to make CREATE TABLE mutation include a tombstone for all older schema changes of this table, effectively overriding them. The behavior will be the same as if the schemas were not different, older table will be dropped. Fixes #11396	2022-08-29 12:06:02 +02:00
Benny Halevy	d588e2a7c5	release: properly evaluate SCYLLA_BUILD_MODE_* macros Patch `765d2f5e46` did not evaluate the #if SCYLLA_BUILD_MODE directives properly and it always matched SCYLLA_BUILD_MODE == release. This change fixes that by defining numerical codes for each build mode and using macro expansion to match the define SCYLLA_BUILD_MODE against these codes. Also, ./configure.py was changed to pass SCYLLA_BUILD_MODE to all .cc source files, and makes sure it is defined in build_mode.hh. Support was added for coverage build mode, and an #error was added if SCYLLA_BUILD_MODE was not recognized by the #if ladder directives. Additional checks verifying the expected SEASTAR_DEBUG against SCYLLA_BUILD_MODE were added as well. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11387	2022-08-29 10:20:19 +03:00
Botond Dénes	0f4666010a	docs: scylla-sstable.rst: add comparison with SStableDump The two tools have very similar goals, user might wonder when to use one or the other. Also add a link to sstabledump.rst to scylla-sstable.	2022-08-29 08:29:14 +03:00
Botond Dénes	65da6a26a3	docs: scylla-sstable.rst: add section about providing the schema Providing the schema for the scylla-sstable tool is an important topic that was completely missing from the description so far.	2022-08-29 08:29:09 +03:00
Alejo Sanchez	67c91e8bcd	test.py: random tables make DDL queries async There are async timeouts for ALTER queries. Seems related to other issues with the driver and async. Make these queries synchronous for now. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11394	2022-08-28 10:38:39 +03:00
Felipe Mendes	fd5cb85a7a	alternator - Doc - Update DescribeTable response and introduce hashing function differences This commit introduces the following changes to Alternator compatibility doc: * As of https://github.com/scylladb/scylladb/pull/11298 Alternator will return ProvisionedThroughput in DescribeTable API calls. We add the fact that tables will default to a BillingMode of PAY_PER_REQUEST (this wasn't made explicit anywhere in the docs), and that the values for RCUs/WCUs are hardcoded to 0. * Mention the fact that ScyllaDB (thus Alternator) hashing function is different than AWS proprietary implementation for DynamoDB. This is mostly of an implementation aspect rather than a bug, but it may cause user confusion when/if comparing the ResultSet between DynamoDB and Alternator returned from Table Scans. Refs: https://github.com/scylladb/scylladb/issues/11222 Fixes: https://github.com/scylladb/scylladb/issues/11315 Closes #11360	2022-08-28 10:29:07 +03:00
Takuya ASADA	8835a34ab6	scylla_raid_setup: prevent mount failed for /var/lib/scylla Just like `4a8ed4c`, we also need to wait for udev event completion to create /dev/disk/by-uuid/$UUID for newly formatted disk, to mount the disk just after formatting. Fixes #11359	2022-08-27 03:27:44 +09:00
Takuya ASADA	40134efee4	scylla_raid_setup: check uuid and device path are valid Added code to make sure uuid and uuid based device path are valid.	2022-08-27 03:08:31 +09:00
Tomasz Grabiec	661db2706f	db: schema_tables: Fix formatting	2022-08-26 17:37:48 +02:00
Tomasz Grabiec	a020c4644c	db: schema_mutations: Make operator<<() print all mutations	2022-08-26 16:48:15 +02:00
Tomasz Grabiec	cf034c1891	schema_mutations: Make it a monoid by defining appropriate += operator	2022-08-26 16:48:15 +02:00
Kamil Braun	6c16ae4868	Merge 'raft, limit for command size' from Gusev Petr Commitlog imposes a limit on the size of mutations and throws an exception if it's exceeded. In case of schema changes before raft this exception was delivered to the client. Now it happens while saving the raft command in io_fiber in persistence->store_log_entries and what the client gets is just a timeout exception, which doesn't say much about the cause of the problem. This patch introduces an explicit command size limit and provides a clear error message in this case. Closes #11318 * github.com:scylladb/scylladb: raft, use max_command_size to satisfy commitlog limit raft, limit for command size	2022-08-26 12:20:58 +02:00
Pavel Emelyanov	6405aba748	toplogy: Use the provided dc/rack info Previous patches made all the callers of topology.update_endpoint() (via token_metadata.update_topology()) provide correct dc/rack info for the endpoint. It's now possible to stop using global snitch by topology and just rely on the dc/rack argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 10:02:00 +03:00
Pavel Emelyanov	10e8804417	test: Provide testing dc/rack infos There's a test that's sensitive to correct dc/rack info for testing entries. To populate them it uses global rack-inferring snitch instance or a special "testing" snitch. To make it continue working add a helper that would populate the topology properly (spoiler: next branch will replace it with explicitly populated topology object). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 10:00:04 +03:00
Pavel Emelyanov	f6abc3f759	storage_service: Provide dc/rack for snitch reconfiguration When snitch reconfigures (gossiper-property-file one) it kicks storage service so that it updates itself. This place also needs to update the dc/rack info about itself, the correct (new) values are taken from the snitch itself. There's a bug here -- system.local table is not updated with new data until restart. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:58:34 +03:00
Pavel Emelyanov	f8614fe039	storage_service: Provide dc/rack from system ks on start When a node starts it loads the information about peers from system.peers table and populates token metadata and topology with this information. The dc/rack are taken from the sys-ks cache here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:57:15 +03:00
Pavel Emelyanov	5d5782a086	storage_service: Provide dc/rack from gossiper for replacement When a node is started to replace another node it updates token metadata and topology with the target information early. The tokens are now taken from gossiper shadow round, this patch does the same for dc/rack info. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:55:31 +03:00
Pavel Emelyanov	6b70358616	storage_service: Provide dc/rack from gossiper for remotes When a node is notified about other nodes state change it may want to update the topology information about it. In all those places the dc/rack info about the peer is provided by the gossiper. Basically, these updates mirror the relevant updates of tokens on the token metadata object. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:53:54 +03:00
Pavel Emelyanov	43e83c5415	storage_service,dht,repair: Provide local dc/rack from system ks When a node starts it adds itself to the topology. Mostly it's done in the storage_service::join_cluster() and whoever it calls. In all those places the dc/rack for the added node is taken from the system keyspace (its cache was populated with local dc/rack by the previous patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:52:16 +03:00
Pavel Emelyanov	a03d6f7751	system_keyspace: Cache local dc-rack on .start() There's a cache of endpoint:{dc,rack} on system keyspace cache, but the local node is not there, because this data is populated from the peers table, while local node's dc/rack is in snitch (or system.local table). At the same time, storage_service::join_cluster() and whoever it calls (e.g. -- the repair) will need this info on start and it's convenient to have this data on sys-ks cache. It's not on the peers part of the cache because next branch removes this map and it's going to be very clumsy to have a whole container with just one entry in it. There's a peer code in system_keyspace::setup() that gets the local node dc/rack and commits it into the system.local table. However, putting the data into cache is done on .start(). This is because cql-test-env needs this data cached too, but it doesn't call sys_ks.setup(). Will be cleaned some other day. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:47:30 +03:00
Pavel Emelyanov	c043f6fa96	topology: Some renames after previous patch The topology::update_endpoint() is now a plain wrapper over private ::add_endpoint() method of the same class. It's simpler to merge them Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:46:26 +03:00
Pavel Emelyanov	4cbe6ee9f4	topology: Require entry in the map for update_normal_tokens() The method in question tries to be on the safest side and adds the endpoint for which it updates the tokens into the topology. From now on it's up to the caller to put the endpoint into topology in advance. So most of what this patch does is places topology.update_endpoint() into the relevant places of the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:44:08 +03:00
Pavel Emelyanov	5fc9854eae	topology: Make update_endpoint() accept dc-rack info The method in question populates topology's internal maps with endpoint vs dc/rack relations. As for today the dc/rack values are taken from the global snitch object (which, in turn, goes to gossiper, system keyspace and its internal non-updateable cache for that). This patch prepares the ground for providing the dc/rack externally via argument. By now it's just an argument with empty strings, but next patches will populate it with real values (spoiler: in 99% it's storage service that calls this method and each call will know where to get it from for sure) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:41:09 +03:00
Pavel Emelyanov	7305061674	replication_strategy: Accept dc-rack as get_pending_address_ranges argument The method creates a copy of token metadata and pushes an endpoint (with some tokens) into it. Next patches will require providing dc/rack info together with the endpoint, this patch prepares for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:39:44 +03:00
Pavel Emelyanov	360c4f8608	dht: Carry dc-rack over boot_strapper and range_streamer Both classes may populate (temporary clones of) token metadata object with endpoint:tokens pairs for the endpoint they work with. Next patches will require that endpoint comes with the dc/rack info. This patch makes sure dht classes have the necessary information at hand (for now it's just empty pair of strings). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:37:02 +03:00
Pavel Emelyanov	c7a3fed225	storage_service: Make replacement info a real struct This is to extend it in one of the next patches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-26 09:36:16 +03:00
Botond Dénes	4d33812a77	test/scylla-gdb: test scylla read-stats This command was not run before, allowing it to silently break.	2022-08-26 08:08:28 +03:00
Botond Dénes	82c157368a	scylla-gdb.py: read_stats: update w.r.t. post 4.5 code scylla_read_stats is not up-to-date w.r.t. the type of `reader_concurrency_semaphore::_permit_list`, which was changed in 4.6. Bring it up-to-date, keeping it backwards compatible with 4.5 and older releases.	2022-08-26 07:25:40 +03:00
Botond Dénes	107fd97f45	scylla-gdb.py: improve string_view_printer implementation The `_M_str` member of an `std::string_view` is not guaranteed to be a valid C string (e.g. be null terminated). Printing it directly often resulted in printing partial strings or printing gibberish, affecting in particular the semaphore diagnostics dumps (scylla read-stats). Use a more reliable method: read `_M_len` amount of bytes from `_M_str` and decode as UTF-8.	2022-08-26 07:25:11 +03:00
Avi Kivity	0dbcd13a0f	config: change logging::settings constructor call to use designated initializer Safer wrt reordering, and more readable too. Closes #11382	2022-08-26 06:14:01 +03:00
Konstantin Osipov	4e128bafb5	docs: clarify the tricky field of row existence in LWT Closes #11372	2022-08-26 06:10:45 +03:00
Vlad Zolotarov	c538cc2372	scylla_prepare + scylla_cpuset_setup: make scylla_cpuset_setup idempotent without introducing regressions This patch fixes the regression introduced by `3a51e78` which broke a very important contract: perftune.yaml should not be "touched" by Scylla scriptology unless explicitly requested. And a call for scylla_cpuset_setup is such an explicit request. The issue that the offending patch was intending to fix was that cpuset.conf was always generated anew for every call of scylla_cpuset_setup - even if a resulting cpuset.conf would come out exactly the same as the one present on the disk before that call. And since the original code was following the contract mentioned above it was also deleting perftune.yaml every time too. However, this was just an unavoidable side-effect of that cpuset.conf re-generation. The above also means that if scylla_cpuset_setup doesn't write to cpuset.conf we should not "touch" perftune.yaml and vice versa. This patch implements exactly that together with reverting the dangerous logic introduced by `3a51e78`. Fixes #11385 Fixes #10121	2022-08-25 13:03:02 -04:00
Vlad Zolotarov	80917a1054	scylla_prepare: stop generating 'mode' value in perftune.yaml Modern perftune.py supports a more generic way of defining IRQ CPUs: 'irq_cpu_mask'. This patch makes our auto-generation code create a perftune.yaml that uses this new parameter instead of using outdated 'mode'. As a side effect, this change eliminates the notion of "incorrect" value in cpuset.conf - every value is valid now as long as it fits into the 'all' CPU set of the specific machine. Auto-generated 'irq_cpu_mask' is going to include all bits from 'all' CPU mask except those defined in cpuset.conf. Fixes #9903	2022-08-25 13:02:57 -04:00
Benny Halevy	765d2f5e46	release: define SCYLLA_BUILD_MODE_STR by stringifying SCYLLA_BUILD_MODE Currently SCYLLA_BUILD_MODE is defined as a string by the cxxflags generated by configure.py. This is not very useful since one cannot use it in a #if preprocessor directive. Instead, use -DSCYLLA_BUILD_MODE=release, for example, and define a SCYLLA_BUILD_MODE_STR as the string representation of it. In addition define the respective SCYLLA_BUILD_MODE_{RELEASE,DEV,DEBUG,SANITIZE} macros that can be easily used in #ifdef (or #ifndef :)) for conditional compilation. The planned use case for it is to enable a task_manager test module only in non-release modes. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11357	2022-08-25 16:50:42 +02:00
Tzach Livyatan	e86cd3684e	OS support page: use CentOS not Centos	2022-08-25 17:33:50 +03:00
Wojciech Mitros	49dba4f0c1	functions: fix dropping of a keyspace with an aggregate in it Currently, if a keyspace has an aggregate and the keyspace is dropped, the keyspace becomes corrupted and another keyspace with the same name cannot be created again. This is caused by the fact that when removing an aggregate, we call create_aggregate() to get values for its name and signature. In the create_aggregate(), we check whether the row and final functions for the aggregate exist. Normally, that's not an issue, because when dropping an existing aggregate alone, we know that its UDFs also exist. But when dropping an entire keyspace, we first drop the UDFs, making us unable to drop the aggregate afterwards. This patch fixes this behavior by removing the create_aggregate() from the aggregate dropping implementation and replacing it with specific calls for getting the aggregate name and signature. Additionally, a test that would previously fail is added to cql-pytest/test_uda.py where we drop a keyspace with an aggregate. Fixes #11327 Closes #11375	2022-08-25 16:28:57 +02:00
Tzach Livyatan	f6157a38a0	OS support page: add 5.1, 2022.1 and 2022.2	2022-08-25 16:44:40 +03:00
Tzach Livyatan	1697f17d90	OS support page: Update CentOS 8 to Rocky 8	2022-08-25 16:43:24 +03:00
Tomasz Grabiec	83850e247a	Merge 'raft: server: handle aborts when waiting for config entry to commit' from Kamil Braun Changing configuration involves two entries in the log: a 'joint configuration entry' and a 'non-joint configuration entry'. We use `wait_for_entry` to wait on the joint one. To wait on the non-joint one, we use a separate promise field in `server`. This promise wasn't connected to the `abort_source` passed into `set_configuration`. The call could get stuck if the server got removed from the configuration and lost leadership after committing the joint entry but before committing the non-joint one, waiting on the promise. Aborting wouldn't help. Fix this by subscribing to the `abort_source` in resolving the promise exceptionally. Furthermore, make sure that two `set_configuration` calls don't step on each other's toes by one setting the other's promise. To do that, reset the promise field at the end of `set_configuration` and check that it's not engaged at the beginning. Fixes #11288. Closes #11325 * github.com:scylladb/scylladb: test: raft: randomized_nemesis_test: additional logging raft: server: handle aborts when waiting for config entry to commit	2022-08-25 12:49:09 +02:00
Avi Kivity	df87949241	Merge "Remove batch tokens update helper" from Pavel E " On token_metadata there are two update_normal_tokens() overloads -- one updates tokens for a single endpoint, another one -- for a set (well -- std::map) of them. Other than updating the tokens both methods also may add an endpoint to the t.m.'s topology object. There's an ongoing effort in moving the dc/rack information from snitch to topology, and one of the changes made in it is -- when adding an entry to topology, the dc/rack info should be provided by the caller (which in 99% of the cases is the storage service). The batched tokens update is extremely unfriendly to the latter change. Fortunately, this helper is only used by tests, the core code always uses fine-grained tokens updating. " * 'br-tokens-update-relax' of https://github.com/xemul/scylla: token_metadata: Indentation fix after prevuous patch token_metadata: Remove excessive empty tokens check token_metadata: Remove batch tokens updating method tests: Use one-by-one tokens updating method	2022-08-25 12:01:58 +02:00
Wojciech Mitros	9e6e8de38f	tests: prevent test_wasm from occasional failing Some cases in test_wasm.py assumed that all cases are run in the same order every time and depended on values that should have been added to tables in previous cases. Because of that, they were sometimes failing. This patch removes this assumption by adding the missing inserts to the affected cases. Additionally, an assert that confirms low miss rate of udfs is more precise, a comment is added to explain it clearly. Closes #11367	2022-08-25 11:32:06 +03:00
Kamil Braun	90233551be	test: raft: randomized_nemesis_test: don't access failure detector service after it's stopped It could happen that we accessed failure detector service after it was stopped if a reconfiguration happened in the 'right' moment. This would result in an assertion failure. Fix this. Closes #11326	2022-08-25 11:32:06 +03:00
Tomasz Grabiec	1d0264e1a9	Merge 'Implement Raft upgrade procedure' from Kamil Braun Start with a cluster with Raft disabled, end up with a cluster that performs schema operations using group 0. Design doc: https://docs.google.com/document/d/1PvZ4NzK3S0ohMhyVNZZ-kCxjkK5URmz1VP65rrkTOCQ/ (TODO: replace this with .md file - we can do it as a follow-up) The procedure, on a high level, works as follows: - join group 0 - wait until every peer joined group 0 (peers are taken from `system.peers` table) - enter `synchronize` upgrade state, in which group 0 operations are disabled - wait until all members of group 0 entered `synchronize` state or some member entered the final state - synchronize schema by comparing versions and pulling if necessary - enter the final state (`use_new_procedures`), in which group 0 is used for schema operations. With the procedure comes a recovery mode in case the upgrade procedure gets stuck (and it may if we lose a node during recovery - the procedure, to correctly establish a single group 0 cluster, requires contacting every node). This recovery mode can also be used to recover clusters with group 0 already established if they permanently lose a majority of nodes - killing two birds with one stone. Details in the last commit message. Read the design doc, then read the commits in topological order for best reviewing experience. --- I did some manual tests: upgrading a cluster, using the cluster to add nodes, remove nodes (both with `decommission` and `removenode`), replacing nodes. Performing recovery. As a follow-up, we'll need to implement tests using the new framework (after it's ready). It will be easy to test upgrades and recovery even with a single Scylla version - we start with a cluster with the RAFT flag disabled, then rolling-restart while enabling the flag (and recovery is done through simple CQL statements). 
Closes #10835 * github.com:scylladb/scylladb: service/raft: raft_group0: implement upgrade procedure service/raft: raft_group0: extract `tracker` from `persistent_discovery::run` service/raft: raft_group0: introduce local loggers for group 0 and upgrade service/raft: raft_group0: introduce GET_GROUP0_UPGRADE_STATE verb service/raft: raft_group0_client: prepare for upgrade procedure service/raft: introduce `group0_upgrade_state` db: system_keyspace: introduce `load_peers` idl-compiler: introduce cancellable verbs message: messaging_service: cancellable version of `send_schema_check`	2022-08-25 11:32:06 +03:00
Pavel Emelyanov	d8c5044eee	token_metadata: Indentation fix after prevuous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-24 08:24:21 +03:00
Pavel Emelyanov	8238c38e9f	token_metadata: Remove excessive empty tokens check After the previous patch empty passed tokens make the helper co_return early, so this if is the dead code Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-24 08:24:21 +03:00
Pavel Emelyanov	056d21c050	token_metadata: Remove batch tokens updating method No users left. The endpoint_tokens.empty() check is removed, only tests could trigger it, but they didn't and are patched out. Indentation is left broken Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-24 08:24:21 +03:00
Pavel Emelyanov	1d437302a8	tests: Use one-by-one tokens updating method Tests are the only users of batch tokens updating "sugar" which actually makes things more complicated Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-24 08:24:21 +03:00
Pavel Emelyanov	18fa5038b1	replication_strategy: Remove unused method The get_pending_address_ranges() accepting a single token is not in use, its peer that accepts a set of tokens is Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11358	2022-08-23 20:23:50 +02:00
Avi Kivity	6ce5e9079c	Merge 'utils/logalloc: consolidate lsa state in shard tracker' from Botond Dénes Currently the state of LSA is scattered across a handful of global variables. This series consolidates all these into a single one: the shard tracker. Beyond reducing the number of globals (the less globals, the better) this paves the way for a planned de-globalization of the shard tracker itself. There is one separate global left, the static migrators registry. This is left as-is for now. Closes #11284 * github.com:scylladb/scylladb: utils/logalloc: remove reclaim_timer:: globals utils/logalloc: make s_sanitizer_report_backtrace global a member of tracker utils/logalloc: tracker_reclaimer_lock: get shard tracker via constructor arg utils/logalloc: move global stat accessors to tracker utils/logalloc: allocating_section: don't use the global tracker utils/logalloc: pass down tracker::impl reference to segment_pool utils/logalloc: move segment pool into tracker utils/logalloc: add tracker member to basic_region_impl utils/logalloc: make segment independent of segment pool	2022-08-23 18:51:14 +02:00
Benny Halevy	a980510654	table: seal_active_memtable: handle ENOSPC error Aborting too soon on ENOSPC is too harsh, leading to loss of availability of the node for reads, while restarting it won't solve the ENOSPC condition. Fixes #11245 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11246	2022-08-23 17:58:20 +02:00
Tomasz Grabiec	9c4e32d2e2	Merge 'raft: server: drop waiters in `applier_fiber` instead of `io_fiber`' from Kamil Braun When `io_fiber` fetched a batch with a configuration that does not contain this node, it would send the entries committed in this batch to `applier_fiber` and proceed by dropping any remaining entry waiters (if the node was no longer a leader). If there were waiters for entries committed in this batch, it could either happen that `applier_fiber` received and processed those entries first, notifying the waiters that the entries were committed and/or applied, or it could happen that `io_fiber` reaches the dropping waiters code first, causing the waiters to be resolved with `commit_status_unknown`. The second scenario is undesirable. For example, when a follower tries to remove the current leader from the configuration using `modify_config`, if the second scenario happens, the follower will get `commit_status_unknown` - this can happen even though there are no node or network failures. In particular, this caused `randomized_nemesis_test.remove_leader_with_forwarding_finishes` to fail from time to time. Fix it by serializing the notifying and dropping of waiters in a single fiber - `applier_fiber`. We decided to move all management of waiters into `applier_fiber`, because most of that management was already there (there was already one `drop_waiters` call, and two `notify_waiters` calls). Now, when `io_fiber` observes that we've been removed from the config and no longer a leader, instead of dropping waiters, it sends a message to `applier_fiber`. `applier_fiber` will drop waiters when receiving that message. Improve an existing test to reproduce this scenario more frequently. Fixes #11235. 
Closes #11308 * github.com:scylladb/scylladb: test: raft: randomized_nemesis_test: more chaos in `remove_leader_with_forwarding_finishes` raft: server: drop waiters in `applier_fiber` instead of `io_fiber` raft: server: use `visit` instead of `holds_alternative`+`get`	2022-08-23 17:19:44 +02:00
Avi Kivity	fd9d8ddb3e	Merge 'distributed_loader: Restore separate processing of keyspace init prio/normal' from Calle Wilund Fixes #11349 In `7396de7` (and refactorings before it) the set of prioritized keyspaces (and processing thereof) was removed, due to apparent non-usage (which is true for open-source version). This functionality is however required for certain features of the enterprise version (ear). As such it needs to be restored and reenabled. This patch set does so, adapted to the recent version of this file. Closes #11350 * github.com:scylladb/scylladb: distributed_loader: Restore separate processing of keyspace init prio/normal Revert "distributed_loader: Remove unused load-prio manipulations"	2022-08-23 16:25:48 +02:00
Kamil Braun	e350e37605	service/raft: raft_group0: implement upgrade procedure A listener is created inside `raft_group0` for acting when the SUPPORTS_RAFT feature is enabled. The listener is established after the node enters NORMAL status (in `raft_group0::finish_setup_after_join()`, called at the end of `storage_service::join_cluster()`). The listener starts the `upgrade_to_group0` procedure. The procedure, on a high level, works as follows: - join group 0 - wait until every peer joined group 0 (peers are taken from `system.peers` table) - enter `synchronize` upgrade state, in which group 0 operations are disabled (see earlier commit which implemented this logic) - wait until all members of group 0 entered `synchronize` state or some member entered the final state - synchronize schema by comparing versions and pulling if necessary - enter the final state (`use_new_procedures`), in which group 0 is used for schema operations (only those for now). The devil lies in the details, and the implementation is ugly compared to this nice description; for example there are many retry loops for handling intermittent network failures. Read the code. `leave_group0` and `remove_group0` were adjusted to handle the upgrade procedure being run correctly; if necessary, they will wait for the procedure to finish. If the upgrade procedure gets stuck (and it may, since it requires all nodes to be available to contact them to correctly establish a single group 0 raft cluster); or if a running cluster permanently loses a majority of nodes, causing group 0 unavailability; the cluster admin is not left without help. We introduce a recovery mode, which allows the admin to completely get rid of traces of existing group 0 and restart the upgrade procedure - which will establish a new group 0. This works even in clusters that never upgraded but were bootstrapped using group 0 from scratch. 
To do that, the admin does the following on every node: - writes 'recovery' under 'group0_upgrade_state' key in `system.scylla_local` table, - truncates the `system.discovery` table, - truncates the `system.group0_history` table, - deletes group 0 ID and group 0 server ID from `system.scylla_local` (the keys are `raft_group0_id` and `raft_server_id`); then the admin performs a rolling restart of their cluster. The nodes restart in a "group 0 recovery mode", which simply means that the nodes won't try to perform any group 0 operations. Then the admin calls `removenode` to remove the nodes that are down. Finally, the admin removes the `group0_upgrade_state` key from `system.scylla_local`, rolling-restarts the cluster, and the cluster should establish group 0 anew. Note that this recovery procedure will have to be extended when new stuff is added to group 0 - like topology change state. Indeed, observe that a minority of nodes aren't able to receive committed entries from a leader, so they may end up in inconsistent group 0 states. It wouldn't be safe to simply create group 0 on those nodes without first ensuring that they have the same state from which group 0 will start. Right now the state only consists of schema tables, and the upgrade procedure ensures to synchronize them, so even if the nodes started in inconsistent schema states, group 0 will correctly be established. (TODO: create a tracking issue? something needs to remind us of this whenever we extend group 0 with new stuff...)	2022-08-23 13:51:01 +02:00
Kamil Braun	b42dfbc0aa	test: raft: randomized_nemesis_test: additional logging Add some more logging to `randomized_nemesis_test` such as logging the start and end of a reconfiguration operation in a way that makes it easy to find one given the other in the logs.	2022-08-23 13:14:30 +02:00
Kamil Braun	efad6fe9b4	raft: server: handle aborts when waiting for config entry to commit Changing configuration involves two entries in the log: a 'joint configuration entry' and a 'non-joint configuration entry'. We use `wait_for_entry` to wait on the joint one. To wait on the non-joint one, we use a separate promise field in `server`. This promise wasn't connected to the `abort_source` passed into `set_configuration`. The call could get stuck if the server got removed from the configuration and lost leadership after committing the joint entry but before committing the non-joint one, waiting on the promise. Aborting wouldn't help. Fix this by subscribing to the `abort_source` in resolving the promise exceptionally. Furthermore, make sure that two `set_configuration` calls don't step on each other's toes by one setting the other's promise. To do that, reset the promise field at the end of `set_configuration` and check that it's not engaged at the beginning. Fixes #11288.	2022-08-23 13:14:29 +02:00
Calle Wilund	54aca8e814	distributed_loader: Restore separate processing of keyspace init prio/normal Fixes #11349 In `7396de7` (and refactorings before it) the set of prioritized keyspaces (and processing thereof) was removed, due to apparent non-usage (which is true for open-source version). This functionality is however required for certain features of the enterprise version (ear). As such it needs to be restored and reenabled. This patch and revert before it does so, adapted to the recent version of this file.	2022-08-23 10:39:19 +00:00
Calle Wilund	d9c391e366	Revert "distributed_loader: Remove unused load-prio manipulations" This reverts commit `7396de72b1`. In `7396de7` (and refactorings before it) the set of prioritized keyspaces (and processing thereof) was removed, due to apparent non-usage (which is true for open-source version). This functionality is however required for certain features of the enterprise version (ear). As such it needs to be restored and reenabled. This reverts the actual commit, patch after ensures we use the prio set.	2022-08-23 10:34:05 +00:00
Avi Kivity	5d1ff17ddf	Merge 'Streaming: define plan_id as a strong tagged_uuid type' from Benny Halevy This series turns plan_id from a generic UUID into a strong type so it can't be used interchangeably with other uuid's. While at it, streaming/stream_fwd.hh was added for forward declarations and the definition of plan_id. Also, `stream_manager::update_progress` parameter name was renamed to plan_id to represent its assumed content, before changing its type to `streaming::plan_id`. Closes #11338 * github.com:scylladb/scylladb: streaming: define plan_id as a strong tagged_uuid type stream_manager: update_progress: rename cf_id param to plan_id streaming: add forward declarations in stream_fwd.hh	2022-08-23 10:48:34 +02:00
Petr Gusev	aa88d58539	raft, use max_command_size to satisfy commitlog limit Commitlog imposes a limit on the size of mutations and throws an exception if it's exceeded. In case of schema changes before raft this exception was delivered to the client. Now it happens while saving the raft command in io_fiber in persistence->store_log_entries and what the client gets is just a timeout exception, which doesn't say much about the cause of the problem. This patch introduces an explicit command size limit and provides a clear error message in this case.	2022-08-23 12:09:32 +04:00
Tomasz Grabiec	0e5b86d3da	Merge 'Optimize mutation consume of range tombstones in reverse' from Benny Halevy Reversing the whole range_tombstone_list into reversed_range_tombstones is inefficient and can lead to reactor stalls with a large number of range tombstones. Instead, iterate over the range_tombstone_list in reverse direction and reverse each range_tombstone as we go, keeping the result in the optional cookie.reversed_rt member. While at it, this series contains some other cleanups on this path to improve the code readability and maybe make the compiler's life easier as for optimizing the cleaned-up code. Closes #11271 * github.com:scylladb/scylladb: mutation: consume_clustering_fragments: get rid of reversed_range_tombstones; mutation: consume_clustering_fragments: reindent mutation: consume_clustering_fragments: shuffle emit_rt logic around mutation: consume, consume_gently: simplify partition_start logic mutation: consume_clustering_fragments: pass iterators to mutation_consume_cookie ctor mutation: consume_clustering_fragments: keep the reversed schema in cookie mutation: clustering_iterators: get rid of current_rt mutation_test: test_mutation_consume_position_monotonicity: test also consume_gently	2022-08-23 10:05:39 +02:00
Botond Dénes	5bc499080d	utils/logalloc: remove reclaim_timer:: globals One of them (_active_timer) is moved to shard tracker, the other is made a simple local in reclaim_timer.	2022-08-23 10:38:58 +03:00
Botond Dénes	5f8971173e	utils/logalloc: make s_sanitizer_report_backtrace global a member of tracker We want to consolidate all the logalloc state into a single object: the shard tracker. Replacing this global with a member in said object is part of this effort.	2022-08-23 10:38:58 +03:00
Botond Dénes	499b9a3a7c	utils/logalloc: tracker_reclaimer_lock: get shard tracker via constructor arg	2022-08-23 10:38:58 +03:00
Botond Dénes	7d17d675af	utils/logalloc: move global stat accessors to tracker These are pretend free functions, accessing globals in the background, make them a member of the tracker instead, which has everything needed locally to compute them. Callers still have to access these stats through the global tracker instance, but this can be changed to happen through a local instance. Soon....	2022-08-23 10:38:58 +03:00
Botond Dénes	f406151a86	utils/logalloc: allocating_section: don't use the global tracker Instead, get the tracker instance from the region. This requires adding a `region&` parameter to `with_reserve()`. This brings us one step closer to eliminating the global tracker.	2022-08-23 10:38:58 +03:00
Botond Dénes	e968866fa1	utils/logalloc: pass down tracker::impl reference to segment_pool To get rid of some usages of `shard_tracker()`.	2022-08-23 10:38:58 +03:00
Botond Dénes	3bd94e41bf	utils/logalloc: move segment pool into tracker Instead of a separate global segment pool instance, make it a member of the already global tracker. Most users are inside the tracker instance anyway. Outside users can access the pool through the global tracker instance.	2022-08-23 10:38:58 +03:00
Botond Dénes	5b86dfc35a	utils/logalloc: add tracker member to basic_region_impl For now this member is initialized from the global tracker instance. But it allows the members of region impl to be detached from said global, making a step towards removing it.	2022-08-23 10:38:58 +03:00
Botond Dénes	f4056bd344	utils/logalloc: make segment independent of segment pool segment has some members, which simply forward the call to a segment_pool method, via the global segment_pool instance. Remove these and make the callers use the segment pool directly instead.	2022-08-23 10:38:58 +03:00
Nadav Har'El	9c15659194	Merge 'test.py: bump timeout of async requests for topology' from Alecco Topology tests do async requests using the Python driver. The driver's API for async doesn't use the session timeout. Pass 60 seconds timeout (default is 10) to match the session's. Fixes https://github.com/scylladb/scylladb/issues/11289 Closes #11348 * github.com:scylladb/scylladb: test.py: bump schema agreement timeout for topology tests test.py: bump timeout of async requests for topology test.py: fix bad indent	2022-08-23 10:30:59 +03:00
Raya Kurlyand	bc7539cff0	Update auditing.rst https://github.com/scylladb/scylladb/issues/11341 Closes #11347	2022-08-23 06:59:41 +03:00
Botond Dénes	331033adae	Merge 'Fix frozen mutation consume ordering' from Benny Halevy Currently, frozen_mutation is not consumed in position_in_partition order as all range tombstones are consumed before all rows. This violates the range_tombstone_generator invariants as its lower_bound needs to be monotonically increasing. Fix this by adding mutation_partition_view::accept_ordered and rewriting do_accept_gently to do the same, both making sure to consume the range tombstones and clustering rows in position_in_partition order, similar to the mutation consume_clustering_fragments function. Add a unit test that verifies that. Fixes #11198 Closes #11269 * github.com:scylladb/scylladb: mutation_partition_view: make mutation_partition_view_virtual_visitor stoppable frozen_mutation: consume and consume_gently in-order frozen_mutation: frozen_mutation_consumer_adaptor: rename rt to rtc frozen_mutation: frozen_mutation_consumer_adaptor: return early when flush returns stop_iteration::yes frozen_mutation: frozen_mutation_consumer_adaptor: consume static row unconditionally frozen_mutation: frozen_mutation_consumer_adaptor: flush current_row before rt_gen	2022-08-23 06:37:04 +03:00
Alejo Sanchez	01cac33472	test.py: bump schema agreement timeout for topology tests Increase the schema agreement timeout to match other timeouts. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-22 21:07:55 +02:00
Alejo Sanchez	f9d31112cf	test.py: bump timeout of async requests for topology Topology tests do async requests using the Python driver. The driver's API for async doesn't use the session timeout. Pass 60 seconds timeout (default is 10) to match the session's. This will hopefully fix timeout failures on debug mode. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-22 21:07:03 +02:00
Benny Halevy	357e805e1f	mutation_partition_view: make mutation_partition_view_virtual_visitor stoppable So that the frozen_mutation consumer can return stop_iteration::yes if it wishes to stop consuming at some clustering position. In this case, on_end_of_partition must still be called so a closing range_tombstone_change can be emitted to the consumer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-22 20:12:58 +03:00
Mikołaj Sielużycki	b5380baf8a	frozen_mutation: consume and consume_gently in-order Currently, frozen_mutation is not consumed in position_in_partition order as all range tombstones are consumed before all rows. This violates the range_tombstone_generator invariants as its lower_bound needs to be monotonically increasing. Fix this by adding mutation_partition_view::accept_ordered and rewriting do_accept_gently to do the same, both making sure to consume the range tombstones and clustering rows in position_in_partition order, similar to the mutation consume_clustering_fragments function. Add a unit test that verifies that. Fixes #11198 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-22 20:12:20 +03:00
Kamil Braun	e0c6153adf	test: raft: randomized_nemesis_test: more chaos in `remove_leader_with_forwarding_finishes` Improve the randomness of this test, making it a bit easier to reproduce the scenarios that the test aims to catch. Increase timeouts a bit to account for this additional randomness.	2022-08-22 18:53:48 +02:00
Kamil Braun	db2a3deda1	raft: server: drop waiters in `applier_fiber` instead of `io_fiber` When `io_fiber` fetched a batch with a configuration that does not contain this node, it would send the entries committed in this batch to `applier_fiber` and proceed by dropping any remaining entry waiters (if the node was no longer a leader). If there were waiters for entries committed in this batch, it could either happen that `applier_fiber` received and processed those entries first, notifying the waiters that the entries were committed and/or applied, or it could happen that `io_fiber` reached the waiter-dropping code first, causing the waiters to be resolved with `commit_status_unknown`. The second scenario is undesirable. For example, when a follower tries to remove the current leader from the configuration using `modify_config`, if the second scenario happens, the follower will get `commit_status_unknown` - this can happen even though there are no node or network failures. In particular, this caused `randomized_nemesis_test.remove_leader_with_forwarding_finishes` to fail from time to time. Fix it by serializing the notifying and dropping of waiters in a single fiber - `applier_fiber`. We decided to move all management of waiters into `applier_fiber`, because most of that management was already there (there was already one `drop_waiters` call, and two `notify_waiters` calls). Now, when `io_fiber` observes that we've been removed from the config and are no longer a leader, instead of dropping waiters, it sends a message to `applier_fiber`. `applier_fiber` will drop waiters when receiving that message. Fixes #11235.	2022-08-22 18:53:44 +02:00
Kamil Braun	5badf20c7a	raft: server: use `visit` instead of `holds_alternative`+`get` In `std::holds_alternative`+`std::get` version, the `get` performs a redundant check. Also `std::visit` gives a compile-time exhaustiveness check (whether we handled all possible cases of the `variant`).	2022-08-22 18:47:48 +02:00
Benny Halevy	314e45d957	streaming: define plan_id as a strong tagged_uuid type Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-22 19:45:30 +03:00
Benny Halevy	add612bc52	mutation: consume_clustering_fragments: get rid of reversed_range_tombstones; Reversing the whole range_tombstone_list into reversed_range_tombstones is inefficient and can lead to reactor stalls with a large number of range tombstones. Instead, iterate over the range_tombstone_list in reverse direction and reverse each range_tombstone as we go, keeping the result in the optional cookie.reversed_rt member. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-22 19:42:52 +03:00
Alejo Sanchez	87c233b36b	test.py: fix bad indent Fix leftover bad indent Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-22 14:29:54 +02:00
Nadav Har'El	941c719a23	alternator: return ProvisionedThroughput in DescribeTable DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable operation must return a ProvisionedThroughput structure, listing both ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not stated in some DynamoDB documentation but is explicitly mentioned in https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html Also, empirically, DynamoDB returns ProvisionedThroughput with zeros even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this. The missing ProvisionedThroughput structure was a problem for applications like DynamoDB connectors for Spark, if they implicitly assume that ProvisionedThroughput is returned by DescribeTable, and fail (as described in issue #11222) if it's outright missing. So this patch adds the missing ProvisionedThroughput structure, and the xfailing test starts to pass. Note that this patch doesn't change the fact that attempting to set a table to PROVISIONED billing mode is ignored: DescribeTable continues to always return PAY_PER_REQUEST as the billing mode and zero as the provisioned capacities. Fixes #11222 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11298	2022-08-22 09:58:09 +02:00
Takuya ASADA	60e8f5743c	systemd: drop StandardOutput=syslog On recent versions of systemd, StandardOutput=syslog is obsolete. We should use StandardOutput=journal instead, but since it's the default value, we can just drop it. Fixes #11322 Closes #11339	2022-08-22 10:47:37 +03:00
Benny Halevy	fa7033bc2b	configure: add --perf-tests-debuginfo option Provides separate control over debuginfo for perf tests since enabling --tests-debuginfo affects both today causing the Jenkins archives of perf tests binaries to inflate considerably. Refs https://github.com/scylladb/scylla-pkg/issues/3060 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11337	2022-08-21 19:08:21 +03:00
Konstantin Osipov	4b6ed7796b	test.py: extend documentation Add documentation about python, topology tests, server pooling and provide some debugging tips. Closes #11317	2022-08-21 17:55:49 +03:00
Benny Halevy	3554533e2c	stream_manager: update_progress: rename cf_id param to plan_id Before changing its type to streaming::plan_id this patch clarifies that the parameter actually represents the plan id and not the table id as its name suggests. For reference, see the call to update_progress in `stream_transfer_task::execute`, as well as the function using _stream_bytes, whose map key is the plan id. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-21 16:56:41 +03:00
Benny Halevy	c1fc0672a5	streaming: add forward declarations in stream_fwd.hh To be used for defining streaming::plan_id in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-21 16:00:02 +03:00
Anna Stuchlik	3b93184680	doc: fix the description of ORDER BY for the SELECT statement Closes #11272	2022-08-21 15:28:15 +03:00
Tzach Livyatan	8fc58300ea	Update Alternator Markdown file to use automatic link notation Closes #11335	2022-08-21 13:32:57 +03:00
Piotr Sarna	484004e766	Merge 'Fix mutation commutativity with shadowable tombstone' from Tomasz Grabiec This series fixes lack of mutation associativity which manifests as sporadic failures in row_cache_test.cc::test_concurrent_reads_and_eviction due to differences in mutations applied and read. No known production impact. Refs https://github.com/scylladb/scylladb/issues/11307 Closes #11312 * github.com:scylladb/scylladb: test: mutation_test: Add explicit test for mutation commutativity test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones db: mutation_partition: Drop unnecessary maybe_shadow() db: mutation_partition: Maintain shadowable tombstone invariant when applying a hard tombstone mutation_partition: row: make row marker shadowing symmetric	2022-08-20 16:46:32 +02:00
Kamil Braun	2ba1fb0490	service/raft: raft_group0: extract `tracker` from `persistent_discovery::run` Extract it to a top-level abstraction, write comments. It will be reused in the following commit.	2022-08-19 19:15:19 +02:00
Kamil Braun	f7e02a7de9	service/raft: raft_group0: introduce local loggers for group 0 and upgrade	2022-08-19 19:15:19 +02:00
Kamil Braun	ac5f4248a9	service/raft: raft_group0: introduce GET_GROUP0_UPGRADE_STATE verb During the upgrade procedure nodes will want to obtain the upgrade state of other nodes to proceed. This is what the new verb is for.	2022-08-19 19:15:19 +02:00
Kamil Braun	43687be1f1	service/raft: raft_group0_client: prepare for upgrade procedure Now, whether a 'group 0 operation' (today it means schema change) is performed using the old or new methods, doesn't depend on the local RAFT feature being enabled, but on the state of the upgrade procedure. In this commit the state of the upgrade is always `use_pre_raft_procedures` because the upgrade procedure is not implemented yet. But stay tuned. The upgrade procedure will need certain guarantees: at some point it switches from `use_pre_raft_procedures` to `synchronize` state. During `synchronize` schema changes must be disabled, so the procedure can ensure that schema is in sync across the entire cluster before establishing group 0. Thus, when the switch happens, no schema change can be in progress. To handle all this weirdness we introduce `_upgrade_lock` and `get_group0_upgrade_state` which takes this lock whenever it returns `use_pre_raft_procedures`. Creating a `group0_guard` - which happens at the start of every group 0 operation - will take this lock, and the lock holder shall be stored inside the guard (note: the holder only holds the lock if `use_pre_raft_procedures` was returned, no need to hold it for other cases). Because `group0_guard` is held for the entire duration of a group 0 operation, and because the upgrade procedure will also have to take this lock whenever it wants to change the upgrade state (it's an rwlock), this ensures that no group 0 operation that uses the old ways is happening when we change the state. We also implement `wait_until_group0_upgraded` using a condition variable. It will be used by certain methods during upgrade (later commits; stay tuned). Some additional comments were written.	2022-08-19 19:15:19 +02:00
Kamil Braun	7e56251aea	service/raft: introduce `group0_upgrade_state` Define an enum class, `group0_upgrade_state`, describing the state of the upgrade procedure (implemented in later commits). Provide IDL definitions for (de)serialization. The node will have its current upgrade state stored on disk in `system.scylla_local` under the `group0_upgrade_state` key. If the key is not present we assume `use_pre_raft_procedures` (meaning we haven't started upgrading yet or we're at the beginning of upgrade). Introduce `system_keyspace` accessor methods for storing and retrieving the on-disk state.	2022-08-19 19:15:19 +02:00
Kamil Braun	547134faf4	db: system_keyspace: introduce `load_peers` Load the addresses of our peers from `system.peers`. Will be used by the Raft upgrade procedure to obtain the set of all peers.	2022-08-19 19:15:18 +02:00
Kamil Braun	a5b465b796	idl-compiler: introduce cancellable verbs The compiler allowed passing a `with_timeout` flag to a verb definition; it then generated functions for sending and handling RPCs that accepted a timeout parameter. We would like to generate functions that accept an `abort_source` so an RPC can be cancelled from the sender side. This is both more and less powerful than `with_timeout`. More powerful because you can abort on other conditions than just reaching a certain point in time. Less powerful because you can't abort the receiver. In any case, sometimes useful. For this the `cancellable` flag was added. You can't use `with_timeout` and `cancellable` on the same verb. Note that this uses an already existing function in RPC module, `send_message_cancellable`.	2022-08-19 19:15:18 +02:00
Kamil Braun	9e5a81da4a	message: messaging_service: cancellable version of `send_schema_check` This RPC will be used during the schema synchronization step of the Raft upgrade procedure. Make a version which can be cancelled when the upgrade procedure gets aborted.	2022-08-19 19:15:18 +02:00
Nadav Har'El	516089beb0	Merge 'Raft test topology II part 1' from Alecco - Remove `ScyllaCluster.__getitem__()` (pending request by @kbr- in a previous pull request), for this remove all direct access to servers from caller code - Increase Python driver timeouts (req by @nyh) - Improve `ManagerClient` API requests: use `http+unix://<sockname>/<resource>` instead of `http://localhost/<resource>` and callers of the helper method only pass the resource - Improve lint and type hints Closes #11305 * github.com:scylladb/scylladb: test.py: remove ScyllaCluster.__getitem__() test.py: ScyllaCluster check kesypace with any server test.py: ScyllaCluster server error log method test.py: ScyllaCluster read_server_log() test.py: save log point for all running servers test.py: ScyllaCluster provide endpoint test.py: build host param after before_test test.py: manager client disable lint warnings test.py: scylla cluster lint and type hint fixes test.py: increase more timeouts test.py: ManagerClient improve API HTTP requests	2022-08-18 20:27:50 +03:00
Alejo Sanchez	fe07f9ceed	test.py: make topology conftest module paths work when imported To allow other suites to use topology suite conftest, add pylib to the module lookup path. Closes #11313	2022-08-18 20:22:35 +03:00
Konstantin Osipov	7481f0d404	test.py: simplify CQL test search No need to repeat code available in the base class. Closes #11156	2022-08-18 19:28:43 +03:00
Benny Halevy	7747b8fa33	sstables: define run_identifier as a strong tagged_uuid type Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11321	2022-08-18 19:03:10 +03:00
Avi Kivity	35fbba3a5b	Revert "gms: gossiper: include nodes with empty feature sets when calculating enabled features" This reverts commit `08842444b4`. It causes a failure in test_shutdown_all_and_replace_node. Fixes #11316.	2022-08-18 15:01:50 +03:00
Kamil Braun	b52429f724	Merge 'raft: relax some error severity' from Gleb Natapov Dtest fails if it sees unknown errors in the logs. This series reduces severity of some errors (since they are actually expected during shutdown) and removes some others that duplicate already existing errors that dtest knows how to deal with. Also fix one case of unhandled exception in schema management code. * 'dtest-fixes-v1' of github.com:gleb-cloudius/scylla: raft: getting abort_requested_exception exception from a sm::apply is not a critical error schema_registry: fix abandoned feature warning service: raft: silence rpc::closed_errors in raft_rpc	2022-08-18 12:16:44 +02:00
Anna Stuchlik	dc307b6895	doc: fix the CQL version in the Interfaces table	2022-08-18 12:02:42 +02:00
Petr Gusev	eedfd7ad9b	raft, limit for command size Adds max_command_size to the raft configuration and restricts commands to this limit.	2022-08-18 13:35:49 +04:00
Tomasz Grabiec	5a9df433c6	test: mutation_test: Add explicit test for mutation commutativity	2022-08-17 17:39:54 +02:00
Tomasz Grabiec	3d9efee3bf	test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones Given 3 row mutations: m1 = { marker: {row_marker: dead timestamp=-9223372036854775803}, tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775807, deletion_time=0}, {tombstone: none}} } m2 = { marker: {row_marker: timestamp=-9223372036854775805} } m3 = { tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775806, deletion_time=2}, {tombstone: none}} } We get different shadowable tombstones depending on the order of merging: (m1 + m2) + m3 = { marker: {row_marker: dead timestamp=-9223372036854775803}, tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775806, deletion_time=2}, {tombstone: none}} } m1 + (m2 + m3) = { marker: {row_marker: dead timestamp=-9223372036854775803}, tombstone: {row_tombstone: {shadowable tombstone: timestamp=-9223372036854775807, deletion_time=0}, {tombstone: none}} } The reason is that in the second case the shadowable tombstone in m3 is shadowed by the row marker in m2. In the first case, the marker in m2 is cancelled by the dead marker in m1, so shadowable tombstone in m3 is not cancelled (the marker in m1 does not cancel because it's dead). This wouldn't happen if the dead marker in m1 was accompanied by a hard tombstone of the same timestamp, which would effectively make the difference in shadowable tombstones irrelevant. Found by row_cache_test.cc::test_concurrent_reads_and_eviction. I'm not sure if this situation can be reached in practice (dead marker in mv table but no row tombstone). Work around it for tests by producing a row tombstone if there is a dead marker. Refs #11307	2022-08-17 17:39:54 +02:00
Tomasz Grabiec	56e5b6f095	db: mutation_partition: Drop unnecessary maybe_shadow() It is performed inside row_tombstone::apply() invoked in the preceding line.	2022-08-17 17:39:54 +02:00
Tomasz Grabiec	9c66c9b3f0	db: mutation_partition: Maintain shadowable tombstone invariant when applying a hard tombstone When the row has a live row marker which shadows the shadowable tombstone, the shadowable tombstone should not be effective. The code assumes that _shadowable always reflects the current tombstone, so maybe_shadow() needs to be called whenever marker or regular tombstone changes. This was not ensured by row::apply(tombstone). This causes problems in tests which use random_mutation_generator, which generates mutations which would violate this invariant, and as a result, mutation commutativity would be violated. I am not aware of problems in production code.	2022-08-17 17:34:13 +02:00
Botond Dénes	778f5adde7	mutation_partition: row: make row marker shadowing symmetric Currently row marker shadowing the shadowable tombstone is only checked in `apply(row_marker)`. This means that shadowing will only be checked if the shadowable tombstone and row marker are set in the correct order. This at the very least can cause flakiness in tests when a mutation produced just the right way has a shadowable tombstone that can be eliminated when the mutation is reconstructed in a different way, leading to artificial differences when comparing those mutations. This patch fixes this by checking shadowing in `apply(shadowable_tombstone)` too, making the shadowing check symmetric. There is still one vulnerability left: `row_marker& row_marker()`, which allows overwriting the marker without triggering the corresponding checks. We cannot remove this overload as it is used by compaction so we just add a comment to it warning that `maybe_shadow()` has to be manually invoked if it is used to mutate the marker (compaction takes care of that). A caller which didn't do the manual check is mutation_source_test: this patch updates it to use `apply(row_marker)` instead. Fixes: #9483 Tests: unit(dev) Closes #9519	2022-08-17 17:22:13 +02:00
Benny Halevy	8f0376bba1	mutation: consume_clustering_fragments: reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 16:45:20 +03:00
Benny Halevy	749371c2b0	mutation: consume_clustering_fragments: shuffle emit_rt logic around To prepare for a following patch that will get rid of the cookie.reversed_range_tombstones list. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 16:44:23 +03:00
Benny Halevy	0e21073c38	mutation: consume, consume_gently: simplify partition_start logic Concentrate the logic in a single (!cookie.partition_start_consumed) block Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 15:49:12 +03:00
Benny Halevy	d661b84d51	mutation: consume_clustering_fragments: pass iterators to mutation_consume_cookie ctor and set crs and rts only in the block where they are used, so we can get rid of reversed_range_tombstones. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 15:30:36 +03:00
Benny Halevy	f1b7a1a6f1	mutation: consume_clustering_fragments: keep the reversed schema in cookie Rather than reversing the schema on every call just keep the potentially reversed schema in cookie. Otherwise, cookie.schema was write only. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 15:30:36 +03:00
Benny Halevy	a230ea0019	mutation: clustering_iterators: get rid of current_rt It is currently write-only. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 15:30:16 +03:00
Benny Halevy	017f9b4131	mutation_test: test_mutation_consume_position_monotonicity: test also consume_gently Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 14:43:52 +03:00
Alejo Sanchez	d732d776ed	test.py: remove ScyllaCluster.__getitem__() Users of ScyllaCluster should not directly manage its ScyllaServers. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-17 10:24:48 +02:00
Alejo Sanchez	729f8e2834	test.py: ScyllaCluster check kesypace with any server Directly pick any server instead of calling self[0]. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-17 10:24:48 +02:00
Alejo Sanchez	7ad7a5e718	test.py: ScyllaCluster server error log method Provide server error logs to caller (test.py). Avoids direct access to list of servers. To be done later: pick the failed server. For now it just provides the log of one server. While there, fix type hints. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-17 10:24:48 +02:00
Alejo Sanchez	e755207fcc	test.py: ScyllaCluster read_server_log() Instead of accessing the first server, now test.py asks ScyllaCluster for the server log. In a later commit, ScyllaCluster will pick the appropriate server. Also removes another direct access to the list of servers we want to get rid of.	2022-08-17 10:24:48 +02:00
Alejo Sanchez	f141ab95f9	test.py: save log point for all running servers For error reporting, before a test a mark of the log point in time is saved. Previously, only the log of the first server was saved. Now it's done for all running servers. While there, remove direct access to servers on test.py. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-17 10:24:48 +02:00
Alejo Sanchez	8fff636776	test.py: ScyllaCluster provide endpoint For pytest CQL driver connections a host id (IP) is used. Provide it with a method. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-17 10:24:48 +02:00
Alejo Sanchez	5bd266424e	test.py: build host param after before_test If no server started, there is no server in the cluster list. So only build the pytest --host param after before_test check is done. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-17 10:24:48 +02:00
Alejo Sanchez	30c8e961ba	test.py: manager client disable lint warnings Disable noisy lint warnings. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-17 10:24:48 +02:00
Alejo Sanchez	2b4c7fbb8a	test.py: scylla cluster lint and type hint fixes Add missing docstrings, reorder imports, add type hints, improve formatting, fix variable names, fix line lengths, iterate over dicts not keys, and disable noisy lint warnings. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-17 10:24:48 +02:00
Alejo Sanchez	566a4ebf4e	test.py: increase more timeouts Increase Python driver connection timeouts to deal with extreme cases for slow debug builds in slow machines as done (and explained) in `95bd02246a`. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-17 10:24:48 +02:00
Alejo Sanchez	ce27c02d91	test.py: ManagerClient improve API HTTP requests Use the AF Unix socket name as host name instead of localhost and avoid repeating the full URL for callers of _request() for the Manager API requests from the client. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-17 10:24:48 +02:00
Benny Halevy	1b997a8514	frozen_mutation: frozen_mutation_consumer_adaptor: rename rt to rtc It is a range_tombstone_change, not a range_tombstone. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 10:17:42 +03:00
Benny Halevy	87fd4a7d82	frozen_mutation: frozen_mutation_consumer_adaptor: return early when flush returns stop_iteration::yes If the consumer returns stop_iteration::yes for a flushed row (static or clustered), we should return early and not consume any more fragments until `on_end_of_partition`, where we may still consume a closing range_tombstone_change past the last consumed row. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 10:17:42 +03:00
Benny Halevy	f11a5e2ec8	frozen_mutation: frozen_mutation_consumer_adaptor: consume static row unconditionally Consuming the static row is the first opportunity for the consumer to return stop_iteration::yes, so there's no point in checking `_stop_consuming` before consuming it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 10:17:42 +03:00
Benny Halevy	4b4eb9037a	frozen_mutation: frozen_mutation_consumer_adaptor: flush current_row before rt_gen We already flushed rt_gen when building the current_row. When we get to flush_rows_and_tombstones, we should just consume it, as the passed position is not of the current_row but rather a position following it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-17 10:17:42 +03:00
Nadav Har'El	055340ae39	cql-pytest: increase more timeouts In commit `7eda6b1e90`, we increased the request_timeout parameter used by cql-pytest tests from the default of 10 seconds to 120 seconds. 10 seconds was usually more than enough for finishing any Scylla request, but it turned out that in some extreme cases of a debug build running on an extremely over-committed machine, the default timeout was not enough. Recently, in issue #11289 we saw additional cases of timeouts which the request_timeout setting did not solve. It turns out that the Python CQL driver has two additional timeout settings - connect_timeout and control_connection_timeout, which default to 5 seconds and 2 seconds respectively. I believe that most of the timeouts in issue #11289 come from the control_connection_timeout setting - by changing it to a tiny number (e.g., 0.0001) I got the same error messages as those reported in #11289. The default of that timeout - 2 seconds - is certainly low enough to be reached on an extremely over-committed machine. So this patch significantly increases both connect_timeout and control_connection_timeout to 60 seconds. We don't care that this timeout is ridiculously large - under normal operations it will never be reached. There is no code which loops for this amount of time, for example. Refs #11289 (perhaps even Fixes, we'll need to see that the test errors go away). NOTE: This patch only changes test/cql-pytest/util.py, which is only used by the cql-pytest test suite. We have multiple other test suites which copied this code, and those test suites might need fixing separately. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11295	2022-08-16 19:11:59 +03:00
Kamil Braun	08842444b4	gms: gossiper: include nodes with empty feature sets when calculating enabled features Right now, if there's a node for which we don't know the features supported by this node (they are neither persisted locally, nor gossiped by that node), we would skip this node in calculating the set of enabled features and potentially enable a feature which shouldn't be enabled - because that node may not know it. We should only enable a feature when we know that all nodes have upgraded and know the feature. This bug caused us problems when we tried to move RAFT out of experimental. There are dtests such as `partitioner_tests.py` in which nodes would enable features prematurely, which caused the Raft upgrade procedure to break (the procedure starts only when all nodes upgrade and announce that they know the SUPPORTS_RAFT cluster feature). Closes #11225	2022-08-16 19:07:41 +03:00
Piotr Sarna	cf30d4cbcf	Merge 'Secondary index of collection columns' from Nadav Har'El This pull request introduces global secondary-indexing for non-frozen collections. The intent is to enable such queries: ``` CREATE TABLE test(id int, somemap map<int, int>, somelist list<int>, someset set<int>, PRIMARY KEY(id)); CREATE INDEX ON test(keys(somemap)); CREATE INDEX ON test(values(somemap)); CREATE INDEX ON test(entries(somemap)); CREATE INDEX ON test(values(somelist)); CREATE INDEX ON test(values(someset)); -- index on test(c) is the same as index on (values(c)) CREATE INDEX IF NOT EXISTS ON test(somelist); CREATE INDEX IF NOT EXISTS ON test(someset); CREATE INDEX IF NOT EXISTS ON test(somemap); SELECT * FROM test WHERE someset CONTAINS 7; SELECT * FROM test WHERE somelist CONTAINS 7; SELECT * FROM test WHERE somemap CONTAINS KEY 7; SELECT * FROM test WHERE somemap CONTAINS 7; SELECT * FROM test WHERE somemap[7] = 7; ``` We use the familiar materialized views (MVs) here. Scylla treats all the collections the same way - they're a list of pairs (key, value). In case of sets, the value type is a dummy one. In case of lists, the key type is TIMEUUID. When describing the design, I will forget that there is more than one collection type. Suppose that the columns in the base table were as follows: ``` pkey int, ckey1 int, ckey2 int, somemap map<int, text>, PRIMARY KEY(pkey, ckey1, ckey2) ``` The MV schema is as follows (the names of columns which are not the same as in base might be different). All the columns here form the primary key. ``` -- for index over entries indexed_coll (int, text), idx_token long, pkey int, ckey1 int, ckey2 int -- for index over keys indexed_coll int, idx_token long, pkey int, ckey1 int, ckey2 int -- for index over values indexed_coll text, idx_token long, pkey int, ckey1 int, ckey2 int, coll_keys_for_values_index int ``` The reason for the last additional column is that the values from a collection might not be unique.
Fixes #2962 Fixes #8745 Fixes #10707 This patch does not implement local secondary indexes for collection columns: Refs #10713. Closes #10841 * github.com:scylladb/scylladb: test/cql-pytest: un-xfail yet another passing collection-indexing test secondary index: fix paging in map value indexing test/cql-pytest: test for paging with collection values index cql, view: rename and explain bytes_with_action cql, index: make collection indexing a cluster feature test/cql-pytest: failing tests for oversized key values in MV and SI cql: fix secondary index "target" when column name has special characters cql, index: improve error messages cql, index: fix default index name for collection index test/cql-pytest: un-xfail several collecting indexing tests test/cql-pytest/test_secondary_index: verify that local index on collection fails. docs/design-notes/secondary_index: add `VALUES` to index target list test/cql-pytest/test_secondary_index: add randomized test for indexes on collections cql-pytest/cassandra_tests/.../secondary_index_test: fix error message in test ported from Cassandra cql-pytest/cassandra_tests/.../secondary_index_on_map_entries,select_test: test ported from Cassandra is expected to fail, since Scylla assumes that comparison with null doesn't throw error, just evaluates to false. Since it's not a bug, but expected behavior from the perspective of Scylla, we don't mark it as xfail. 
test/boost/secondary_index_test: update for non-frozen indexes on collections test/cql-pytest: Uncomment collection indexes tests that should be working now cql, index: don't use IS NOT NULL on collection column cql3/statements/select_statement: for index on values of collection, don't emit duplicate rows cql/expr/expression, index/secondary_index_manager: needs_filtering and index_supports_expression rewrite to accomodate for indexes over collections cql3, index: Use entries() indexes on collections for queries cql3, index: Use keys() and values() indexes on collections for queries. types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented cql3/statements/index_target: throw exception to signalize that we didn't miss returning from function db/view/view.cc: compute view_updates for views over collections view info: has_computed_column_depending_on_base_non_primary_key column_computation: depends_on_non_primary_key_column schema, index/secondary_index_manager: make schema for index-induced mv index/secondary_index_manager: extract keys, values, entries types from collection cql3/statements/: validate CREATE INDEX for index over a collection cql3/statements/create_index_statement,index_target: rewrite index target for collection column_computation.hh, schema.cc: collection_column_computation column_computation.hh, schema.cc: compute_value interface refactor Cql.g, treewide: support cql syntax `INDEX ON table(VALUES(collection))`	2022-08-16 14:18:51 +02:00
Nadav Har'El	fbb0b66d0c	test/cql-pytest: fix run's "--ssl" option Commit `23acc2e848` broke the "--ssl" option of test/cql-pytest/run (which makes Scylla - and cqlpytest - use SSL-encrypted CQL). The problem was that there was a confusion between the "ssl" module (Python's SSL support) and a new "ssl" variable. A rename and a missing "import" solves the breakage. We never noticed this because Jenkins does not run cql-pytest/run with --ssl (actually, it no longer runs cql-pytest/run at all). It is still a useful option for checking SSL-related problems in Scylla and Seastar. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11292	2022-08-16 12:29:05 +02:00
Kamil Braun	4e35e62597	Merge 'Raft test topology part 3' from Alecco Test schema changes when there was an underlying topology change. - per test case checks of cluster health and cycling - helper class to do cluster manager API requests - tests can perform topology changes: stop/start/restart servers - modified clusters are marked dirty and discarded after the test case - cql connection is updated per topology change and per cluster change Closes #11266 * github.com:scylladb/scylladb: test.py: test topology and schema changes test.py: ClusterManager API mark cluster dirty test.py: call before/after_test for each test case test.py: handle driver connection in ManagerClient test.py: ClusterManager API and ManagerClient test.py: improve topology docstring	2022-08-16 11:00:26 +02:00
Avi Kivity	afa7960926	Merge 'database: evict all inactive reads for table when detaching table' from Botond Dénes Currently, when detaching the table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that any regular inactive read "forgotten" for any reason in the semaphore, will not end up in said readers accessing a dangling table reference when destroyed later. Fixes: https://github.com/scylladb/scylladb/issues/11264 Closes #11273 * github.com:scylladb/scylladb: querier: querier_cache: remove now unused evict_all_for_table() database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table() reader_concurrency_semaphore: add evict_inactive_reads_for_table()	2022-08-15 19:05:59 +03:00
Botond Dénes	d56dcb842c	db/virtual_table: add virtual destructor to virtual_table It should have had one, derived instances are stored and destroyed via the base-class. The only reason this hasn't caused bugs yet is that derived instances happen to not have any non-trivial members yet. Closes #11293	2022-08-15 16:58:05 +03:00
Avi Kivity	73d4930815	Merge 'test/lib: various improvements to sstable test env' from Botond Dénes A mixed bag of improvements developed as part of another PR (https://github.com/scylladb/scylladb/pull/10736). Said PR was closed so I'm submitting these improvements separately. Closes #11294 * github.com:scylladb/scylladb: test/lib: move convenience table config factory to sstable_test_env test/lib/sstable_test_env: move members to impl struct test/lib/sstable_utils: use test_env::do_with_async()	2022-08-15 16:57:01 +03:00
Botond Dénes	92e5f438a4	querier: querier_cache: remove now unused evict_all_for_table()	2022-08-15 14:16:41 +03:00
Botond Dénes	2b1eb6e284	database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table() Instead of querier_cache::evict_all_for_table(). The new method covers all queriers and in addition any other inactive reads registered on the semaphore. In theory by the time we detach a table, no regular inactive reads should be in the semaphore anymore, but if there are any still, we better evict them before the table is destroyed, as they might attempt to access it when destroyed later.	2022-08-15 14:16:41 +03:00
Botond Dénes	e55ccbde8f	reader_concurrency_semaphore: add evict_inactive_reads_for_table() Allowing for evicting all inactive reads that belong to a certain table.	2022-08-15 14:16:41 +03:00
Botond Dénes	c8ef356859	test/lib: move convenience table config factory to sstable_test_env All users of `column_family_test_config()`, get the semaphore parameter for it from `sstable_test_env`. It is clear that the latter serves as the storage space for stable objects required by the table config. This patch just enshrines this fact by moving the config factory method to `sstable_test_env`, so it can just get what it needs from members.	2022-08-15 11:23:59 +03:00
Botond Dénes	c0e017e0f7	test/lib/sstable_test_env: move members to impl struct All present members of sstable_test_env are std::unique_ptr<>:s because they require stable addresses. This makes their handling somewhat awkward. Move all of them into an internal `struct impl` and make that member a unique ptr.	2022-08-15 11:20:09 +03:00
Botond Dénes	a9f296ed47	test/lib/sstable_utils: use test_env::do_with_async() Instead of manually instantiating test_env.	2022-08-15 11:19:27 +03:00
Botond Dénes	a9573b84c5	Merge 'commitlog: Revert/modify `fac2bc4` - do footprint add in delete' from Calle Wilund Fixes #11184 Fixes #11237 In prev (broken) fix for https://github.com/scylladb/scylladb/issues/11184 we added the footprint for left-over files (replay candidates) to disk footprint on commitlog init. This effectively prevents us from creating segments iff we have tight limits. Since we nowadays do quite a bit of inserts _before_ commitlog replay (system.local, but...) we can end up in a situation where we deadlock start because we cannot get to the actual replay that will eventually free things. Another, not thought through, consequence is that we add a single footprint to _all_ commitlog shard instances - even though only shard 0 will get to actually replay + delete (i.e. drop footprint). So shards 1-X would all be either locked out or performance degraded. Simplest fix is to add the footprint in delete call instead. This will lock out segment creation until delete call is done, but this is fast. Also ensures that only replay shard is involved. To further emphasize this, don't store segments found on init scan in all shard instances, instead retrieve (based on low time-pos for current gen) when required. This changes very little, but we at least don't store pointless string lists in shards 1 to X, and also we can potentially ask for the list twice. More to the point, goes better hand-in-hand with the semantics of "delete_segments", where any file sent in is considered candidate for recycling, and included in footprint. Closes #11251 * github.com:scylladb/scylladb: commitlog: Make get_segments_to_replay on-demand commitlog: Revert/modify `fac2bc4` - do footprint add in delete	2022-08-15 09:10:32 +03:00
Botond Dénes	8f10413087	Merge 'doc: describe specifying workload attributes with service levels' from Anna Stuchlik Fix https://github.com/scylladb/scylladb/issues/11197 This PR adds a new page where specifying workload attributes with service levels is described and adds it to the menu. Also, I had to fix some links because of the warnings. Closes #11209 * github.com:scylladb/scylladb: doc: remove the reduntant space from index doc: update the syntax for defining service level attributes doc: rewording doc: update the links to fix the warnings doc: add the new page to the toctree doc: add the descrption of specifying workload attributes with service levels doc: add the definition of workloads to the glossary	2022-08-15 07:14:28 +03:00
Nadav Har'El	c8b5c3595e	Merge 'cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query()' from Avi Kivity Increase readability in preparation for managing topology with effective_replication_map (continuing `69aea59d9`). Closes #11290 * github.com:scylladb/scylladb: cql3: select_statement: improve loop termination condition in indexed_table_select_statement::do_execute_base_query() cql3: select_statement: reindent indexed_table_select_statement::do_execute_base_query() cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query() cql3: select_statement: de-result_wrap indexed_table_select_statement::do_execute_base_query()	2022-08-14 23:26:06 +03:00
Nadav Har'El	4a4231ea53	Merge 'storage_proxy: coroutinize some counter mutate functions' from Avi Kivity In preparation for effective_replication_map hygiene, convert some counter functions to coroutines to simplify the changes. Closes #11291 * github.com:scylladb/scylladb: storage_proxy: mutate_counters_on_leader: coroutinize storage_proxy: mutate_counters: coroutinize storage_proxy: mutate_counters: reorganize error handling	2022-08-14 23:16:42 +03:00
Avi Kivity	8070cdbbf9	storage_proxy: mutate_counters_on_leader: coroutinize Simplify ahead of refactoring for consistent effective_replication_map.	2022-08-14 17:36:58 +03:00
Avi Kivity	6e330d98d2	storage_proxy: mutate_counters: coroutinize Simplify ahead of refactoring for consistent effective_replication_map. This is probably a pessimization of the error case, but the error case will be terrible in any case unless we resultify it.	2022-08-14 17:28:46 +03:00
Avi Kivity	105b066ff7	storage_proxy: mutate_counters: reorganize error handling Move the error handling function where it's used so the code is more straightforward. Due to some std::move()s later, we must still capture the schema early.	2022-08-14 17:13:22 +03:00
Avi Kivity	fbaa280acd	cql3: select_statement: improve loop termination condition in indexed_table_select_statement::do_execute_base_query() Move the termination condition to the front of the loop so it's clear why we're looping and when we stop. It's less than perfectly clean since we widen the scope of some variables (from loop-internal to loop-carried), but IMO it's clearer.	2022-08-14 15:40:45 +03:00
Avi Kivity	60c7c11c96	cql3: select_statement: reindent indexed_table_select_statement::do_execute_base_query() Reindent after coroutinization. No functional changes.	2022-08-14 15:35:36 +03:00
Avi Kivity	492dc6879e	cql3: select_statement: coroutinize indexed_table_select_statement::do_execute_base_query() It's much easier to maintain this way. Since it uses ranges_to_vnodes, it interacts with topology and needs integration into effective_replication_map management. The patch leaves bad indentation and an infinite-looking loop in the interest of minimization, but that will be corrected later. Note, the test for `!r.has_value()` was eliminated since it was short-circuited by the test for `!rqr.has_value()` returning from the coroutine rather than propagating an error.	2022-08-14 15:31:45 +03:00
Avi Kivity	973034978c	cql3: select_statement: de-result_wrap indexed_table_select_statement::do_execute_base_query() We use result_wrap() in two places, but that makes coroutinizing the containing function a little harder, since it's composed of more lambdas. Remove the wrappers, gaining a bit of performance in the error case.	2022-08-14 15:22:18 +03:00
Kamil Braun	b4c5b79f5e	db: system_distributed_keyspace: don't call `on_internal_error` in `check_exists` The function `check_exists` checks whether a given table exists, giving an error otherwise. It previously used `on_internal_error`. `check_exists` is used in some old functions that insert CDC metadata to CDC tables. These tables are no longer used in newer Scylla versions (they were replaced with other tables with different schema), and this function is no longer called. The table definitions were removed and these tables are no longer created. They will only exist in clusters that were upgraded from old versions of Scylla (4.3) through a sequence of upgrades. If you tried to upgrade from a very old version of Scylla which had neither the old nor the new tables to a modern version, say from 4.2 to 5.0, you would get `on_internal_error` from this `check_exists` function. Fortunately: 1. we don't support such upgrade paths 2. `on_internal_error` in production clusters does not crash the system, only throws. The exception would be caught, printed, and the system would run (just without CDC - until you finished upgrade and called the proper nodetool command to fix the CDC module). Unfortunately, there is a dtest (`partitioner_tests.py`) which performs an unsupported upgrade scenario - it starts Scylla from Cassandra (!) work directories, which is like upgrading from a very old version of Scylla. This dtest was not failing due to another bug which masked the problem. When we try to fix the bug - see #11225 - the dtest starts hitting the assertion in `check_exists`. Because it's a test, we configure `on_internal_error` to crash the system. The point of this commit is to not crash the system in this rare scenario which happens only in some weird tests. We now throw `std::runtime_error` instead of calling `on_internal_error`. In the dtest, we already ignore the resulting CDC error appearing in the logs (see scylladb/scylla-dtest#2804). 
Together with this change, we'll be able to fix the #11225 bug and pass this test. Closes #11287	2022-08-14 13:12:03 +03:00
Nadav Har'El	329068df99	test/cql-pytest: un-xfail yet another passing collection-indexing test After collection indexing has been implemented, yet another test which failed because of #2962 now passes. So remove the "xfail" marker. Refs #2962 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Nadav Har'El	f6f18b187a	secondary index: fix paging in map value indexing When indexing a map column's values, if the same value appears more than once, the same row will appear in the index more than once. We had code that removed these duplicates, but this deduplication did not work across page boundaries. We had two xfailing tests to demonstrate this bug. In this patch we fix this bug by looking at the page's start and not generating the same row again, thereby getting the same deduplication we had inside pages - now across pages. The previously-xfailing tests now pass, and their xfail tag is removed. I also added another test, for the case where the base table has only partition keys without clustering keys. This second test is important because the code path for the partition-key-only case is different, and the second test exposed a bug in it as well (which is also fixed in this patch). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Nadav Har'El	dc445b9a73	test/cql-pytest: test for paging with collection values index If a map has several keys with the same value, then the "values(m)" index must remember all of them as matching the same row - because later we may remove one of these keys from the map but the row would still need to match the value because of the remaining keys. We already had a test (test_index_map_values) that although the same row appears more than once for this value, when we search for this value the result only returns the row once. Under the hood, Scylla does find the same value multiple times, but then eliminates the duplicate matched row and returns it only once. But there is a complication, that this de-duplication does not easily span paging. So in this patch we add a test that checks that paging does not cause the same row to be returned more than once. Unfortunately, this test currently fails on Scylla so it is marked "xfail". It passes on Cassandra. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Nadav Har'El	5d556115a1	cql, view: rename and explain bytes_with_action The structure "bytes_with_action" was very hard to understand because of its mysterious and general-sounding name, and no comments. In this patch I add a large comment explaining its purpose, and rename it to a more suitable name, view_key_and_action, which suggests that each such object is about one view key (where to add a view row), and an additional "action" that we need to take beyond adding the view row. This is the best I can do to make this code easier to understand without completely reorganizing it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Nadav Har'El	8b00c91c13	cql, index: make collection indexing a cluster feature Prevent a user from creating a secondary index on a collection column if the cluster has any nodes which don't support this feature. Such nodes will not be able to correctly handle requests related to this index, so better not allow creating one. Attempting to create an index on a collection before the entire cluster supports this feature will result in the error: Indexing of collection columns not supported by some older nodes in this cluster. Please upgrade them. Tested by manually disabling this feature in feature_service.cc and seeing this error message during collection indexing test. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Nadav Har'El	aa86f808a6	test/cql-pytest: failing tests for oversized key values in MV and SI In issue #9013, we noticed that if a value larger than 64 KB is indexed, the write fails in a bad way, and we fixed it. But the test we wrote when fixing that issue already suggested that something was still wrong: Cassandra failed the write cleanly, with an InvalidRequest, while Scylla failed with a mysterious WriteFailure (with a relevant error message only in the log). This patch adds several xfailing tests which demonstrate what's still wrong. This is also summarized in issue #8627: 1. A write of an oversized value to an indexed column returns the wrong error message. 2. The same problem also exists when indexing a collection, and the indexed key or value is oversized. 3. The situation is even less pleasant when adding an index to a table with pre-existing data and an oversized value. In this case, the view building will fail on the bad row, and never finish. 4. We have exactly the same bugs not just with indexes but also with materialized views. Interestingly, Cassandra has similar bugs in materialized views as well (but not in the secondary index case, where Cassandra does behave as expected). Refs #8627. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Nadav Har'El	2c244c6e09	cql: fix secondary index "target" when column name has special characters Unfortunately, we encode the "target" of a secondary index in one of three ways: 1. It can be just a column name 2. It can be a string like keys(colname) - for the new type of collection indexes introduced in this series. 3. It can be a JSON map ({ ... }). This form is used for local indexes. The code parsing this target - target_parser::parse() - needs not to confuse these different formats. Before this patch, if the column name contains special characters like braces or parentheses (this is allowed in CQL syntax, via quoting), we can confuse case 1, 2, and 3: A column named "keys(colname)" will be confused for case 2, and a column named "{123}" will be confused with case 3. This problem can break indexing of some specially-crafted column names - as reproduced by test_secondary_index.py::test_index_quoted_names. The solution adopted in this patch is that the column name in case 1 should be escaped somehow so it cannot be possibly confused with either case 2 or 3. The way we chose is to convert the column name to CQL (with column_definition::as_cql_name()). In other words, if the column name contains non-alphanumeric characters, it is wrapped in quotes and also quotes are doubled, as in CQL. The result of this can't be confused with case 2 or 3, neither of which may begin with a quote. This escaping is not the minimal we could have done, but incidentally it is exactly what Cassandra does as well, so I used it as well. This change is mostly backward compatible: Already-existing indexes will still have unescaped column names stored for their "target" string, and the unescaping code will see they are not wrapped in quotes, and not change them. Backward compatibility will only fail on existing indexes on columns whose names begin and end in the quote characters - but this case is extremely unlikely. 
This patch illustrates how un-ideal our index "target" encoding is, but isn't what made it un-ideal. We should not have used three different formats for the index target - the third representation (JSON) should have sufficed. However, the two other representations are identical to Cassandra's, so using them when we can has its compatibility advantages. The patch makes test_secondary_index.py::test_index_quoted_names pass. Fixes #10707. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Nadav Har'El	56204a3794	cql, index: improve error messages Before this patch, trying to create an index on entries(x) where x is not a map results in an error message: Cannot create index on index_keys_and_values of column x The string "index_keys_and_values" is strange - Cassandra prints the easier to understand string "entries()" - which better corresponds to what the user actually did. It turns out that this string "index_keys_and_values" comes from an elaborate set of variables and functions spanning multiple source files, used to convert our internal target_type variable into such a string. But although this code was called "index_option" and sounded very important, it was actually used just for one thing - error messages! So in this patch we drop the entire "index_option" abstraction, replacing it by a static trivial function defined exactly where it's used (create_index_statement.cc), which prints a target type. While at it, we print "entries()" instead of "index_keys_and_values" ;-) After this patch, the test_secondary_index.py::test_index_collection_wrong_type finally passes (the previous patch fixed the default table names it assumes, and this patch fixes the expected error messages), so its "xfail" tag is removed. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Nadav Har'El	84461f1827	cql, index: fix default index name for collection index When creating an index "CREATE INDEX ON tbl(keys(m))", the default name of the index should be tbl_m_idx - with just "m". The current code incorrectly used the default name tbl_m_keys_idx, so this patch adds a test (which passes on Cassandra, and after this patch also on Scylla) and fixes the default name. It turns out that the default index name was based on a mysterious index_target::as_string(), which printed the target "keys(m)" as "m_keys" without explaining why it was so. This method was actually used only in three places, and all of them wanted just the column name, without the "_keys" suffix! So in this patch we rename the mysterious as_string() to column_name(), and use this function instead. Now that the default index name uses column_name() and gets just column_name(), the correct default index name is generated, and the test passes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Nadav Har'El	94ba03a4d6	test/cql-pytest: un-xfail several collecting indexing tests After the previous patches implemented collection indexing, several tests in test/cql-pytest/test_secondary_index.py that were marked with "xfail" started to pass - so here we remove the xfail. Only three collection indexing tests continue to xfail: test_secondary_index.py::test_index_collection_wrong_type test_secondary_index.py::test_index_quoted_names (#10707) test_secondary_index.py::test_local_secondary_index_on_collection (#10713) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Michał Radwański	2690ecd65d	test/cql-pytest/test_secondary_index: verify that local index on collection fails. Collection indexing is being tracked by #2962. Global secondary index over collection is enabled by #10123. Leave this test to track this behaviour. Related issue: #10713	2022-08-14 10:29:52 +03:00
Michał Radwański	1d852a9c7f	docs/design-notes/secondary_index: add `VALUES` to index target list A new secondary index target is being supported, which is `VALUES(v)`.	2022-08-14 10:29:52 +03:00
Michał Radwański	25f4c905f5	test/cql-pytest/test_secondary_index: add randomized test for indexes on collections	2022-08-14 10:29:52 +03:00
Michał Radwański	2a8289c101	cql-pytest/cassandra_tests/.../secondary_index_test: fix error message in test ported from Cassandra	2022-08-14 10:29:52 +03:00
Michał Radwański	fb476702a7	cql-pytest/cassandra_tests/.../secondary_index_on_map_entries,select_test: test ported from Cassandra is expected to fail, since Scylla assumes that comparison with null doesn't throw error, just evaluates to false. Since it's not a bug, but expected behavior from the perspective of Scylla, we don't mark it as xfail.	2022-08-14 10:29:52 +03:00
Michał Radwański	f572051ee9	test/boost/secondary_index_test: update for non-frozen indexes on collections	2022-08-14 10:29:52 +03:00
Karol Baryła	9e377b2824	test/cql-pytest: Uncomment collection indexes tests that should be working now	2022-08-14 10:29:52 +03:00
Nadav Har'El	67990d2170	cql, index: don't use IS NOT NULL on collection column When the secondary-index code builds a materialized view on column x, it adds "x IS NOT NULL" to the where-clause of the view, as required. However, when we index a collection column, we index individual pieces of the collection (keys, values), not the entire collection, so checking if the entire collection is null does not make sense. Moreover, for a collection column x, "x IS NOT NULL" currently doesn't work and throws errors when evaluating that expression when data is written to the table. The solution used in this patch is to simply avoid adding the "x IS NOT NULL" when creating the materialized view for a collection index. Everything works just fine without it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Michał Radwański	bd44bc3e35	cql3/statements/select_statement: for index on values of collection, don't emit duplicate rows The index on collection values is special in a way, as its clustering key contains not only the base primary key, but also a column that holds the keys of the cells in the collection, which allows distinguishing cells with different keys but the same value. This has an unwanted consequence, that it's possible to receive two identical base table primary keys from indexed_table_select_statement::find_index_clustering_rows. Thankfully, the duplicate primary keys are guaranteed to occur consecutively.	2022-08-14 10:29:52 +03:00
Michał Radwański	10e241988e	cql/expr/expression, index/secondary_index_manager: needs_filtering and index_supports_expression rewrite to accomodate for indexes over collections	2022-08-14 10:29:52 +03:00
Karol Baryła	ac97086855	cql3, index: Use entries() indexes on collections for queries Previous commit added the ability to use GSI over non-frozen collections in queries, but only the keys() and values() indexes. This commit adds support for the missing index type - entries() index. Signed-off-by: Karol Baryła <karol.baryla@scylladb.com> Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:52 +03:00
Karol Baryła	7966841d37	cql3, index: Use keys() and values() indexes on collections for queries. Previous commits added the possibility of creating GSI on non-frozen collections. This (and next) commit allow those indexes to actually be used by queries. This commit enables both keys() and values() indexes, as they are pretty similar.	2022-08-14 10:29:52 +03:00
Karol Baryła	aa47f4a15c	types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented std::begin in concept for build_value_fragmented's parameter allows creating it from an array	2022-08-14 10:29:52 +03:00
Michał Radwański	e6521ff8ba	cql3/statements/index_target: throw exception to signalize that we didn't miss returning from function GCC doesn't consider switches over enums to be exhaustive. Replace the bogus return value after a switch where each of the cases returns, with an exception.	2022-08-14 10:29:52 +03:00
Michał Radwański	32289d681f	db/view/view.cc: compute view_updates for views over collections For collection indexes, the logic of computing values for each of the columns needed to change, since a single particular column might produce more than one value as a result. The liveness info from individual cells of the collection impacts the liveness info of resulting rows. Therefore the control flow needs to be rewritten - instead of functions getting a row from get_view_row and later computing row markers and applying it, they compute these values by themselves. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:49 +03:00
Michał Radwański	112086767c	view info: has_computed_column_depending_on_base_non_primary_key In case of secondary indexes, if an index does not contain any column from the base which makes up for the primary key, then it is assumed that during update, a change to some cells from the base table cannot cause that we're dealing with a different row in the view. This however doesn't take into account the possibility of computed columns which in fact do depend on some non-primary-key columns. Introduce additional property of an index, has_computed_column_depending_on_base_non_primary_key.	2022-08-14 10:29:14 +03:00
Michał Radwański	4cfd264e5d	column_computation: depends_on_non_primary_key_column depends_on_non_primary_key_column for a column computation is needed to detect a case where the primary key of a materialized view depends on a non primary key column from the base table, but at the same time, the view itself doesn't have non-primary key columns. This is an issue, since as for now, it was assumed that no non-primary key columns in view schema meant that the update cannot change the primary key of the view, and therefore the update path can be simplified.	2022-08-14 10:29:14 +03:00
Michał Radwański	f1a9def2e1	schema, index/secondary_index_manager: make schema for index-induced mv Indexes over collections use materialized views. Supposing that we're dealing with global indexes, and that pk, ck were the partition and clustering keys of the base table, the schema of the materialized view, apart from having idx_token (which is used to preserve the order on the entries in the view), has a computed column coll_value (the name is not guaranteed to be exactly) and potentially also coll_keys_for_values_index, if the index was over collection values. This is needed, since values in a specific collection need not be unique. To summarize, the primary key is as follows: coll_value, idx_token, pk, ck, coll_keys_for_values_index? where coll_value is the computed value from the collection, be it a key from the collection, a value from the collection, or the tuple containing both.	2022-08-14 10:29:14 +03:00
Michał Radwański	60d50f6016	index/secondary_index_manager: extract keys, values, entries types from collection These functions are relevant for indexes over collections (creating schema for a materialized view related to the index). Signed-off-by: Michał Radwański <michal.radwanski@scylladb.com> Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-08-14 10:29:14 +03:00
Michał Radwański	cbe33f8d7a	cql3/statements/: validate CREATE INDEX for index over a collection Allow CQL like this: CREATE INDEX idx ON table(some_map); CREATE INDEX idx ON table(KEYS(some_map)); CREATE INDEX idx ON table(VALUES(some_map)); CREATE INDEX idx ON table(ENTRIES(some_map)); CREATE INDEX idx ON table(some_set); CREATE INDEX idx ON table(VALUES(some_set)); CREATE INDEX idx ON table(some_list); CREATE INDEX idx ON table(VALUES(some_list)); This is needed to support creating indexes on collections.	2022-08-14 10:29:13 +03:00
Michał Radwański	997682ed72	cql3/statements/create_index_statement,index_target: rewrite index target for collection The syntax used for creating indexes on collections that is present in Cassandra is unintuitive from the internal representation point of view. For instance, index on VALUES(some_set) indexes the set elements, which in the internal representation are keys of collection. Rewrite the index target after receiving it, so that the index targets are consistent with the representation.	2022-08-14 10:29:13 +03:00
Michał Radwański	ebc4ad4713	column_computation.hh, schema.cc: collection_column_computation This type of column computation will be used for creating updates to materialized views that are indexes over collections. This type features an additional function, compute_values_with_action, which depending on an (optional) old row and new row (the update to the base table) returns multiple bytes_with_action, a vector of pairs (computed value, some action), where the action signifies whether a row with a specific key needs to be deleted or created.	2022-08-14 10:29:13 +03:00
Michał Radwański	2babee2cdc	column_computation.hh, schema.cc: compute_value interface refactor The compute_value function of column_computation previously had the following signature: virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override; This is superfluous, since never in the history of Scylla, the last parameter (row) was used in any implementation, and never did it happen that it returned bytes_opt. The absurdity of this interface can be seen especially when looking at call sites like the following, where a dummy empty row was created: ``` token_column.get_computation().compute_value( *_schema, pkv_linearized, clustering_row(clustering_key_prefix::make_empty())); ```	2022-08-14 10:29:13 +03:00
Michał Radwański	166afd46b5	Cql.g, treewide: support cql syntax `INDEX ON table(VALUES(collection))` Brings support of cql syntax `INDEX ON table(VALUES(collection))`, even though there is still no support for indexes over collections. Previously, index_target::target_type::values was referring to values of a regular (non-collection) column. Rename it to `regular_values`. Fixes #8745.	2022-08-14 10:29:13 +03:00
Piotr Sarna	fe617ed198	Merge 'db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column' from Piotr Dulikowski Previously, the `system.local`'s `rpc_address` column kept local node's `rpc_address` from the scylla.yaml configuration. Although it sounds like it makes sense, there are a few reasons to change it to the value of scylla.yaml's `broadcast_rpc_address`: - The `broadcast_rpc_address` is the address that the drivers are supposed to connect to. `rpc_address` is the address that the node binds to - it can be set for example to 0.0.0.0 so that Scylla listens on all addresses, however this gives no useful information to the driver. - The `system.peers` table also has the `rpc_address` column and it already keeps other nodes' `broadcast_rpc_address`es. - Cassandra is going to do the same change in the upcoming version 4.1. Fixes: #11201 Closes #11204 * github.com:scylladb/scylladb: db/system_keyspace: fix indentation after previous patch db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column	2022-08-12 16:24:28 +02:00
Anna Stuchlik	41362829b5	doc: fix the upgrade guides for Ubuntu and Debian by removing image-related information	2022-08-12 14:39:10 +02:00
Anna Stuchlik	b45ba69a6c	doc: update the guides for Ubuntu and Debian to remove image information and the OS version number	2022-08-12 14:05:49 +02:00
Anna Stuchlik	24acffc2ce	doc: add the upgrade guide for ScyllaDB image from 2021.1 to 2022.1	2022-08-12 13:47:03 +02:00
Piotr Sarna	1ab4c6aab3	Merge 'cql3: enable collections as UDA accumulators' from Wojciech Mitros Currently, the initial values of UDA accumulators are converted to strings using the to_string() method and from strings using the from_string() method. The from_string() method is not implemented for collections, and it can't be implemented without changing the string format, because in that format, we cannot differentiate whether a separator is a part of a value or is an actual separator between values. In particular, the separators are not escaped in the collection values. Instead of from_string()/to_string() the cql parser is used for creating a value from a string (the same , and to_parsable_string() is used for converting a value into a string. A test using a list as an accumulator is added to cql-pytest/test_uda.py. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #11250 * github.com:scylladb/scylladb: cql3: enable collections as UDA accumulators cql3: extend implementation of to_bytes for raw_value	2022-08-12 12:51:17 +02:00
Botond Dénes	ceb1cdcb7a	Merge 'doc: fix the typo on the Fault Tolerance page' from Anna Stuchlik Fix https://github.com/scylladb/scylla-doc-issues/issues/438 In addition, I've replaced "Scylla" with "ScyllaDB" on that page. Closes #11281 * github.com:scylladb/scylladb: doc: replace Scylla with ScyllaDB on the Fault Tolerance page doc: fix the typo in the note	2022-08-12 06:58:39 +03:00
Nadav Har'El	c27f431580	test/alternator: fix a flaky test for full-table scan page size This patch fixes the test test_scan.py::test_scan_paging_missing_limit which failed in a Jenkins run once (that we know of). That test verifies that an Alternator Scan operation without an explicit "Limit" is nevertheless paged: DynamoDB (and also Scylla) wanted this page size to be 1 MB, but it turns out (see #10327) that because of the details of how Scylla's scan works, the page size can be larger than 1 MB. How much larger? I ran this test hundreds of times and never saw it exceed a 3 MB page - so the test asserted the page must be smaller than 4 MB. But now in one run - we got to this 4 MB and failed the test. So in this patch we increase the table to be scanned from 4 MB to 6 MB, and assert the page size isn't the full 6 MB. The chance that this size will eventually fail as well should be (famous last words...) very small for two reasons: First because 6 MB is even higher than the maximum I saw in practice, and second because empirically I noticed that adding more data to the table reduces the variance of the page size, so it should become closer to 1 MB and reduce the chance of it reaching 6 MB. Refs #10327 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11280	2022-08-12 06:57:45 +03:00
Botond Dénes	2a39d6518d	Merge 'doc: clarify the disclaimer about reusing deleted counter column values' from Anna Stuchlik Fix https://github.com/scylladb/scylla-doc-issues/issues/857 Closes #11253 * github.com:scylladb/scylladb: doc: language improvements to the Counters page doc: fix the external link doc: clarify the disclaimer about reusing deleted counter column values	2022-08-12 06:56:28 +03:00
Botond Dénes	10371441c9	Merge 'docs: add a disclaimer about not supporting local counters by SSTableLoader' from Anna Stuchlik Fix https://github.com/scylladb/scylla-doc-issues/issues/867 Plus some language, formatting, and organization improvements. Closes #11248 * github.com:scylladb/scylladb: doc: language, formatting, and organization improvements doc: add a disclaimer about not supporting local counters by SSTableLoader	2022-08-12 06:55:00 +03:00
Benny Halevy	d295d8e280	everywhere: define locator::host_id as a strong tagged_uuid type So it can be distinguished from other uuid-based identifiers in the system. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11276	2022-08-12 06:01:44 +03:00
Botond Dénes	69aea59d97	Merge 'storage_proxy: use consistent topology, prepare for fencing' from Avi Kivity Replication is a mix of several inputs: tokens and token->node mappings (topology), the replication strategy, replication strategy parameters. These are all captured in effective_replication_map. However, if we use effective_replication_map:s captured at different times in a single query, then different uses may see different inputs to effective_replication_map. This series protects against that by capturing an effective_replication_map just once in a query, and then using it. Furthermore, the captured effective_replication_map is held until the query completes, so topology code can know when a topology is no longer in use (although this isn't exploited in this series). Only the simple read and write paths are covered. Counters and paxos are left for later. I don't think the series fixes any bugs - as far as I could tell everything was happening in the same continuation. But this series ensures it. Closes #11259 * github.com:scylladb/scylladb: storage_proxy: use consistent topology storage_proxy: use consistent replication map on read path storage_proxy: use consistent replication map on write path storage_proxy: convert get_live{,_sorted}_endpoints() to accept an effective_replication_map consistency_level: accept effective_replication_map as parameter, rather than keyspace consistency_level: be more const when using replication_strategy	2022-08-12 06:00:30 +03:00
Alejo Sanchez	10baac1c84	test.py: test topology and schema changes Add support for topology changes: add/stop/remove/restart/replace node. Test simple schema changes when changing topology. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Alejo Sanchez	7f32fc0cc7	test.py: ClusterManager API mark cluster dirty Allow tests to manually mark current cluster dirty. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Alejo Sanchez	a585a82ad1	test.py: call before/after_test for each test case Preparing for topology tests with changing clusters, run before and after checks per test case. Change scope of pytest fixtures to function as we need them per test case. Add server and client API logic. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Alejo Sanchez	eedc866433	test.py: handle driver connection in ManagerClient Preparing for cluster cycling, handle driver connection in ManagerClient. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Alejo Sanchez	fe561a7dbd	test.py: ClusterManager API and ManagerClient Add an API via Unix socket to Manager so pytests can query information about the cluster. Requests are managed by ManagerClient helper class. The socket is placed inside a unique temporary directory for the Manager (as safe temporary socket filename is not possible in Python). Initial API services are manager up, cluster up, if cluster is dirty, cql port, configured replicas (RF), and list of host ids. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Alejo Sanchez	aad015d4e2	test.py: improve topology docstring Improve docstring of TopologyTestSuite to reflect its differences with other test suites. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-11 23:39:13 +02:00
Avi Kivity	a2c4f5aa1a	storage_proxy: use consistent topology Derive the topology from captured and stable effective_replication_map instead of getting a fresh topology from storage_proxy, since the fresh topology may be inconsistent with the running query. digest_read_resolver did not capture an effective_replication_map, so that is added.	2022-08-11 17:58:42 +03:00
Avi Kivity	883518697b	storage_proxy: use consistent replication map on read path Capture a replication map just once in abstract_read_executor::_effective_replication_map_ptr. Although it isn't used yet, it serves to keep a reference count on topology (for fencing), and some accesses to topology within reads still remain, which can be converted to use the member in a later patch.	2022-08-11 17:58:42 +03:00
Avi Kivity	01a614fb4d	storage_proxy: use consistent replication map on write path Capture a replication map just once in abstract_write_handler::_effective_replication_map_ptr and use it in all write handlers. A few accesses to get the topology still remain, they will be fixed up in a later patch.	2022-08-11 17:58:42 +03:00
Avi Kivity	f1b0e3d58e	storage_proxy: convert get_live{,_sorted}_endpoints() to accept an effective_replication_map Allow callers to use consistent effective_replication_map:s across calls by letting the caller select the object to use.	2022-08-11 17:58:42 +03:00
Avi Kivity	46bd0b1e62	consistency_level: accept effective_replication_map as parameter, rather than keyspace A keyspace is a mutable object that can change from time to time. An effective_replication_map captures the state of a keyspace at a point in time and can therefore be consistent (with care from the caller). Change consistency_level's functions to accept an effective_replication_map. This allows the caller to ensure that separate calls use the same information and are consistent with each other. Current callers are likely correct since they are called from one continuation, but it's better to be sure.	2022-08-11 17:58:42 +03:00
Avi Kivity	1078d1bfda	consistency_level: be more const when using replication_strategy We don't modify the replication_strategy here, so use const. This will help when the object we get is const itself, as it will be in the next patches.	2022-08-11 17:58:42 +03:00
Wojciech Mitros	48bd752971	cql3: enable collections as UDA accumulators Currently, the initial values of UDA accumulators are converted to strings using the to_string() method and from strings using the from_string() method. The from_string() method is not implemented for collections, and it can't be implemented without changing the string format, because in that format, we cannot differentiate whether a separator is a part of a value or is an actual separator between values. In particular, the separators are not escaped in the collection values. For example, a list with string elements: 'a, b', 'c' would be represented as a string 'a, b, c', while now it is represented as "['a, b', 'c']". Some types that were parsable are now represented in a different way. For example, a tuple ('a', null, 0) was represented as "a:\@:0", and now it is "('a', null, 0)". Instead of from_string()/to_string() the cql parser is used for creating a value from a string (the same , and to_parsable_string() is used for converting a value into a string. A test using a list as an accumulator is added to cql-pytest/test_uda.py. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2022-08-11 16:23:57 +02:00
Anna Stuchlik	f5a49688ae	doc: replace Scylla with ScyllaDB on the Fault Tolerance page	2022-08-11 16:14:33 +02:00
Anna Stuchlik	7218a977df	doc: fix the typo in the note	2022-08-11 16:09:49 +02:00
Botond Dénes	d407d3b480	Merge 'Calculate effective_replication_map: prevent stalls with everywhere_replication_strategy' from Benny Halevy For replication strategies like "everywhere" and "local" that return the same set of endpoints for all tokens, we can call rs->calculate_natural_endpoints only once and reuse the result for all tokens. Note that ideally the replication_map could contain only a single token range for this case, but that doesn't seem to work yet. Add `maybe_yield()` calls to the tight loop to prevent reactor stalls on large clusters when copying a long vector returned by everywhere_replication_strategy to potentially 1000's of tokens in the map. Nicholas Peshek wrote in https://github.com/scylladb/scylladb/issues/10337#issuecomment-1211152370 about similar patch by Geoffrey Beausire: `994c6ecf3c` > Yep. That dropped our startup from 3000+ seconds to about 40. Fixes #10337 Closes #11277 * github.com:scylladb/scylladb: abstract_replication_strategy: calculate_effective_replication_map: optimize for static replication strategies abstract_replication_strategy: add has_uniform_natural_endpoints	2022-08-11 15:19:47 +03:00
Gleb Natapov	e5157b27ad	raft: getting abort_requested_exception exception from a sm::apply is not a critical error During shutdown it is normal to get abort_requested_exception exception from a state machine "apply" method. Do not rethrow it as state_machine_error, just abort an applier loop with an info message.	2022-08-11 15:11:21 +03:00
Gleb Natapov	9977851eb1	schema_registry: fix abandoned feature warning maybe_sync ignores failed feature in case waiting is aborted. Fix it.	2022-08-11 15:11:21 +03:00
Gleb Natapov	eed8e19813	service: raft: silence rpc::closed_errors in raft_rpc Before the patch if an RPC connection was established already then the close error was reported by the RPC layer and then duplicated by raft_rpc layer. If a connection cannot be established because the remote node is already dead RPC does not report the error since we decided that in that case gossiper and failure detector messages can be used to detect the dead node case and there is no reason to pollute the logs with recurring errors. This aligns raft behaviour with what we already have in storage_proxy that does not report closed errors as well.	2022-08-11 15:11:21 +03:00
Anna Stuchlik	1603129275	doc: remove the redundant space from index	2022-08-11 12:36:16 +02:00
Anna Stuchlik	ee258cb0af	doc: update the syntax for defining service level attributes	2022-08-11 12:32:38 +02:00
Petr Gusev	4bc6611829	raft read_barrier, retry over intermittent rpc failures If the leader was unavailable during read_barrier, closed_error occurs, which was not handled in any way and eventually reached the client. This patch adds retries in this case. Fix: scylladb#11262 Refs: #11278 Closes #11263	2022-08-11 13:31:19 +03:00
Amnon Heiman	5ac20ac861	Reduce the number of per-scheduling group metrics This patch reduces the number of metrics ScyllaDB generates. Motivation: The combination of per-shard with per-scheduling group generates a lot of metrics. When combined with histograms, which require many metrics, the problem becomes even bigger. The two tools we are going to use: 1. Replace per-shard histograms with summaries 2. Do not report unused metrics. The storage_proxy stats holds information for the API and the metrics layer. We replaced timed_rate_moving_average_and_histogram and time_estimated_histogram with the unified timed_rate_moving_average_summary_and_histogram which gives us an option to report per-shard summaries instead of histograms. All the counters, histograms, and summaries were marked as skip_when_empty. The API was modified to use timed_rate_moving_average_summary_and_histogram. Closes #11173	2022-08-11 13:31:19 +03:00
Benny Halevy	9167b857e9	abstract_replication_strategy: calculate_effective_replication_map: optimize for static replication strategies For replication strategies like "everywhere" and "local" that return the same set of endpoints for all tokens, we can call rs->calculate_natural_endpoints only once and reuse the result for all tokens. Note that ideally the replication_map could contain only a single token range for this case, but that doesn't seem to work yet. Add maybe_yield() calls to the tight loop to prevent reactor stalls on large clusters when copying a long vector returned by everywhere_replication_strategy to potentially 1000's of tokens in the map. Nicholas Peshek wrote in https://github.com/scylladb/scylladb/issues/10337#issuecomment-1211152370 about similar patch by Geoffrey Beausire: `994c6ecf3c` > Yep. That dropped our startup from 3000+ seconds to about 40. Fixes #10337 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-11 10:35:29 +03:00
Benny Halevy	eb678e723b	abstract_replication_strategy: add has_uniform_natural_endpoints So that using calculate_natural_endpoints can be optimized for strategies that return the same endpoints for all tokens, namely everywhere_replication_strategy and local_strategy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-11 10:34:14 +03:00
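A minimal Python sketch of the optimization described in the two commits above: when a strategy reports that every token maps to the same endpoints, the endpoint calculation runs once and the result is copied to each token. The function name `build_replication_map`, the callable parameter and the boolean flag are illustrative stand-ins, not ScyllaDB's C++ API, and the `maybe_yield()` calls are left out because the sketch is synchronous.

```python
from typing import Callable, Dict, List


def build_replication_map(
    tokens: List[int],
    calculate_natural_endpoints: Callable[[int], List[str]],
    has_uniform_natural_endpoints: bool,
) -> Dict[int, List[str]]:
    """Map every token to its endpoints (hypothetical helper for illustration)."""
    replication_map: Dict[int, List[str]] = {}

    if has_uniform_natural_endpoints and tokens:
        # Same endpoints for all tokens: compute once, then copy per token.
        endpoints = calculate_natural_endpoints(tokens[0])
        for token in tokens:
            replication_map[token] = list(endpoints)
        return replication_map

    # General case: one (potentially expensive) calculation per token.
    for token in tokens:
        replication_map[token] = calculate_natural_endpoints(token)
    return replication_map


if __name__ == "__main__":
    calls = []

    def fake_calculation(token: int) -> List[str]:
        calls.append(token)
        return ["node1", "node2", "node3"]

    result = build_replication_map([10, 20, 30], fake_calculation, True)
    print(len(calls), result)
```

In the uniform case the number of calculations no longer grows with the number of tokens; only the per-token copy remains, which is the loop the commit protects with `maybe_yield()`.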
Calle Wilund	a729c2438e	commitlog: Make get_segments_to_replay on-demand Refs #11237 Don't store segments found on init scan in all shard instances, instead retrieve (based on low time-pos for current gen) when required. This changes very little, but we at last don't store pointless string lists in shards 1 to X, and also we can potentially ask for the list twice. More to the point, goes better hand-in-hand with the semantics of "delete_segments", where any file sent in is considered candidate for recycling, and included in footprint.	2022-08-11 06:41:23 +00:00
Nadav Har'El	d03bd82222	Revert "test: move scylla_inject_error from alternator/ to cql-pytest/" This reverts commit `8e892426e2` and fixes the code in a different way: That commit moved the scylla_inject_error function from test/alternator/util.py to test/cql-pytest/util.py and renamed test/alternator/util.py. I found the rename confusing and unnecessary. Moreover, the moved function isn't even usable today by the test suite that includes it, cql-pytest, because it lacks the "rest_api" fixture :-) so test/cql-pytest/util.py wasn't the right place for it anyway. test/rest_api/rest_util.py could have been a good place for this function, but there is another complication: Although the Alternator and rest_api tests both had a "rest_api" fixture, it has a different type, which led to the code in rest_api which used the moved function to have to jump through hoops to call it instead of just passing "rest_api". I think the best solution is to revert the above commit, and duplicate the short scylla_inject_error() function. The duplication isn't an exact copy - the test/rest_api/rest_util.py version now accepts the "rest_api" fixture instead of the URL that the Alternator version used. In the future we can remove some of this duplication by having some shared "library" code but we should do it carefully and starting with agreeing on the basic fixtures like "rest_api" and "cql", without that it's not useful to share small functions that operate on them. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11275	2022-08-11 06:43:26 +03:00
Wojciech Mitros	42e0fb90ea	cql3: extend implementation of to_bytes for raw_value When called with a null_value or an unset_value, raw_value::to_bytes() threw an std::get error for wrong variant. This patch adds a description for the errors thrown, and adds a to_bytes_opt() method that instead of throwing returns a std::nullopt.	2022-08-10 16:40:22 +02:00
Avi Kivity	e9cbc9ee85	Merge 'Add support for empty replica pages' from Botond Dénes Many tombstones in a partition is a problem that has been plaguing queries since the inception of Scylla (and even before that as they are a pain in Apache Cassandra too). Tombstones don't count towards the query's page limit, neither the size nor the row number one. Hence, large spans of tombstones (be that row- or range-tombstones) are problematic: the query can time out while processing this span of tombstones, as it waits for more live rows to fill the page. In the extreme case a partition becomes entirely unreadable, all read attempts timing out, until compaction manages to purge the tombstones. The solution proposed in this PR is to pass down a tombstone limit to replicas: when this limit is reached, the replica cuts the page and marks it as short one, even if the page is empty currently. To make this work, we use the last-position infrastructure added recently by `3131cbea62`, so that replicas can provide the position of the last processed item to continue the next page from. Without this no forward progress could be made in the case of an empty page: the query would continue from the same position on the next page, having to process the same span of tombstones. The limit can be configured with the newly added `query_tombstone_limit` configuration item, defaulted to 10000. The coordinator will pass this to the newly added `tombstone_limit` field of `read_command`, if the `replica_empty_pages` cluster feature is set. Upgrade sanity test was conducted as following: * Created cluster of 3 nodes with RF=3 with master version * Wrote small dataset of 1000 rows. * Deleted prefix of 980 rows. * Started read workload: `scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100` * Also did some manual queries via `cqlsh` with smaller page size and tracing on. 
* Stopped and upgraded each node one-by-one. New nodes were started by `--query-tombstone-page-limit=10`. * Confirmed there are no errors or read-repairs. Perf regression test: ``` build/release/test/perf/perf_simple_query_g -c1 -m2G --concurrency=1000 --task-quota-ms 10 --duration=60 ``` Before: ``` median 133665.96 tps ( 62.0 allocs/op, 12.0 tasks/op, 43007 insns/op, 0 errors) median absolute deviation: 973.40 maximum: 135511.63 minimum: 104978.74 ``` After: ``` median 129984.90 tps ( 62.0 allocs/op, 12.0 tasks/op, 43181 insns/op, 0 errors) median absolute deviation: 2979.13 maximum: 134538.13 minimum: 114688.07 ``` Diff: +~200 instruction/op. Fixes: https://github.com/scylladb/scylla/issues/7689 Fixes: https://github.com/scylladb/scylla/issues/3914 Fixes: https://github.com/scylladb/scylla/issues/7933 Refs: https://github.com/scylladb/scylla/issues/3672 Closes #11053 * github.com:scylladb/scylladb: test/cql-pytest: add test for query tombstone page limit query-result-writer: stop when tombstone-limit is reached service/pager: prepare for empty pages service/storage_proxy: set smallest continue pos as query's continue pos service/storage_proxy: propagate last position on digest reads query: result_merger::get() don't reset last-pos on short-reads and last pages query: add tombstone-limit to read-command service/storage_proxy: add get_tombstone_limit() query: add tombstone_limit type db/config: add config item for query tombstone limit gms: add cluster feature for empty replica pages tree: don't use query::read_command's IDL constructor	2022-08-10 13:38:06 +03:00
Avi Kivity	32b405d639	Merge 'doc: change the default for the overprovisioned option' from Anna Stuchlik Fix https://github.com/scylladb/scylla-doc-issues/issues/842 This PR changes the default for the `overprovisioned` option from `disabled` to `enabled`, according to https://github.com/scylladb/scylla-doc-issues/issues/842. In addition, I've used this opportunity to replace "Scylla" with "ScyllaDB" on the updated page. Closes #11256 * github.com:scylladb/scylladb: doc: replace Scylla with ScyllaDB in the product name doc: change the default for the overprovisioned option	2022-08-10 12:43:44 +03:00
Raphael S. Carvalho	ace6334619	replica: table: kill unused _sstables_staging Good change as it's one less thing to worry about in compaction group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-08-10 12:32:13 +03:00
Kamil Braun	cff595211e	Merge 'Raft test topology part 2' from Alecco Give cluster control to pytests. While there add missing stop gracefully and add server to ScyllaCluster. Clusters can be marked dirty but they are not recycled yet. This will be done in a later series. Closes #11219 * github.com:scylladb/scylladb: test.py: ScyllaCluster add_server() mark dirty test.py: ScyllaCluster add server management test.py: improve seeds for new servers test.py: Topology tests and Manager for Scylla clusters test.py: rename scylla_server to scylla_cluster test.py: function for python driver connection test.py: ScyllaCluster add_server helper test.py: shutdown control connection during graceful shutdown test.py: configurable authenticator and authorizer test.py: ScyllaServer stop gracefully test.py: FIXME for bad cluster log handling logic	2022-08-10 11:13:21 +02:00
Michał Chojnowski	de0f2c21ec	configure.py: make messaging_service.cc the first source file Currently messaging_service.o takes the longest of all core objects to compile. For a full build of build/release/scylla, with current ninja scheduling, on a 32-hyperthread machine, the last ~16% of the total build time is spent just waiting on messaging_service.o to finish compiling. Moving the file to the top of the list makes ninja start its compilation early and gets rid of that single-threaded tail, improving the total build time. Closes #11255	2022-08-10 11:18:09 +03:00
Calle Wilund	8116c56807	commitlog: Revert/modify `fac2bc4` - do footprint add in delete Fixes #11184 Fixes #11237 In prev (broken) fix for #11184 we added the footprint for left-over files (replay candidates) to disk footprint on commitlog init. This effectively prevents us from creating segments iff we have tight limits. Since we nowadays do quite a bit of inserts _before_ commitlog replay (system.local, but...) we can end up in a situation where we deadlock start because we cannot get to the actual replay that will eventually free things. Another, not thought through, consequence is that we add a single footprint to _all_ commitlog shard instances - even though only shard 0 will get to actually replay + delete (i.e. drop footprint). So shards 1-X would all be either locked out or performance degraded. Simplest fix is to add the footprint in delete call instead. This will lock out segment creation until delete call is done, but this is fast. Also ensures that only replay shard is involved.	2022-08-10 08:04:03 +00:00
Botond Dénes	e27127bb7f	test/cql-pytest: add test for query tombstone page limit Check that the replica returns empty pages as expected, when a large tombstone prefix/span is present. Large = larger than the configured query_tombstone_limit (using a tiny value of 10 in the test to avoid having to write many tombstones).	2022-08-10 09:14:59 +03:00
Tomasz Grabiec	8ee5b69f80	test: row_cache: Use more narrow key range to stress overlapping reads more This makes catching issues related to concurrent access of same or adjacent entries more likely. For example, catches #11239. Closes #11260	2022-08-10 06:53:54 +03:00
Botond Dénes	7730419f5c	query-result-writer: stop when tombstone-limit is reached The query result writer now counts tombstones and cuts the page (marking it as a short one) when the tombstone limit is reached. This is to avoid timing out on large span of tombstones, especially prefixes. In the case of unpaged queries, we fail the read instead, similarly to how we do with max result size. If the limit is 0, the previous behaviour is used: tombstones are not taken into consideration at all.	2022-08-10 06:03:38 +03:00
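The behaviour described in the commit above can be sketched in Python as follows. This is only an illustration under stated assumptions: `Page`, `write_page`, `TombstoneLimitExceeded` and the `(position, kind, payload)` item layout are hypothetical names invented for the sketch, not the actual query result writer.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


class TombstoneLimitExceeded(Exception):
    """Raised for unpaged reads that hit the tombstone limit (hypothetical)."""


@dataclass
class Page:
    rows: List[str] = field(default_factory=list)
    short_read: bool = False
    last_position: Optional[int] = None


def write_page(
    items: Iterable[Tuple[int, str, Optional[str]]],
    row_limit: int,
    tombstone_limit: int,
    paged: bool = True,
) -> Page:
    page = Page()
    tombstones = 0
    for position, kind, payload in items:
        page.last_position = position
        if kind == "tombstone":
            tombstones += 1
            # A limit of 0 keeps the old behaviour: tombstones are ignored.
            if tombstone_limit and tombstones >= tombstone_limit:
                if not paged:
                    raise TombstoneLimitExceeded(position)
                # Cut the page here, even if it holds no live rows yet.
                page.short_read = True
                return page
        else:
            page.rows.append(payload)
            if len(page.rows) >= row_limit:
                return page
    return page


if __name__ == "__main__":
    data = [(0, "tombstone", None), (1, "tombstone", None),
            (2, "tombstone", None), (3, "row", "a"), (4, "row", "b")]
    print(write_page(data, row_limit=10, tombstone_limit=3))
```

The page cut on the tombstone limit is empty but still carries `last_position`, which is what lets the next page make forward progress instead of rescanning the same tombstones.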
Botond Dénes	8066dbc635	service/pager: prepare for empty pages The pager currently assumes that an empty page means the query is exhausted. Lift this assumption, as we will soon have empty short pages. Also, paging with filtering also needs to use the replica-provided last-position when the page is empty.	2022-08-10 06:03:38 +03:00
Botond Dénes	6a7dedfe34	service/storage_proxy: set smallest continue pos as query's continue pos We expect each replica to stop at exactly the same position when the digests match. Soon however, if replicas have a lot of tombstones, some may stop earlier than the others. As long as all digests match, this is fine but we need to make sure we continue from the smallest such position on the next page.	2022-08-10 06:03:38 +03:00
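A small Python sketch of the rule stated in the commit above: when replicas stop at different positions, the next page resumes from the smallest one so that no replica's unread range is skipped. Modelling a position as a `(partition token, clustering key)` tuple and the name `smallest_continue_position` are assumptions made for the illustration, not the storage_proxy code.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical position model: (partition token, clustering key).
Position = Tuple[int, int]


def smallest_continue_position(last_positions: Iterable[Position]) -> Optional[Position]:
    """Return the earliest position reported by any replica, or None if there are none."""
    return min(last_positions, default=None)


if __name__ == "__main__":
    # Three replicas with matching digests that stopped at different points.
    print(smallest_continue_position([(5, 9), (5, 3), (5, 7)]))
```

Continuing from a later position would be unsafe for the replica that stopped early, since the rows between its stop point and the later position would never be read from it.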
Botond Dénes	2656968db2	service/storage_proxy: propagate last position on digest reads We want to transmit the last position as determined by the replica on both result and digest reads. Result reads already do that via the query::result, but digest reads don't yet as they don't return the full query::result structure, just the digest field from it. Add the last position to the digest read's return value and collect these in the digest resolver, along with the returned digests.	2022-08-10 06:03:37 +03:00
Botond Dénes	8c0dd99f7c	query: result_merger::get() don't reset last-pos on short-reads and last pages When merging multiple query-results, we use the last-position of the last result in the combined one as the combined result's last position. This only works however if said last result was included fully. Otherwise we have to discard the last-position included with the result and the pager will use the position of the last row in the combined result as the last position. The commit introducing the above logic mistakenly discarded the last position when the result is a short read or a page is not full. This is not necessary and even harmful as it can result in an empty combined result being delivered to the pager, without a last-position.	2022-08-10 06:01:49 +03:00
Botond Dénes	d1d53f1b84	query: add tombstone-limit to read-command Propagate the tombstone-limit from coordinator to replicas, to make sure all replicas use the same limit.	2022-08-10 06:01:47 +03:00
Anna Stuchlik	43cc17bf5d	doc: replace Scylla with ScyllaDB in the product name	2022-08-09 16:19:55 +02:00
Anna Stuchlik	d21b92fb13	doc: change the default for the overprovisioned option	2022-08-09 16:09:29 +02:00
Anna Stuchlik	c3dbb9706e	doc: language improvements to the Counters page	2022-08-09 14:35:44 +02:00
Alejo Sanchez	05afca2199	test.py: ScyllaCluster add_server() mark dirty When changing topology, tests will add servers. Make add_server mark the cluster dirty. But mark the cluster as not dirty after calling add_server when installing the cluster. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-09 14:26:13 +02:00
Alejo Sanchez	f1a6e4bda9	test.py: ScyllaCluster add server management Preparing for topology changes, implement the primitives for managing ScyllaServers in ScyllaCluster. The states are started, stopped, and removed. Started servers can be stopped or restarted. Stopped servers can be started. Stopped servers can be removed (destroyed). Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-09 14:26:13 +02:00
Alejo Sanchez	a6448458bb	test.py: improve seeds for new servers Instead of only using last started server as seed, use all started servers as seed for new servers. This also avoids tracking last server's state. Pass empty list instead of None. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-09 14:26:13 +02:00
Alejo Sanchez	83dab6045b	test.py: Topology tests and Manager for Scylla clusters Preparing to cycle clusters modified (dirty) and use multiple clusters per topology pytest, introduce Topology tests and Manager class to handle clusters. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-09 14:26:13 +02:00
Alejo Sanchez	14328d1e42	test.py: rename scylla_server to scylla_cluster This file's most important class is ScyllaCluster, so rename it accordingly. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-09 14:26:13 +02:00
Alejo Sanchez	dcd8d77f34	test.py: function for python driver connection Isolate python driver connection on its own function. Preparing for harness client fixture to handle the connection. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-09 14:26:13 +02:00
Alejo Sanchez	1db31ebfdc	test.py: ScyllaCluster add_server helper For future use from API. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-09 14:26:13 +02:00
Konstantin Osipov	c81c8af1ba	test.py: shutdown control connection during graceful shutdown	2022-08-09 14:26:13 +02:00
Alejo Sanchez	bc494754e8	test.py: configurable authenticator and authorizer For scylla servers, keep default PasswordAuthenticator and CassandraAuthorizer but allow this to be configurable per test suite. Use AllowAll* for topology test suite. Disabling authentication avoids complications later for topology tests as system_auth keyspace starts with RF=1 and tests take down nodes. The keyspace would need to change RF and run repair. Using AllowAll avoids this problem altogether. A different cql fixture is created without auth for topology tests. Topology tests require servers without auth from scylla.yaml conf. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-09 14:26:13 +02:00
Alejo Sanchez	6437a7f467	test.py: ScyllaServer stop gracefully Add stop_gracefully() method. Terminates a server in a clean way. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-09 14:26:13 +02:00
Alejo Sanchez	573ed429ad	test.py: FIXME for bad cluster log handling logic The code in test.py using a ScyllaCluster is getting a server id and taking logs from only the first server. If there is a failure in another server it's not reported properly. And CQL connection will go only to the first server. Also, it might be better to have ScyllaCluster to handle these matters and be more opaque. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-09 14:26:13 +02:00
Anna Stuchlik	15c24ba3e0	doc: fix the external link	2022-08-09 14:20:54 +02:00
Anna Stuchlik	82d1f67378	doc: clarify the disclaimer about reusing deleted counter column values	2022-08-09 14:12:37 +02:00
Avi Kivity	be44fd63f9	Merge 'Make get_range_addresses async and hold effective_replication_map_ptr around it' from Benny Halevy This series converts the synchronous `effective_replication_map::get_range_addresses` to async by calling the replication strategy async entry point with the same name, as its callers are already async or can be made so easily. To allow it to yield and work on a coherent view of the token_metadata / topology / replication_map, let the callers of this patch hold a effective_replication_map per keyspace and pass it down to the (now asynchronous) functions that use it (making affected storage_service methods static where possible if they no longer depend on the storage_service instance). Also, the repeated calls to everywhere_replication_strategy::calculate_natural_endpoints are optimized in this series by introducing a virtual abstract_replication_strategy::has_static_natural_endpoints predicate that is true for local_strategy and everywhere_replication_strategy, and is false otherwise. With it, functions repeatedly calling calculate_natural_endpoints in a loop, for every token, will call it only once since it will return the same result every time anyhow. 
Refs #11005 Doesn't fix the issue as the large allocation still remains until we make change dht::token_range_vector chunked (chunked_vector cannot be used as is at the moment since we require the ability to push also to the front when unwrapping) Closes #11009 * github.com:scylladb/scylladb: effective_replication_map: make get_range_addresses asynchronous range_streamer: add_ranges and friends: get erm as param storage_service: get_new_source_ranges: get erm as param storage_service: get_changed_ranges_for_leaving: get erm as param storage_service: get_ranges_for_endpoint: get erm as param repair: use get_non_local_strategy_keyspaces_erms database: add get_non_local_strategy_keyspaces_erms database: add get_non_local_strategy_keyspaces storage_service: coroutinize update_pending_ranges effective_replication_map: add get_replication_strategy effective_replication_map: get_range_addresses: use the precalculated replication_map abstract_replication_strategy: get_pending_address_ranges: prevent extra vector copies abstract_replication_strategy: reindent utils: sequenced_set: expose set and `contains` method abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set utils: sequenced_set: templatize VectorType utils: sanitize sequenced_set utils: sequenced_set: delete mutable get_vector method	2022-08-09 13:25:53 +03:00
Benny Halevy	f01a526887	docs: debugging: mention use of release number on backtrace.scylladb.com Following scylladb/scylla_s3_reloc_server@af17e4ffcd (scylladb/scylla_s3_reloc_server#28), the release number can be used to search the relocatable package and/or decode a respective backtrace. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11247	2022-08-09 12:49:59 +03:00
Avi Kivity	d4c986e4fa	Merge 'doc: add the upgrade guide from 5.0 to 2022.1 on Ubuntu 20.04' from Anna Stuchlik Ubuntu 22.04 is supported by both ScyllaDB Open Source 5.0 and Enterprise 2022.1. Closes #11227 * github.com:scylladb/scylladb: doc: add the redirects from Ubuntu version specific to version generic pages doc: remove version-speific content for Ubuntu and add the generic page to the toctree doc: rename the file to include Ubuntu doc: remove the version number from the document and add the link to Supported Versions doc: add a generic page for Ubuntu doc: add the upgrade guide from 5.0 to 2022.1 on Ubuntu 2022.1	2022-08-09 12:49:16 +03:00
Asias He	12ab2c3d8d	storage_service: Prevent removed node to restart and join the cluster 1) Start node1,2,3 2) Stop node3 3) Run nodetool removenode $host_id_of_node3 4) Restart node3 Step 4 is wrong and not allowed. If it happens it will bring back node3 to the cluster. This patch adds a check during node restart to detect such operation error and reject the restart. With this patch, we would see the following in step 4. ``` init - Startup failed: std::runtime_error (The node 127.0.0.3 with host_id fa7e500a-8617-4de4-8efd-a0e177218ee8 is removed from the cluster. Can not restart the removed node to join the cluster again!) ``` Refs #11217 Closes #11244	2022-08-09 12:46:21 +03:00
Avi Kivity	1d4bf115e2	Merge 'row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy' from Tomasz Grabiec Scenario: cache = [ row(pos=2, continuous=false), row(pos=after(2), dummy=true) ] Scanning read starts, starts populating [-inf, before(2)] from sstables. row(pos=2) is evicted. cache = [ row(pos=after(2), dummy=true) ] Scanning read finishes reading from sstables. Refreshes cache cursor via partition_snapshot_row_cursor::maybe_refresh(), which calls partition_snapshot_row_cursor::advance_to() because iterators are invalidated. This advances the cursor to after(2). no_clustering_row_between(2, after(2)) returns true, so advance_to() returns true, and maybe_refresh() returns true. This is interpreted by the cache reader as "the cursor has not moved forward", so it marks the range as complete, without emitting the row with pos=2. Also, it marks row(pos=after(2)) as continuous, so later reads will also miss the row. The bug is in advance_to(), which is using no_clustering_row_between(a, b) to determine its result, which by definition excludes the starting key. Discovered by row_cache_test.cc::test_concurrent_reads_and_eviction with reduced key range in the random_mutation_generator (1024 -> 16). Fixes #11239 Closes #11240 * github.com:scylladb/scylladb: test: mvcc: Fix illegal use of maybe_refresh() tests: row_cache_test: Add test_eviction_of_upper_bound_of_population_range() tests: row_cache_test: Introduce one_shot mode to throttle row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy	2022-08-09 12:39:10 +03:00
Anna Stuchlik	e753b4e793	doc: language, formatting, and organization improvements	2022-08-09 10:34:22 +02:00
Tomasz Grabiec	f59d2d9bf8	range_tombstone_list: Avoid amortized_reserve() We can use std::in_place_type<> to avoid constructing op before calling emplace_back(). As a result, we can avoid reserving space. The reserving was there to avoid the need to roll-back in case emplace_back() throws. Kudos to Kamil for suggesting this. Closes #11238	2022-08-09 11:34:16 +03:00
Avi Kivity	8d37370a71	Revert "Merge "memtable-sstable: Add compacting reader when flushing memtable." from Mikołaj" This reverts commit `bcadd8229b`, reversing changes made to `cf528d7df9`. Since `4bd4aa2e88` ("Merge 'memtable, cache: Eagerly compact data with tombstones' from Tomasz Grabiec"), memtable is self-compacting and the extra compaction step only reduces throughput. The unit test in memtable_test.cc is not reverted as proof that the revert does not cause a regression. Closes #11243	2022-08-09 11:23:29 +03:00
Anna Stuchlik	61d33cb2a8	doc: add a disclaimer about not supporting local counters by SSTableLoader	2022-08-09 10:00:14 +02:00
Anna Stuchlik	4be88e1a79	doc: add the redirects from Ubuntu version specific to version generic pages	2022-08-09 09:43:28 +02:00
Raphael S. Carvalho	337390d374	forward_service: execute_on_this_shard: avoid reallocation and copy avoid about log2(256)=8 reallocations when pushing partition ranges to be fetched. additionally, also avoid copying range into ranges container. current_range will not contain the last range, after moved, but will still be engaged by the end of the loop, allowing next iteration to happen as expected. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11242	2022-08-09 09:08:53 +02:00
Botond Dénes	1b669cefed	service/storage_proxy: add get_tombstone_limit() To be used by coordinator side code to determine the correct tombstone limit to pass to read-command (tombstone limit field added in the next commit). When this limit is non-zero, the replica will start cutting pages after the tombstone limit is surpassed. This getter works similarly to `get_max_result_size()`: if the cluster feature for empty replica pages is set, it will return the value configured via db::config::query_tombstone_limit. System queries always use a limit of 0 (unlimited tombstones).	2022-08-09 10:00:40 +03:00
Botond Dénes	8cd2ef7a42	query: add tombstone_limit type Will be used in read_command. Add it before it is added to read-command so we can use the unlimited constant in code added in preparation to that.	2022-08-09 10:00:40 +03:00
Botond Dénes	33f0447ba0	db/config: add config item for query tombstone limit This will be the value used to break pages, after processing the specified amount of tombstones. The page will be cut even if empty. We could maybe use the already existing tombstone_{warn,fail}_threshold instead and use them as a soft/hard limit pair, like we did with page sizes.	2022-08-09 10:00:40 +03:00
Botond Dénes	1bc14b5e3b	gms: add cluster feature for empty replica pages So we can start using them only when the entire cluster supports it.	2022-08-09 10:00:40 +03:00
Botond Dénes	60a0e3d88b	tree: don't use query::read_command's IDL constructor It is not type safe: has multiple limits passed to it as raw ints, as well as other types that ints implicitly convert to. Furthermore the row limit is passed in two separate fields (lower 32 bits and upper 32 bits). All this make this constructor a minefield for humans to use. We have a safer constructor for some time but some users of the old one remain. Move them to the safe one.	2022-08-09 10:00:37 +03:00
Tomasz Grabiec	05b0a62132	test: mvcc: Fix illegal use of maybe_refresh() maybe_refresh() can only be called if the cursor is pointing at a row. The code was calling it before the cursor was advanced, and was thus relying on implementation detail.	2022-08-09 02:28:56 +02:00
Tomasz Grabiec	ce624048d9	tests: row_cache_test: Add test_eviction_of_upper_bound_of_population_range() Reproducer for #11239.	2022-08-09 02:28:56 +02:00
Tomasz Grabiec	6aaa6f8744	tests: row_cache_test: Introduce one_shot mode to throttle	2022-08-09 02:28:56 +02:00
Tomasz Grabiec	a6a61eaf96	row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy Scenario: cache = [ row(pos=2, continuous=false), row(pos=after(2), dummy=true) ] Scanning read starts, starts populating [-inf, before(2)] from sstables. row(pos=2) is evicted. cache = [ row(pos=after(2), dummy=true) ] Scanning read finishes reading from sstables. Refreshes cache cursor via partition_snapshot_row_cursor::maybe_refresh(), which calls partition_snapshot_row_cursor::advance_to() because iterators are invalidated. This advances the cursor to after(2). no_clustering_row_between(2, after(2)) returns true, so advance_to() returns true, and maybe_refresh() returns true. This is interpreted by the cache reader as "the cursor has not moved forward", so it marks the range as complete, without emitting the row with pos=2. Also, it marks row(pos=after(2)) as continuous, so later reads will also miss the row. The bug is in advance_to(), which is using no_clustering_row_between(a, b) to determine its result, which by definition excludes the starting key. Discovered by row_cache_test.cc::test_concurrent_reads_and_eviction with reduced key range in the random_mutation_generator (1024 -> 16). Fixes #11239	2022-08-09 02:28:56 +02:00
Takuya ASADA	3ffc978166	main: move preinit_description to main() We don't need to wait for handling version options after scylla_main() called, we can handle it in main() instead. Closes #11221	2022-08-08 18:31:43 +03:00
Benny Halevy	91ab8ee1c3	effective_replication_map: make get_range_addresses asynchronous So it may yield, preventing reactor stalls as seen in #11005. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	9b2af3f542	range_streamer: add_ranges and friends: get erm as param Rather than getting it in the callee, let the caller (e.g. storage_service) hold the erm and pass it down to potentially multiple async functions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	194b9af8d6	storage_service: get_new_source_ranges: get erm as param Rather than getting it in the callee, let the caller hold the erm and pass it down to potentially multiple async functions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	b50c79eab3	storage_service: get_changed_ranges_for_leaving: get erm as param Rather than getting it in the callee, let the caller hold the erm and pass it down to potentially multiple async functions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	a5d7ade237	storage_service: get_ranges_for_endpoint: get erm as param Let its caller pass the effective_replication_map ptr so we can get it at the top level and keep it alive (and coherent) through multiple asynchronous calls. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	cffe00cc58	repair: use get_non_local_strategy_keyspaces_erms Use get_non_local_strategy_keyspaces_erms for getting a coherent set of keyspace names and their respective effective replication strategy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	db5c5ca59e	database: add get_non_local_strategy_keyspaces_erms To be used for getting a coherent set of all keyspaces with non-local replication strategy and their respective effective_replication_map. As an example, use it in this patch in storage_service::update_pending_ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	7ee6048255	database: add get_non_local_strategy_keyspaces For node operations, we currently call get_non_system_keyspaces but really want to work on all keyspace that have non-local replication strategy as they are replicated on other nodes. Reflect that in the replica::database function name. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	d8484b3ee6	storage_service: coroutinize update_pending_ranges Before we make a change in getting the keyspaces and their effective_replication_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:01 +03:00
Benny Halevy	e541009f65	effective_replication_map: add get_replication_strategy And use it in storage_service::get_changed_ranges_for_leaving. A following patch will pass the e_r_m to storage_service::get_changed_ranges_for_leaving, rather than getting it there. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:00 +03:00
Benny Halevy	6794e15163	effective_replication_map: get_range_addresses: use the precalculated replication_map There is no need to call get_natural_endpoints for every token in sorted_tokens order, since we can just get the precalculated per-token endpoints already in the _replication_map member. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:00 +03:00
Benny Halevy	1d4aea4441	abstract_replication_strategy: get_pending_address_ranges: prevent extra vector copies Reduce large allocations and reactor stalls seen in #11005 by open coding `get_address_ranges` and using std::vector::insert to efficiently append the ranges returned by `get_primary_ranges_for` onto the returned token_range_vector, in contrast to building an unordered_multimap<inet_address, dht::token_range> first in `get_address_ranges`, then traversing it and adding one token_range at a time. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:00 +03:00
Benny Halevy	7811b0d0aa	abstract_replication_strategy: reindent	2022-08-08 17:31:00 +03:00
Benny Halevy	ebe1edc091	utils: sequenced_set: expose set and `contains` method And use that in sites using the endpoint set returned by abstract_replication_strategy::calculate_natural_endpoints. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:00 +03:00
Benny Halevy	7017ad6822	abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set So it could be used also for easily searching for an endpoint. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:00 +03:00
Benny Halevy	38934413d4	utils: sequenced_set: templatize VectorType So we can use basic_sequenced_set<T, std::small_vector<T, N>> as locator::endpoint_set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:00 +03:00
Benny Halevy	df380c30b9	utils: sanitize sequenced_set And templatize its Vector type so it can be used with a small_vector for inet_address_vector_replica_set. Mark the methods const/noexcept as needed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:00 +03:00
Benny Halevy	57d9275d4a	utils: sequenced_set: delete mutable get_vector method It is dangerous to use since modifying the sequenced_set vector will make it go out of sync with the associated unordered_set member, making the object unusable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 17:31:00 +03:00
Yaron Kaikov	2fe2306efb	configure.py: add date-stamp parameter When starting the `Build` job, the `x86` and `arm` builds can start on different dates, causing the whole process to fail. As suggested by @avikivity, add a date-stamp parameter and pass it through downstream jobs to get one release for each job. Ref: scylladb/scylla-pkg#3008 Closes #11234	2022-08-08 17:28:38 +03:00
Anna Stuchlik	7d4770c116	doc: remove version-specific content for Ubuntu and add the generic page to the toctree	2022-08-08 16:18:20 +02:00
Anna Stuchlik	eb60e5757a	doc: rename the file to include Ubuntu	2022-08-08 16:12:02 +02:00
Anna Stuchlik	011e2fad60	doc: remove the version number from the document and add the link to Supported Versions	2022-08-08 16:11:14 +02:00
Anna Stuchlik	83c08ac5fa	doc: add a generic page for Ubuntu	2022-08-08 16:04:59 +02:00
Avi Kivity	871127f641	Update tools/java submodule * tools/java ad6764b506...6995a83cc1 (1): > dist/debian: drop upgrading from scylla-tools < 2.0	2022-08-08 16:51:14 +03:00
Anna Stuchlik	260f85643d	doc: specify the recommended AWS instance types	2022-08-08 14:35:54 +02:00
Anna Stuchlik	2c69a8f458	doc: replace the tables with a generic description of support for Im4gn and Is4gen instances	2022-08-08 14:17:59 +02:00
Botond Dénes	49c00fa989	Merge 'Define strong uuid-class types for table_id, table_schema_version and query_id' from Benny Halevy We would like to define more distinct types that are currently defined as aliases to utils::UUID to identify resources in the system, like table id and schema version id. As with counter_id, the motivation is to restrict the usage of the distinct types so they can be used (assigned, compared, etc.) only with objects of the same type. Using with a generic UUID will then require explicit conversion, that we want to expose. This series starts with cleaning up the idl header definition by adding support for `import` and `include` statements in the idl-compiler. These allow the idl header to become self-sufficient and then remove manually-added includes from source files. The latter usually need only the top level idl header and it, in turn, should include other headers if it depends on them. Then, a UUID_class template was defined as a shared boiler plate for the various uuid-class. First, we convert counter_id to use it, rather than mimicking utils::UUID on its own. On top of utils::UUID_class<T>, we define table_id, table_schema_version, and query_id. Following up on this series, we should define more commonly used types like: host_id, streaming_plan_id, paxos_ballot_id. 
Fixes #11207 Closes #11220 * github.com:scylladb/scylladb: query-request, everywhere: define and use query_id as a strong type schema, everywhere: define and use table_schema_version as a strong type schema, everywhere: define and use table_id as a strong type schema: include schema_fwd.hh in schema.hh system_keyspace: get_truncation_record: delete unused lambda capture utils: uuid: define appending_hash<utils::tagged_uuid<Tag>> utils: tagged_uuid: rename to_uuid() to uuid() counters: counter_id: use base class create_random_id counters: base counter_id on utils::tagged_uuid utils: tagged_uuid: mark functions noexcept utils: tagged_uuid: bool: reuse uuid::bool operator raft: migrate tagged_id definition to utils::tagged_uuid utils: uuid: mark functions noexcept counters: counter_id delete requirement for triviality utils: bit_cast: require TriviallyCopyable To repair: delete unused include of utils/bit_cast.hh bit_cast: use std::bit_cast idl: make idl headers self-sufficient db: hints: sync_point: do not include idl definition file db/per_partition_rate_limit: tidy up headers self-sufficiency idl-compiler: include serialization impl and visitors in generated dist.impl.hh files idl-compiler: add include statements idl_test: add a struct depending on UUID	2022-08-08 13:20:40 +03:00
Benny Halevy	c71ef330b2	query-request, everywhere: define and use query_id as a strong type Define query_id as a tagged_uuid So it can be differentiated from other uuid-class types. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:13:28 +03:00
Benny Halevy	2b017ce285	schema, everywhere: define and use table_schema_version as a strong type Define table_schema_version as a distinct tagged_uuid class, So it can be differentiated from other uuid-class types, in particular table_id. Added reversed(table_schema_version) for convenience and uniformity since the same logic is currently open coded in several places. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:09:45 +03:00
Benny Halevy	257d74bb34	schema, everywhere: define and use table_id as a strong type Define table_id as a distinct utils::tagged_uuid modeled after raft tagged_id, so it can be differentiated from other uuid-class types, in particular from table_schema_version. Fixes #11207 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:09:41 +03:00
Benny Halevy	26aacb328e	schema: include schema_fwd.hh in schema.hh And remove repeated definitions and forward declarations of the same types in both places. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:28 +03:00
Benny Halevy	6e77ad9392	system_keyspace: get_truncation_record: delete unused lambda capture Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:28 +03:00
Benny Halevy	a390b8475b	utils: uuid: define appending_hash<utils::tagged_uuid<Tag>> And simplify usage for appending_hash<counter_shard_view> respectively. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:28 +03:00
Benny Halevy	8235cfdf7a	utils: tagged_uuid: rename to_uuid() to uuid() To make it more generic, similar to other uuid() get methods we have. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	813cffc2b5	counters: counter_id: use base class create_random_id Rather than defining generate_random, and use respectively in unit tests. (It was inherited from raft::internal::tagged_id.) This allows us to shorten counter_id's definition to just using utils::tagged_uuid<struct counter_id_tag>. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	e9cc24bc18	counters: base counter_id on utils::tagged_uuid Use the common base class for uuid-based types. tagged_uuid::to_uuid defined here for backward compatibility, but it will be renamed in the next patch to uuid(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	082d5efca8	utils: tagged_uuid: mark functions noexcept Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	1b78f8ba82	utils: tagged_uuid: bool: reuse uuid::bool operator utils::UUID defined operator bool the same way, rely on it rather than reimplementing it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	6436c614d7	raft: migrate tagged_id definition to utils::tagged_uuid So it can be used for other types in the system outside of raft, like counter_id, table_id, table_schema_version, and more. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	f0567ab853	utils: uuid: mark functions noexcept Before we define a tagged_uuid template. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	ea91ccaa20	counters: counter_id delete requirement for triviality This stemmed from utils/bit_cast overly strict requirement. Now that it was relaxed, there is no need for this static assert as counter_id is trivially copyable, and that is checked by bit_cast {read,write}_unaligned. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	c68e929b80	utils: bit_cast: require TriviallyCopyable To Like std::bit_cast (https://en.cppreference.com/w/cpp/numeric/bit_cast) we only require To to be trivially copyable. There's no need for it to be a trivial type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	2948a4feb6	repair: delete unused include of utils/bit_cast.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	79000bc02e	bit_cast: use std::bit_cast Now that scylla requires c++20 there's no need to define our own implementation in utils/bit_cast.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	1fda686f96	idl: make idl headers self-sufficient Add include statements to satisfy dependencies. Delete, now unneeded, include directives from the upper level source files. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	cfc7e9aa59	db: hints: sync_point: do not include idl definition file idl definition files are not intended for direct inclusion in .cc files. The data types they represent are supposed to be defined in a regular C++ header, so define them in db/hints/sync_point.hh and include it rather than idl/hinted_handoff.idl.hh. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	82fa205723	db/per_partition_rate_limit: tidy up headers self-sufficiency include what's needed where needed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	83811b8e35	idl-compiler: include serialization impl and visitors in generated dist.impl.hh files They are generally required by the serialization implementation. This will simplify using them without having to hand pick what header to include in the .cc file that includes them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	da4f0aae37	idl-compiler: add include statements For generating #include directives in the generated files, so we don't have to hand-craft the inclusion of dependencies in the right order. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Benny Halevy	4f275a17b4	idl_test: add a struct depending on UUID For testing the next change which adds import and include statements to the idl language. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:02:27 +03:00
Avi Kivity	ba42852b0e	Merge 'Overhaul truncate and snapshot' from Benny Halevy This series is aimed at fixing #11132. To get there, the series untangles the functions that currently depend on the cross-shard coordination in table::snapshot, namely database::truncate and consequently database::drop_column_family. database::get_table_on_all_shards is added here as a helper to get a foreign shared ptr of the table shard from all shards, and it is later used by multiple functions to truncate and then take a snapshot of the sharded table. database::truncate_table_on_all_shards is defined to orchestrate the truncate process end-to-end, flushing or clearing all table shards before taking a snapshot if needed, using the newly defined table::snapshot_on_all_shards, and by that leaving only the discard_sstables job to the per-shard database::truncate function. The latter, snapshot_on_all_shards, orchestrates the snapshot process on all shards - getting rid of the per-shard table::snapshot function (after refactoring take_snapshot and finalize_snapshot out of it), and the associated dreaded data structures: snapshot_manager and pending_snapshots. Fixes #11132. 
Closes #11133 * github.com:scylladb/scylladb: table: reindent write_schema_as_cql table: coroutinize write_schema_as_cql table: seal_snapshot: maybe_yield when iterating over the table names table: reindent seal_snapshot table: coroutinize seal_snapshot table: delete unused snapshot_manager and pending_snapshots table: delete unused snapshot function table: snapshot_on_all_shards: orchestrate snapshot process table: snapshot: move pending_snapshots.erase from seal_snapshot table: finalize_snapshot: take the file sets as a param table: make seal_snapshot a static member table: finalize_snapshot: reindent table: refactor finalize_snapshot out of snapshot table: snapshot: keep per-shard file sets in snapshot_manager table: take_snapshot: return foreign unique ptr table: take_snapshot: maybe yield in per-sstable loop table: take_snapshot: simplify tables construction code table: take_snapshot: reindent table: take_snapshot: simplify error handling table: refactor take_snapshot out of snapshot utils: get rid of joinpoint database: get rid of timestamp_func database: truncate: snapshot table in all-shards layer database: truncate: flush table and views in all-shards layer database: truncate: stop and disable compaction in all-shards layer database: truncate: move call to set_low_replay_position_mark to all-shards layer database: truncate: enter per-shard table async_gate in all-shards layer database: truncate: move check for schema_tables keyspace to all-shards layer. 
database: snapshot_table_on_all_shards: reindent table: add snapshot_on_all_shards database: add snapshot_table_on_all_shards database: rename {flush,snapshot}_on_all and make static database: drop_table_on_all_shards: truncate and stop table in upper layer database: drop_table_on_all_shards: get all table shards before drop_column_family on each database: drop_column_family: define table& cf database: drop_column_family: reuse uuid for evict_all_for_table database: drop_column_family: move log message up a layer database: truncate: get rid of the unused ks param database: add truncate_table_on_all_shards database: drop_table_on_all_shards: do not accept a truncated_at timestamp_func database: truncate: get optional snapshot_name from caller database: truncate: fix assert about replay_position low_mark database_test: apply_mutation on the correct db shard	2022-08-07 19:15:42 +03:00
Benny Halevy	45ce635527	table: reindent write_schema_as_cql Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	3b2cce068a	table: coroutinize write_schema_as_cql and make sure to always close the output stream. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	dbae7807d1	table: seal_snapshot: maybe_yield when iterating over the table names Add maybe_yield calls in tight loop, potentially over thousands of sstable names to prevent reactor stalls. Although the per-sstable cost is very small, we've experienced stalls related to printing in O(#sstables) in compaction. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	3ba0c72b77	table: reindent seal_snapshot Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	41a2d09a5d	table: coroutinize seal_snapshot Handle exceptions, making sure the output stream is properly closed in all cases, and an intermediate error, if any, is returned as the final future. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	5316dbbe78	table: delete unused snapshot_manager and pending_snapshots Now that snapshot orchestration in snapshot_on_all_shards doesn't use snapshot_manager, get rid of the data structure. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	cca9068cfb	table: delete unused snapshot function Now that snapshot orchestration is done solely in snapshot_on_all_shards, the per-shard snapshot function can be deleted. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	351a3a313d	table: snapshot_on_all_shards: orchestrate snapshot process Call take_snapshot on each shard and collect the returned snapshot_file_set. When all are done, move the vector<snapshot_file_set> to finalize_snapshot. All that without resorting to using the snapshot_manager nor calling table::snapshot. Both will be deleted in the following patches. Fixes #11132 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	84dfd2cabb	table: snapshot: move pending_snapshots.erase from seal_snapshot Now that seal_snapshot doesn't need to lookup the snapshot_manager in pending_snapshots to get to the file_sets, erasing the snapshot_manager object can be done in table::snapshot which also inserted it there. This will make it easier to get rid of it in a later patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	39276cacc3	table: finalize_snapshot: take the file sets as a param and pass it to seal_snapshot, so that the latter won't need to lookup and access the snapshot_manager object. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	4dd56bbd6d	table: make seal_snapshot a static member Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	7cb0a3f6f4	table: finalize_snapshot: reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	12716866a9	table: refactor finalize_snapshot out of snapshot Write schema.cql and the files manifest in finalize_snapshot. Currently call it from table::snapshot, but it will be called in a later patch by snapshot_on_all_shards. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	240f83546d	table: snapshot: keep per-shard file sets in snapshot_manager To simplify processing of the per-shard file names for generating the manifest. We only need to print them to the manifest at the end of the process, so there's no point in copying them around in the process, just move the foreign unique unordered_set. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	5100c1ba68	table: take_snapshot: return foreign unique ptr Currently, the sstable file names are created and destroyed on each shard and are copied by the "coordinator" shard using submit_to, while the coroutine holds the source on its stack frame. To prepare for the next patches that refactor this code so that the coordinator shard will submit_to each shard to perform `take_snapshot` and return the set of sstrings in the future result, we need to wrap the result in a foreign_ptr so it gets freed on the shard that created it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	b54626ad0e	table: take_snapshot: maybe yield in per-sstable loop There could be thousands of sstables so we better consider yielding in the tight loop that copies the sstable names into the unordered_set we return. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	24a1a4069e	table: take_snapshot: simplify tables construction code Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	75e38ebccc	table: take_snapshot: reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	67c1d00f44	table: take_snapshot: simplify error handling Don't catch exception but rather just return them in the return future, as the exception is handled by the caller. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	ff6508aa53	table: refactor take_snapshot out of snapshot Do the actual snapshot-taking code in a per-shard take_snapshot function, to be called from snapshot_on_all_shards in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	37b7a9cce2	utils: get rid of joinpoint Now that it is no longer used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	56f336d1aa	database: get rid of timestamp_func Pass an optional truncated_at time_point to truncate_table_on_all_shards instead of the over-complicated timestamp_func that returns the same time_point on all shards anyhow, and was only used for coordination across shards. Since now we synchronize the internal execution phase in truncate_table_on_all_shards, there is no longer a need for this timestamp_func. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	b640c4fd17	database: truncate: snapshot table in all-shards layer With that the database layer no longer needs to invoke the private table::snapshot function, so it can be defriended from class table. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	af0c71aa12	database: truncate: flush table and views in all-shards layer Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	6e07e6b7ac	database: truncate: stop and disable compaction in all-shards layer Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	e78dad1dfb	database: truncate: move call to set_low_replay_position_mark to all-shards layer Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	a8bd3d97b6	database: truncate: enter per-shard table async_gate in all-shards layer Start moving the per-shard state establishment logic to truncate_table_on_all_shards, so that we would eventually do only the truncate logic per-se in the per-shard truncate function. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	ff028316f2	database: truncate: move check for schema_tables keyspace to all-shards layer. Now that the per-shard truncate function is called only from truncate_table_on_all_shards, we can reject the schema_tables keyspace in the upper layer. There's no need to check that on each shard. While at it, reuse `is_system_keyspace`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	fbe1fa1370	database: snapshot_table_on_all_shards: reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	4d4ca40c38	table: add snapshot_on_all_shards Called from the respective database entry points. Will be called also from the database drop / truncate path and will be used for central coordination of per-shard table::snapshot so we don't have to depend on the snapshot_manager mechanism that is fragile and currently causes abort if we fail to allocate it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	be56a73e78	database: add snapshot_table_on_all_shards We need to snapshot a single table in several paths. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	d96b56fee2	database: rename {flush,snapshot}_on_all and make static Follow the convention of drop_table_on_all_shards. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	a1eed1a6e9	database: drop_table_on_all_shards: truncate and stop table in upper layer Truncate the table on all shards then stop it on shards in the upper layer rather than in the per-shard drop_column_family() function, so we can further refactor truncate later, flushing and taking snapshot on all shards, before truncating. With that, rename drop_column_family to detach_column_family as now it only deregisters the column family from containers that refer to it (even via its uuid) and then its caller is responsible to take it from there. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	92cb7d448b	database: drop_table_on_all_shards: get all table shards before drop_column_family on each So the upper layer can flush, snapshot, and truncate the table on all shards, step by step. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	0aaaefbb5c	database: drop_column_family: define table& cf To reduce the churn in the following patch that will pass the table& as a parameter. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	bb1e5ffb8c	database: drop_column_family: reuse uuid for evict_all_for_table cf->schema()->id() is the same one returned by find_uuid(ks_name, cf_name); As a follow up, we should define a concrete table_id type and rename schema::id() to schema::table_id() to return it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	e800e1e720	database: drop_column_family: move log message up a layer Print once on "coordinator" shard. And promote to info level as it's important to log when we're dropping a table (and if we're going to take a snapshot). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	ca78a63873	database: truncate: get rid of the unused ks param Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	46e2a7c83b	database: add truncate_table_on_all_shards As a first step to decouple truncate from flush and snapshot. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:53:05 +03:00
Benny Halevy	5e8c05f1a8	database: drop_table_on_all_shards: do not accept a truncated_at timestamp_func Since in the drop_table case we want to discard ALL sstables in the table, not only those with `max_data_age()` up until drop started. Fixes #11232 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:52:51 +03:00
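The cut-off behaviour described in the commit above can be illustrated with a minimal sketch. It is not ScyllaDB code: `sstable_info`, `discard_up_to`, and the use of `std::chrono::system_clock` in place of `db_clock` are assumptions made for illustration, and the inclusive comparison is a guess. It only shows why passing the maximum time point discards every sstable, while an earlier `truncated_at` keeps the newer ones.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Stand-in for db_clock (assumption; the real clock type lives in ScyllaDB).
using clock_type = std::chrono::system_clock;

// Hypothetical, simplified view of an sstable: only its newest data timestamp.
struct sstable_info {
    clock_type::time_point max_data_age;
};

// Removes every sstable whose data is not newer than truncated_at and
// returns how many were removed. The inclusive comparison is an assumption.
inline std::size_t discard_up_to(std::vector<sstable_info>& sstables,
                                 clock_type::time_point truncated_at) {
    const std::size_t old_size = sstables.size();
    sstables.erase(
        std::remove_if(sstables.begin(), sstables.end(),
                       [truncated_at](const sstable_info& s) {
                           return s.max_data_age <= truncated_at;
                       }),
        sstables.end());
    return old_size - sstables.size();
}
```

Calling `discard_up_to(v, clock_type::time_point::max())` removes all entries, which mirrors the drop-table case; any finite cut-off leaves newer sstables in place.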
Benny Halevy	574909c78f	database: truncate: get optional snapshot_name from caller Before we change drop_table_on_all_shards to always pass db_clock::time_point::max() in the next patch, let it pass a unique snapshot name, otherwise the snapshot name will always be based on the constant, max time_point. Refs #11232 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 12:03:19 +03:00
Benny Halevy	474b2fdf37	database: truncate: fix assert about replay_position low_mark This assert was tweaked several times: Introduced in `83323e155e`, then fixed in `b2b1a1f7e1` to account for no rp from discard_sstables, then in `9620755c7f` to account for cases we do not flush the table, then again in `71c5dc82df` to make that more accurate. But, the assert wasn't correct in the first place in the sense that we first get `low_mark` which represents the highest replay_position at the time truncate was called, but then we call discard_sstables with a time_point of `truncated_at` that we get from the caller via the timestamp_func, and that one could be in the past, before truncate was called - hence discard_sstables with that timestamp may very well return a replay_position from older sstables, prior to flush that can be smaller than the low_mark. Fix this assert to account for that case. The real fix to this issue is to have a truncate_tombstone that will carry an authoritative api::timestamp (#11230) Fixes #11231 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 09:18:06 +03:00
Benny Halevy	9f5e13800d	database_test: apply_mutation on the correct db shard Following up on `1c26d49fba`, apply mutations on the correct db shard in all test cases before we define and use database::truncate_table_on_all_shards. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-07 09:18:06 +03:00
Tomasz Grabiec	7f80602b01	db: range_tombstone_list: Avoid quadratic behavior when applying Range tombstones are kept in memory (cache/memtable) in range_tombstone_list. It keeps them deoverlapped, so applying a range tombstone which covers many range tombstones will erase existing range tombstones from the list. This operation needs to be exception-safe, so range_tombstone_list maintains an undo log. This undo log will receive a record for each range tombstone which is removed. For exception safety reasons, before pushing an undo log entry, we reserve space in the log by calling std::vector::reserve(size() + 1). This is O(N) where N is the number of undo log entries. Therefore, the whole application is O(N^2). This can cause reactor stalls and availability issues when replicas apply such deletions. This patch avoids the problem by reserving exponentially increasing amount of space. Also, to avoid large allocations, switches the container to chunked_vector. Fixes #11211 Closes #11215	2022-08-05 20:34:07 +03:00
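The complexity argument in the commit above can be sketched as follows. This is an illustration only, assuming a plain `std::vector` as the undo log (the patch itself switches to `chunked_vector`); `reserve_one_more` and `count_reallocations` are hypothetical names. Reserving `size() + 1` before every push may reallocate and copy the whole log each time, whereas doubling the capacity keeps the number of reallocations logarithmic in the number of entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: guarantee room for one more entry, growing the
// capacity geometrically instead of by exactly one element.
template <typename T>
void reserve_one_more(std::vector<T>& log) {
    if (log.size() == log.capacity()) {
        // Doubling keeps the total copying cost linear over N insertions.
        log.reserve(log.capacity() == 0 ? 1 : log.capacity() * 2);
    }
    assert(log.capacity() > log.size());
}

// Counts how many times the buffer capacity changes while appending n entries.
inline std::size_t count_reallocations(std::size_t n) {
    std::vector<int> log;
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t before = log.capacity();
        reserve_one_more(log);
        if (log.capacity() != before) {
            ++reallocations;
        }
        // Space was reserved above, so this push_back does not reallocate.
        log.push_back(static_cast<int>(i));
    }
    return reallocations;
}
```

With this scheme, appending 1000 entries triggers on the order of ten reallocations rather than up to one per entry.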
Kamil Braun	d84a93d683	Merge 'Raft test topology part 1' from Alecco These are the first commits out of #10815. It starts by moving pytest logic out of the common `test/conftest.py` and into `test/topology/conftest.py`, including removing the async support as it's not used anywhere else. There's a fix of a bug of leaving tables in `RandomTables.tables` after dropping all of them. Keyspace creation is moved out of `conftest.py` into `RandomTables` as it makes more sense and this way topology tests avoid all the workarounds for old version (topology needs ScyllaDB 5+ for Raft, anyway). And a minor fix. Closes #11210 * github.com:scylladb/scylladb: test.py: fix type hint for seed in ScyllaServer test.py: create/drop keyspace in tables helper test.py: RandomTables clear list when dropping all tables test.py: move topology conftest logic to its own test.py: async topology tests auto run with pytest_asyncio	2022-08-05 17:56:16 +02:00
Anna Stuchlik	d48ae5a9e0	doc: add the upgrade guide from 5.0 to 2022.1 on Ubuntu 2022.1	2022-08-05 17:49:01 +02:00
Warren Krewenki	4178ccd27f	gossiper: Correct typo in log message Closes #11212	2022-08-05 18:21:36 +03:00
Anna Stuchlik	ceaf0c41bd	doc: add support for AWS i4g instances	2022-08-05 17:18:44 +02:00
Anna Stuchlik	7711436577	doc: extend the list of supported CPUs	2022-08-05 16:55:40 +02:00
Alejo Sanchez	ec70e26f12	test.py: fix type hint for seed in ScyllaServer Param seed can be None (e.g. first server) so fix type hint accordingly. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-05 13:05:26 +02:00
Alejo Sanchez	1d7789e5a9	test.py: create/drop keyspace in tables helper Since all topology tests will use the helper, create the keyspace in the helper. Avoid the need of dropping all tables per test and just drop the keyspace. While there, use blocking CQL execution so it can be used in the constructor and avoids possible issues with scheduling on cleanup. Also, creation and drop should happen only once per cluster and no test should be running changes (either not started or finished). All topology tests are for Scylla with Raft. So don't use the Cassandra this_dc workaround as it's unnecessary for Scylla. Remove return type of random_tables fixture to match other fixtures everywhere else. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-05 13:05:26 +02:00
Alejo Sanchez	9a019628f5	test.py: RandomTables clear list when dropping all tables Clear the list of active tables when dropping them. While there do the list element exchange atomically across active and removed tables lists. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-05 13:05:26 +02:00
Alejo Sanchez	f6aa0d7bd7	test.py: move topology conftest logic to its own Move asyncio, Raft checks, and RandomTables to topology test suite's own conftest file. While there, use non-async version of pre-checks to avoid unnecessary complexity (we want async tests, not async setup, for now). Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-05 13:05:26 +02:00
Alejo Sanchez	f665779cdb	test.py: async topology tests auto run with pytest_asyncio Async tests and fixtures in the topology directory are expected to run with pytest_asyncio (not other async frameworks). Force this with auto mode. CI has an older pytest_asyncio version lacking pytest_asyncio.fixture. Auto mode helps avoid the need for it, and tests and fixtures can just be marked with regular @pytest.mark.asyncio. This way tests can run in both older and newer versions of the packages. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2022-08-05 13:05:26 +02:00
Botond Dénes	fbbe2529c1	Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov " There are several helpers in this .cc file that need to get datacenter for endpoints. For it they use global snitch, because there's no other place out there to get that data from. The whole dc/rack info is now moving to topology, so this set patches the consistency_level.cc to get the topology. This is done two ways. First, the helpers that have keyspace at hand may get the topology via ks's effective_replication_map. Two difficult cases are db::is_local() and db.count_local_endpoints() because both have just inet_address at hand. Those are patched to be methods of topology itself and all their callers already mess with token metadata and can get topology from it. " * 'br-consistency-level-over-topology' of https://github.com/xemul/scylla: consistency_level: Remove is_local() and count_local_endpoints() storage_proxy: Use topology::local_endpoints_count() storage_proxy: Use proxy's topology for DC checks storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver storage_proxy: Use topology local_dc_filter in its methods storage_proxy: Mark some digest_read_resolver methods private forwarding_service: Use topology local_dc_filter storage_service: Use topology local_dc_filter consistency_level: Use topology local_dc_filter consistency_level: Call count_local_endpoints from topology consistency_level: Get datacenter from topology replication_strategy: Remove hold snitch reference effective_replication_map: Get datacenter from topology topology: Add local-dc detection sugar	2022-08-05 13:31:55 +03:00
Anna Stuchlik	4bc7833a0b	doc: update the link to CQL3 type mapping on GitHub Closes #11224	2022-08-05 13:21:29 +03:00
Pavel Emelyanov	c3718b7a6e	consistency_level: Remove is_local() and count_local_endpoints() No code uses them now -- switched to use topology -- so these two can be dropped together with their calls for global snitch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:48 +03:00
Pavel Emelyanov	9c662ee0e5	storage_proxy: Use topology::local_endpoints_count() A continuation of the previous patches -- now all the code that needs this helper has a proxy pointer at hand Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:48 +03:00
Pavel Emelyanov	9a50d318b6	storage_proxy: Use proxy's topology for DC checks Several proxy helper classes need to filter endpoints by datacenter. Since they now have shared_ptr<proxy> on-board, they can get topology via proxy's token metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:48 +03:00
Pavel Emelyanov	183a2d5a83	storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver It will be needed to get token metadata from proxy. The resolver in question is created and maintained by abstract_read_executor which already has shared_ptr<proxy>, so it just gives its copy Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:48 +03:00
Pavel Emelyanov	e1ea801b67	storage_proxy: Use topology local_dc_filter in its methods The proxy has token metadata pointer, so it can use its topology reference to filter endpoints by datacenter Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:47 +03:00
Pavel Emelyanov	6f515f852d	storage_proxy: Mark some digest_read_resolver methods private Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:47 +03:00
Pavel Emelyanov	9a19414c62	forwarding_service: Use topology local_dc_filter The service needs to filter out non-local endpoints for its needs. The service carries token metadata pointer and can get topology from it to fulfill this goal Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:47 +03:00
Pavel Emelyanov	2423e1c642	storage_service: Use topology local_dc_filter The storage-service API calls use db::is_local() helper to filter out tokens from non-local datacenter. In all those places topology is available from the token metadata pointer Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:47 +03:00
Pavel Emelyanov	0da8caba1d	consistency_level: Use topology local_dc_filter The filter_for_query() helper has keyspace at hand Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:47 +03:00
Pavel Emelyanov	de58b33eee	consistency_level: Call count_local_endpoints from topology Similar to previous patch, in those places with keyspace object at hand the topology can be obtained from ks' replication map Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:47 +03:00
Pavel Emelyanov	f84ee8f0fb	consistency_level: Get datacenter from topology In some of db/consistency_level.cc helpers the topology can be obtained from keyspace's effective replication map Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:47 +03:00
Pavel Emelyanov	00f166809e	replication_strategy: Remove hold snitch reference When the strategy is constructed there's no place to get snitch from so the global instance is used. However, after previous patch the replication strategy no longer needs snitch, so this dependency can be dropped Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:43 +03:00
Pavel Emelyanov	298213f27f	effective_replication_map: Get datacenter from topology Now it gets it from snitch, but the dc/rack info is being relocated onto topology. The topology is in turn already there Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-08-05 12:19:31 +03:00
Calle Wilund	fac2bc41ba	commitlog: Include "segments_to_replay" in initial footprint Fixes #11184 Not including it here can cause our estimate of "delete or not" after replay to be skewed in favour of retaining segments as (new) recycles (or even flip a counter), and if we have repeated crash+restarts we could be accumulating an effectively ever increasing segment footprint Closes #11205	2022-08-05 12:16:53 +03:00
Pavel Emelyanov	527b345079	Merge 'storage_proxy: introduce a `remote` "subservice"' from Kamil Braun Introduce a `remote` class that handles all remote communication in `storage_proxy`: sending and receiving RPCs, checking the state of other nodes by accessing the gossiper, and fetching schema. The `remote` object lives inside `storage_proxy` and right now it's initialized and destroyed together with `storage_proxy`. The long game here is to split the initialization of `storage_proxy` into two steps: - the first step, which constructs `storage_proxy`, initializes it "locally" and does not require references to `messaging_service` and `gossiper`. - the second step will take those references and add the `remote` part to `storage_proxy`. This will allow us to remove some cycles from the service (de)initialization order and in general clean it up a bit. We'll be able to start `storage_proxy` right after the `database` (without messaging/gossiper). Similar refactors are planned for `query_processor`. 
Closes #11088 * github.com:scylladb/scylladb: service: storage_proxy: pass `migration_manager*` to `init_messaging_service` service: storage_proxy: `remote`: make `_gossiper` a const reference gms: gossiper: mark some member functions const db: consistency_level: `filter_for_query`: take `const gossiper&` replica: table: `get_hit_rate`: take `const gossiper&` gms: gossiper: move `endpoint_filter` to `storage_proxy` module service: storage_proxy: pass `shared_ptr<gossiper>` to `start_hints_manager` service: storage_proxy: establish private section in `remote` service: storage_proxy: remove `migration_manager` pointer service: storage_proxy: remove calls to `storage_proxy::remote()` from `remote` service: storage_proxy: remove `_gossiper` field alternator: ttl: pass `gossiper&` to `expiration_service` service: storage_proxy: move `truncate_blocking` implementation to `remote` service: storage_proxy: introduce `is_alive` helper service: storage_proxy: remove `_messaging` reference service: storage_proxy: move `connection_dropped` to `remote` service: storage_proxy: make `encode_replica_exception_for_rpc` a static function service: storage_proxy: move `handle_write` to `remote` service: storage_proxy: move `handle_paxos_prune` to `remote` service: storage_proxy: move `handle_paxos_accept` to `remote` service: storage_proxy: move `handle_paxos_prepare` to `remote` service: storage_proxy: move `handle_truncate` to `remote` service: storage_proxy: move `handle_read_digest` to `remote` service: storage_proxy: move `handle_read_mutation_data` to `remote` service: storage_proxy: move `handle_read_data` to `remote` service: storage_proxy: move `handle_mutation_failed` to `remote` service: storage_proxy: move `handle_mutation_done` to `remote` service: storage_proxy: move `handle_paxos_learn` to `remote` service: storage_proxy: move `receive_mutation_handler` to `remote` service: storage_proxy: move `handle_counter_mutation` to `remote` service: storage_proxy: remove 
`get_local_shared_storage_proxy` service: storage_proxy: (de)register RPC handlers in `remote` service: storage_proxy: introduce `remote`	2022-08-04 17:50:20 +03:00
Alejo Sanchez	97f0e11c3a	test.py: handle properly pytest output file for CQL tests Previously, if pytest itself failed (e.g. bad import or unexpected parameter), there was no output file but test.py tried to copy it and failed. Change the logic of handling the output file to first check if the file is there. Then if it's worth keeping it, move it to the test directory for easier comparison and maintenance. Else, if it's not worth keeping, discard it. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #11193	2022-08-04 16:48:53 +02:00
Pavel Emelyanov	cf0f912e59	cdc: Handle sleep-aborted exception on stop When update_streams_description() fails it spawns a fiber and retries the update in the background once every 60s. If the sleeping between attempts is aborted, the respective exceptional future happens to be ignored and warned in logs. fixes: #11192 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220802132148.20688-1-xemul@scylladb.com>	2022-08-04 13:03:29 +02:00
Kamil Braun	0a4e701b50	service: storage_proxy: pass `migration_manager*` to `init_messaging_service` `migration_manager` lifetime is longer than the lifetime of "storage proxy's messaging service part" - that is, `init_messaging_service` is called after `migration_manager` is started, and `uninit_messaging_service` is called before `migration_manager` is stopped. Thus we don't need to hold an owning pointer to `migration_manager` here. Later, when `init_messaging_service` will actually construct `remote`, this will be a reference, not a pointer. Also observe that `_mm` in `remote` is only used in handlers, and handlers are unregistered before `_mm` is nullified, which ensures that handlers are not running when `_mm` is nullified. (This argument shows why the code made sense regardless of our switch from shared_ptr to raw ptr).	2022-08-04 12:19:43 +02:00
Kamil Braun	a08be82ce2	service: storage_proxy: `remote`: make `_gossiper` a const reference	2022-08-04 12:19:43 +02:00
Kamil Braun	a1aa9cf3f7	gms: gossiper: mark some member functions const	2022-08-04 12:19:43 +02:00
Kamil Braun	a9fd156a1b	db: consistency_level: `filter_for_query`: take `const gossiper&`	2022-08-04 12:19:38 +02:00
Kamil Braun	7b4146dd2a	replica: table: `get_hit_rate`: take `const gossiper&` It doesn't use any non-const members.	2022-08-04 12:16:09 +02:00
Kamil Braun	566e5f2a4f	gms: gossiper: move `endpoint_filter` to `storage_proxy` module The function only uses one public function of `gossiper` (`is_alive`) and is used only in one place in `storage_proxy`. Make it a static function private to the `storage_proxy` module. The function used a `default_random_engine` field in `gossiper` for generating random numbers. Turn this field into a static `thread_local` variable inside the function - no other `gossiper` members used the field.	2022-08-04 12:16:09 +02:00
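The refactoring pattern named in the commit above, replacing a class field with a function-local `static thread_local` random engine, can be sketched in standard C++ like this. The function `pick_random_index` is hypothetical and stands in for whatever the real `endpoint_filter` does with its random numbers; only the storage pattern is the point.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper: returns a uniformly chosen index in [0, n).
// The engine is a function-local static thread_local, so each thread gets
// its own lazily initialized instance and no owning object has to carry it.
inline std::size_t pick_random_index(std::size_t n) {
    assert(n > 0);
    static thread_local std::default_random_engine engine(std::random_device{}());
    std::uniform_int_distribution<std::size_t> dist(0, n - 1);
    return dist(engine);
}
```

Because the engine is private to the function, no other code can come to depend on it, which matches the observation that no other `gossiper` members used the field.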
Kamil Braun	078900042f	service: storage_proxy: pass `shared_ptr<gossiper>` to `start_hints_manager` No need to call `_remote->gossiper().shared_from_this()` from within storage_proxy now.	2022-08-04 12:16:09 +02:00
Kamil Braun	d9d10d87ec	service: storage_proxy: establish private section in `remote` Only the (un)init, send_*, and `is_alive` functions are public, plus a getter for gossiper.	2022-08-04 12:16:05 +02:00
Kamil Braun	7364d453dd	service: storage_proxy: remove `migration_manager` pointer The ownership is passed to `remote`, which now contains a `shared_ptr<migration_manager>`.	2022-08-04 12:15:36 +02:00
Kamil Braun	bcc22ed1dc	service: storage_proxy: remove calls to `storage_proxy::remote()` from `remote` Catch `this` in the lambdas.	2022-08-04 12:15:36 +02:00
Kamil Braun	eddd3b8226	service: storage_proxy: remove `_gossiper` field Access `gossiper` through `_remote`. Later, all those accesses will handle missing `remote`. Note that there are also accesses through the `remote()` internal getter. The plan is as follows: - direct accesses through `_remote` will be modified to handle missing `_remote` (these won't cause an error) - `remote()` will throw if `_remote` is missing (`remote()` is only used for operations which actually need to send a message to a remote node).	2022-08-04 12:15:35 +02:00
Kamil Braun	ab946e392f	alternator: ttl: pass `gossiper&` to `expiration_service` This allows us to remove the `gossiper()` getter from `storage_proxy`.	2022-08-04 12:12:43 +02:00
Kamil Braun	242e31d56e	service: storage_proxy: move `truncate_blocking` implementation to `remote` The truncate operation always truncates a table on the entire cluster, even for local tables. And it always does it by sending RPCs (the node sends an RPC to itself too). Thus it fits in the remote class. If we want to add a possibility to "truncate locally only" and/or change the behavior for local tables, we can add a branch in `storage_proxy::truncate_blocking`. Refs: #11087	2022-08-04 12:12:43 +02:00
Kamil Braun	3e73de9a40	service: storage_proxy: introduce `is_alive` helper A helper is introduced both in `remote` and in `storage_proxy`. The `storage_proxy` one calls the `remote` one. In the future it will also handle a missing `remote`. Then it will report only the local node to be alive and other nodes dead while `remote` is missing. The change reduces the number of functions using the `_gossiper` field in `storage_proxy`.	2022-08-04 12:12:41 +02:00
Jenkins Promoter	0ce19e7812	release: prepare for 5.2.0-dev	2022-08-04 13:09:55 +03:00
Botond Dénes	df203a48af	Merge "Remove reconnectable_snitch_helper" from Pavel Emelyanov " The helper is in charge of receiving INTERNAL_IP app state from gossiper join/change notifications, updating system.peers with it and kicking messaging service to update its preferred ip cache along with initiating clients reconnection. Effectively this helper duplicates the topology tracking code in storage-service notifiers. Removing it makes less code and drops a bunch of unwanted cross-components dependencies, in particular: - one qctx call is gone - snitch (almost) no longer needs to get messaging from gossiper - public:private IP cache becomes local to messaging and can be moved to topology at low cost Some nice minor side effect -- this helper was left unsubscribed from gossiper on stop and snitch rename. Now its all gone. " * 'br-remove-reconnectible-snitch-helper-2' of https://github.com/xemul/scylla: snitch: Remove reconnectable snitch helper snitch, storage_service: Move reconnect to internal_ip kick snitch, storage_service: Move system.peers preferred_ip update snitch: Export prefer-local	2022-08-04 13:06:05 +03:00
Anna Stuchlik	532aa6e655	doc: update the links to Manager and Operator Closes #11196	2022-08-04 11:38:39 +03:00
Anna Stuchlik	143455d7ac	doc: rewording	2022-08-03 16:58:29 +02:00
Anna Stuchlik	f2af63ddd5	doc: update the links to fix the warnings	2022-08-03 15:12:41 +02:00
Anna Stuchlik	1d61550c64	doc: add the new page to the toctree	2022-08-03 15:03:48 +02:00
Anna Stuchlik	756b9a278f	doc: add the descrption of specifying workload attributes with service levels	2022-08-03 14:57:50 +02:00
Anna Stuchlik	2fa175a819	doc: add the definition of workloads to the glossary	2022-08-03 13:31:07 +02:00
Piotr Dulikowski	4f2adc14de	db/system_keyspace: fix indentation after previous patch	2022-08-03 13:19:19 +02:00
Piotr Dulikowski	eff8a6368c	db/system_keyspace: in system.local, use broadcast_rpc_address in rpc_address column Previously, the `system.local`'s `rpc_address` column kept local node's `rpc_address` from the scylla.yaml configuration. Although it sounds like it makes sense, there are a few reasons to change it to the value of scylla.yaml's `broadcast_rpc_address`: - The `broadcast_rpc_address` is the address that the drivers are supposed to connect to. `rpc_address` is the address that the node binds to - it can be set for example to 0.0.0.0 so that Scylla listens on all addresses, however this gives no useful information to the driver. - The `system.peers` table also has the `rpc_address` column and it already keeps other nodes' `broadcast_rpc_address`es. - Cassandra is going to do the same change in the upcoming version 4.1. Fixes: #11201	2022-08-03 13:19:03 +02:00
Kamil Braun	2aff2fea00	service: storage_proxy: remove `_messaging` reference All uses of `messaging_service&` have been moved to `remote`.	2022-08-02 19:55:12 +02:00
Kamil Braun	cf931c7863	service: storage_proxy: move `connection_dropped` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	2203d4fa09	service: storage_proxy: make `encode_replica_exception_for_rpc` a static function No need for this ugly template to be part of the `storage_proxy` header.	2022-08-02 19:55:12 +02:00
Kamil Braun	3499bc7731	service: storage_proxy: move `handle_write` to `remote` It is a helper used by `receive_mutation_handler` and `handle_paxos_learn`.	2022-08-02 19:55:12 +02:00
Kamil Braun	ba88ad8db0	service: storage_proxy: move `handle_paxos_prune` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	548767f91e	service: storage_proxy: move `handle_paxos_accept` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	807c7f32de	service: storage_proxy: move `handle_paxos_prepare` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	0e431e7c03	service: storage_proxy: move `handle_truncate` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	f8c1ba357f	service: storage_proxy: move `handle_read_digest` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	43997af40f	service: storage_proxy: move `handle_read_mutation_data` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	80586a0c7e	service: storage_proxy: move `handle_read_data` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	00c0ee44bd	service: storage_proxy: move `handle_mutation_failed` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	b9c436c6e0	service: storage_proxy: move `handle_mutation_done` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	178536d5d2	service: storage_proxy: move `handle_paxos_learn` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	f309886fac	service: storage_proxy: move `receive_mutation_handler` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	fad14d2094	service: storage_proxy: move `handle_counter_mutation` to `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	93325a220f	service: storage_proxy: remove `get_local_shared_storage_proxy` Its remaining uses are trivial to remove. Note: in `handle_counter_mutation` we had this piece of code: ``` }).then([trace_state_ptr = std::move(trace_state_ptr), &mutations, cl, timeout] { auto sp = get_local_shared_storage_proxy(); return sp->mutate_counters_on_leader(...); ``` Obtaining a `shared_ptr` to `storage_proxy` at this point is no different from obtaining a regular pointer: - The pointer is obtained inside `then` lambda body, not in the capture list. So if the goal of obtaining a `shared_ptr` here was to keep `storage_proxy` alive until the `then` lambda body is executed, that goal wasn't achieved because the pointer was obtained too late. - The `shared_ptr` is destroyed as soon as `mutate_counters_on_leader` returns, it's not stored anywhere. So it doesn't prolong the lifetime of the service. I replaced this with a simple capture of `this` in the lambda.	2022-08-02 19:55:12 +02:00
Kamil Braun	5148eafbd6	service: storage_proxy: (de)register RPC handlers in `remote`	2022-08-02 19:55:12 +02:00
Kamil Braun	f174645ab5	service: storage_proxy: introduce `remote` Move most accesses to `_messaging` to this struct (functions that send RPCs).	2022-08-02 19:55:10 +02:00
Pavel Emelyanov	ee0828b506	topology: Add local-dc detection shugar It's often needed to check if an endpoint sits in the same DC as the current node. It can be done by topo.get_datacenter() == topo.get_datacenter(endpoint) but in some cases a RAII filter function can be helpful. Also there's a db::count_local_endpoints() that is surprisingly in use, so add it to topology as well. Next patches will make use of both. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-07-30 17:58:45 +03:00
Anna Stuchlik	844c875f15	doc: add info about the time-consuming step due to resharding	2022-07-26 14:52:11 +02:00
Pavel Emelyanov	40d6ea973c	snitch: Remove reconnectable snitch helper It's now no-op Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-07-26 13:51:05 +03:00
Pavel Emelyanov	b91f7e9ec4	snitch, storage_service: Move reconnect to internal_ip kick The same thing as in previous patch -- when gossiper issues on_join/_change notification, storage service can kick messaging service to update its internal_ip cache and reconnect to the peer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-07-26 13:48:46 +03:00
Pavel Emelyanov	1bf8b0dd92	snitch, storage_service: Move system.peers preferred_ip update Currently the INTERNAL_IP state is updated using reconnectable helper by subscribing on on_join/on_change events from gossiper. The same subscription exists in storage service (it's a bit more elaborated by checking if the node is the part of the ring which is OK). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-07-26 13:48:46 +03:00
Pavel Emelyanov	0abd2c1e52	snitch: Export prefer-local The boolean bit says whether "the system" should prefer connecting to the address gossiper around via INTERNAL_IP. Currently only gossiping property file snitch allows to tune it and ec2-multiregion snitch prefers internal IP unconditionally. So exporting consists of 2 pieces: - add prefer_local() snitch method that's false by default or returns the (existing) _prefer_local bit for production snitch base - set the _prefer_local to true by ec2-multiregion snitch While at it the _prefer_local is moved to production_snitch_base for uniformity with the new prefer_local() call Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-07-26 13:48:04 +03:00
Anna Stuchlik	ff5c4a33f5	doc: add the new KB to the toctree	2022-07-25 14:29:33 +02:00
Anna Stuchlik	f1daef4b1b	doc: doc: add a KB about updating the mode in perftune.yaml after upgrade	2022-07-25 14:22:02 +02:00

1658 changed files with 96919 additions and 42326 deletions

24

.github/CODEOWNERS vendored


@@ -12,7 +12,7 @@ test/cql/cdc_* @kbr- @elcallio @piodul @jul-stas
 test/boost/cdc_* @kbr- @elcallio @piodul @jul-stas
 # COMMITLOG / BATCHLOG
-db/commitlog/* @elcallio
+db/commitlog/* @elcallio @eliransin
 db/batch* @elcallio
 # COORDINATOR
@@ -25,7 +25,7 @@ compaction/* @raphaelsc @nyh
 transport/*
 # CQL QUERY LANGUAGE
-cql3/* @tgrabiec @psarna @cvybhu
+cql3/* @tgrabiec @cvybhu @nyh
 # COUNTERS
 counters* @jul-stas
@@ -33,7 +33,7 @@ tests/counter_test* @jul-stas
 # DOCS
 docs/* @annastuchlik @tzach
-docs/alternator @annastuchlik @tzach @nyh @psarna
+docs/alternator @annastuchlik @tzach @nyh @havaker @nuivall
 # GOSSIP
 gms/* @tgrabiec @asias
@@ -45,9 +45,9 @@ dist/docker/*
 utils/logalloc* @tgrabiec
 # MATERIALIZED VIEWS
-db/view/* @nyh @psarna
-cql3/statements/*view* @nyh @psarna
-test/boost/view_* @nyh @psarna
+db/view/* @nyh @cvybhu @piodul
+cql3/statements/*view* @nyh @cvybhu @piodul
+test/boost/view_* @nyh @cvybhu @piodul
 # PACKAGING
 dist/* @syuu1228
@@ -62,9 +62,9 @@ service/migration* @tgrabiec @nyh
 schema* @tgrabiec @nyh
 # SECONDARY INDEXES
-db/index/* @nyh @psarna
-cql3/statements/*index* @nyh @psarna
-test/boost/*index* @nyh @psarna
+index/* @nyh @cvybhu @piodul
+cql3/statements/*index* @nyh @cvybhu @piodul
+test/boost/*index* @nyh @cvybhu @piodul
 # SSTABLES
 sstables/* @tgrabiec @raphaelsc @nyh
@@ -74,11 +74,11 @@ streaming/* @tgrabiec @asias
 service/storage_service.* @tgrabiec @asias
 # ALTERNATOR
-alternator/* @nyh @psarna
-test/alternator/* @nyh @psarna
+alternator/* @nyh @havaker @nuivall
+test/alternator/* @nyh @havaker @nuivall
 # HINTED HANDOFF
-db/hints/* @piodul @vladzcloudius
+db/hints/* @piodul @vladzcloudius @eliransin
 # REDIS
 redis/* @nyh @syuu1228

									
17

.github/workflows/docs-amplify-enhanced.yaml vendored Normal file
												
@@ -0,0 +1,17 @@
+name: "Docs / Amplify enhanced"
+on: issue_comment
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    if: ${{ github.event.issue.pull_request }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+      - name: Amplify enhanced
+        env:
+          TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        uses: scylladb/sphinx-scylladb-theme/.github/actions/amplify-enhanced@master

									
13

.github/workflows/docs-pages.yaml vendored
												
@@ -2,10 +2,14 @@ name: "Docs / Publish"
 # For more information,
 # see https://sphinx-theme.scylladb.com/stable/deployment/production.html#available-workflows
+env:
+  FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}
 on:
   push:
     branches:
-      - master
+      - 'master'
+      - 'enterprise'
     paths:
       - "docs/**"
   workflow_dispatch:
@@ -24,12 +28,13 @@ jobs:
         with:
           python-version: 3.7
       - name: Set up env
-        run: make -C docs setupenv
+        run: make -C docs FLAG="${{ env.FLAG }}" setupenv
       - name: Build docs
-        run: make -C docs multiversion
+        run: make -C docs FLAG="${{ env.FLAG }}" multiversion
       - name: Build redirects
-        run: make -C docs redirects
+        run: make -C docs FLAG="${{ env.FLAG }}" redirects
       - name: Deploy docs to GitHub Pages
         run: ./docs/_utils/deploy.sh
+        if: (github.ref_name == 'master' && env.FLAG == 'opensource') || (github.ref_name == 'enterprise' && env.FLAG == 'enterprise')
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
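
Reviewer note: the `FLAG` expression and the `if:` guard on the deploy step in `docs-pages.yaml` can be checked with a small model. The sketch below is not part of the change. The function names `docs_flag` and `should_deploy` are hypothetical, and it assumes that `&&` and `||` in GitHub Actions expressions return one of their operands in the same way as Python's `and` and `or`.

```python
# Hypothetical model of the workflow expressions; not code from the repository.
ENTERPRISE_REPO = "scylladb/scylla-enterprise"


def docs_flag(repository: str) -> str:
    """Mirror `github.repository == '...' && 'enterprise' || 'opensource'`."""
    # `cond and a or b` yields `a` when cond is true and `a` is truthy, else `b`.
    return (repository == ENTERPRISE_REPO and "enterprise") or "opensource"


def should_deploy(ref_name: str, flag: str) -> bool:
    """Mirror the `if:` condition on the "Deploy docs to GitHub Pages" step."""
    return (ref_name == "master" and flag == "opensource") or (
        ref_name == "enterprise" and flag == "enterprise"
    )


if __name__ == "__main__":
    # Walk every repository/branch combination the `on.push.branches` list allows.
    for repo in ("scylladb/scylladb", ENTERPRISE_REPO):
        for ref in ("master", "enterprise"):
            flag = docs_flag(repo)
            print(f"{repo:28} {ref:11} flag={flag:11} deploy={should_deploy(ref, flag)}")
```

Under these assumptions, each repository deploys from exactly one branch: `master` for the open-source build and `enterprise` for the enterprise build. A push to the other branch still builds the docs but skips the deploy step.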

									
8

.github/workflows/docs-pr.yaml vendored
												
@@ -2,10 +2,14 @@ name: "Docs / Build PR"
 # For more information,
 # see https://sphinx-theme.scylladb.com/stable/deployment/production.html#available-workflows
+env:
+  FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}
 on:
   pull_request:
     branches:
       - master
+      - enterprise
     paths:
       - "docs/**"
@@ -23,6 +27,6 @@ jobs:
         with:
           python-version: 3.7
       - name: Set up env
-        run: make -C docs setupenv
+        run: make -C docs FLAG="${{ env.FLAG }}" setupenv
       - name: Build docs
-        run: make -C docs test
+        run: make -C docs FLAG="${{ env.FLAG }}" test

1

.gitignore vendored


@@ -31,3 +31,4 @@ docs/poetry.lock
 compile_commands.json
 .ccls-cache/
 .mypy_cache
+.envrc

9

.gitmodules vendored


@@ -6,12 +6,6 @@
 	path = swagger-ui
 	url = ../scylla-swagger-ui
 	ignore = dirty
-[submodule "libdeflate"]
-	path = libdeflate
-	url = ../libdeflate
-[submodule "abseil"]
-	path = abseil
-	url = ../abseil-cpp
 [submodule "scylla-jmx"]
 	path = tools/jmx
 	url = ../scylla-jmx
@@ -21,3 +15,6 @@
 [submodule "scylla-python3"]
 	path = tools/python3
 	url = ../scylla-python3
+[submodule "tools/cqlsh"]
+	path = tools/cqlsh
+	url = ../scylla-cqlsh

									
877

CMakeLists.txt
												
				@@ -2,795 +2,200 @@ cmake_minimum_required(VERSION 3.18)

				project(scylla)

				if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)

				  message(STATUS "Setting build type to 'Release' as none was specified.")

				  set(CMAKE_BUILD_TYPE "Release" CACHE

				      STRING "Choose the type of build." FORCE)

				  # Set the possible values of build type for cmake-gui

				  set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS

				    "Debug" "Release" "Dev" "Sanitize")

				endif()

				include(CTest)

				if(CMAKE_BUILD_TYPE)

				    string(TOLOWER "${CMAKE_BUILD_TYPE}" BUILD_TYPE)

				else()

				    set(BUILD_TYPE "release")

				endif()

				function(default_target_arch arch)

				    set(x86_instruction_sets i386 i686 x86_64)

				    if(CMAKE_SYSTEM_PROCESSOR IN_LIST x86_instruction_sets)

				        set(${arch} "westmere" PARENT_SCOPE)

				    elseif(CMAKE_SYSTEM_PROCESSOR EQUAL "aarch64")

				        set(${arch} "armv8-a+crc+crypto" PARENT_SCOPE)

				    else()

				        set(${arch} "" PARENT_SCOPE)

				    endif()

				endfunction()

				default_target_arch(target_arch)

				if(target_arch)

				    set(target_arch_flag "-march=${target_arch}")

				endif()

				set(cxx_coro_flag)

				if (CMAKE_CXX_COMPILER_ID MATCHES GNU)

				    set(cxx_coro_flag -fcoroutines)

				endif()

				list(APPEND CMAKE_MODULE_PATH

				  ${CMAKE_CURRENT_SOURCE_DIR}/cmake

				  ${CMAKE_CURRENT_SOURCE_DIR}/seastar/cmake)

				set(CMAKE_BUILD_TYPE "${CMAKE_BUILD_TYPE}" CACHE

				    STRING "Choose the type of build." FORCE)

				# Set the possible values of build type for cmake-gui

				set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS

				  "Debug" "Release" "Dev" "Sanitize")

				string(TOUPPER "${CMAKE_BUILD_TYPE}" build_mode)

				include(mode.${build_mode})

				include(mode.common)

				add_compile_definitions(

				    ${Seastar_DEFINITIONS_${build_mode}}

				    FMT_DEPRECATED_OSTREAM)

				include(limit_jobs)

				# Configure Seastar compile options to align with Scylla

				set(Seastar_CXX_FLAGS ${cxx_coro_flag} ${target_arch_flag} CACHE INTERNAL "" FORCE)

				set(Seastar_CXX_DIALECT gnu++20 CACHE INTERNAL "" FORCE)

				set(CMAKE_CXX_STANDARD "20" CACHE INTERNAL "")

				set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")

				set(CMAKE_CXX_VISIBILITY_PRESET hidden)

				set(Seastar_TESTING ON CACHE BOOL "" FORCE)

				add_subdirectory(seastar)

				add_subdirectory(abseil)

				# Exclude absl::strerror from the default "all" target since it's not

				# used in Scylla build and, moreover, makes use of deprecated glibc APIs,

				# such as sys_nerr, which are not exposed from "stdio.h" since glibc 2.32,

				# which happens to be the case for recent Fedora distribution versions.

				#

				# Need to use the internal "absl_strerror" target name instead of namespaced

				# variant because `set_target_properties` does not understand the latter form,

				# unfortunately.

				set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)

				# System libraries dependencies

				find_package(Boost COMPONENTS filesystem program_options system thread regex REQUIRED)

				find_package(Boost REQUIRED

				    COMPONENTS filesystem program_options system thread regex unit_test_framework)

				find_package(Lua REQUIRED)

				find_package(ZLIB REQUIRED)

				find_package(ICU COMPONENTS uc REQUIRED)

				find_package(ICU COMPONENTS uc i18n REQUIRED)

				find_package(absl COMPONENTS hash raw_hash_set REQUIRED)

				find_package(libdeflate REQUIRED)

				find_package(libxcrypt REQUIRED)

				find_package(Snappy REQUIRED)

				find_package(RapidJSON REQUIRED)

				find_package(Thrift REQUIRED)

				find_package(xxHash REQUIRED)

				set(scylla_build_dir "${CMAKE_BINARY_DIR}/build/${BUILD_TYPE}")

				set(scylla_gen_build_dir "${scylla_build_dir}/gen")

				file(MAKE_DIRECTORY "${scylla_build_dir}" "${scylla_gen_build_dir}")

				set(scylla_gen_build_dir "${CMAKE_BINARY_DIR}/gen")

				file(MAKE_DIRECTORY "${scylla_gen_build_dir}")

				# Place libraries, executables and archives in ${buildroot}/build/${mode}/

				foreach(mode RUNTIME LIBRARY ARCHIVE)

				    set(CMAKE_${mode}_OUTPUT_DIRECTORY "${scylla_build_dir}")

				endforeach()

				# Generate C++ source files from thrift definitions

				function(scylla_generate_thrift)

				    set(one_value_args TARGET VAR IN_FILE OUT_DIR SERVICE)

				    cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})

				    get_filename_component(in_file_name ${args_IN_FILE} NAME_WE)

				    set(aux_out_file_name ${args_OUT_DIR}/${in_file_name})

				    set(outputs

				        ${aux_out_file_name}_types.cpp

				        ${aux_out_file_name}_types.h

				        ${aux_out_file_name}_constants.cpp

				        ${aux_out_file_name}_constants.h

				        ${args_OUT_DIR}/${args_SERVICE}.cpp

				        ${args_OUT_DIR}/${args_SERVICE}.h)

				    add_custom_command(

				        DEPENDS

				            ${args_IN_FILE}

				            thrift

				        OUTPUT ${outputs}

				        COMMAND ${CMAKE_COMMAND} -E make_directory ${args_OUT_DIR}

				        COMMAND thrift -gen cpp:cob_style,no_skeleton -out "${args_OUT_DIR}" "${args_IN_FILE}")

				    add_custom_target(${args_TARGET}

				        DEPENDS ${outputs})

				    set(${args_VAR} ${outputs} PARENT_SCOPE)

				endfunction()

				scylla_generate_thrift(

				    TARGET scylla_thrift_gen_cassandra

				    VAR scylla_thrift_gen_cassandra_files

				    IN_FILE "${CMAKE_SOURCE_DIR}/interface/cassandra.thrift"

				    OUT_DIR ${scylla_gen_build_dir}

				    SERVICE Cassandra)

				# Parse antlr3 grammar files and generate C++ sources

				function(scylla_generate_antlr3)

				    set(one_value_args TARGET VAR IN_FILE OUT_DIR)

				    cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})

				    get_filename_component(in_file_pure_name ${args_IN_FILE} NAME)

				    get_filename_component(stem ${in_file_pure_name} NAME_WE)

				    set(outputs

				        "${args_OUT_DIR}/${stem}Lexer.hpp"

				        "${args_OUT_DIR}/${stem}Lexer.cpp"

				        "${args_OUT_DIR}/${stem}Parser.hpp"

				        "${args_OUT_DIR}/${stem}Parser.cpp")

				    add_custom_command(

				        DEPENDS

				            ${args_IN_FILE}

				        OUTPUT ${outputs}

				        # Remove #ifdef'ed code from the grammar source code

				        COMMAND sed -e "/^#if 0/,/^#endif/d" "${args_IN_FILE}" > "${args_OUT_DIR}/${in_file_pure_name}"

				        COMMAND antlr3 "${args_OUT_DIR}/${in_file_pure_name}"

				        # We replace many local `ExceptionBaseType* ex` variables with a single function-scope one.

				        # Because we add such a variable to every function, and because `ExceptionBaseType` is not a global

				        # name, we also add a global typedef to avoid compilation errors.

				        COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.hpp"

				        COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.cpp"

				        COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Parser.hpp"

				        COMMAND sed -i

				            -e "s/^\\( *\\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$/\\1const \\2/"

				            -e "/^.*On :.*$/d"

				            -e "1i using ExceptionBaseType = int;"

				            -e "s/^{/{ ExceptionBaseType\\* ex = nullptr;/; s/ExceptionBaseType\\* ex = new/ex = new/; s/exceptions::syntax_exception e/exceptions::syntax_exception\\& e/"

				            "${args_OUT_DIR}/${stem}Parser.cpp"

				        VERBATIM)

				    add_custom_target(${args_TARGET}

				        DEPENDS ${outputs})

				    set(${args_VAR} ${outputs} PARENT_SCOPE)

				endfunction()

				set(antlr3_grammar_files

				    cql3/Cql.g

				    alternator/expressions.g)

				set(antlr3_gen_files)

				foreach(f ${antlr3_grammar_files})

				    get_filename_component(grammar_file_name "${f}" NAME_WE)

				    get_filename_component(f_dir "${f}" DIRECTORY)

				    scylla_generate_antlr3(

				        TARGET scylla_antlr3_gen_${grammar_file_name}

				        VAR scylla_antlr3_gen_${grammar_file_name}_files

				        IN_FILE "${CMAKE_SOURCE_DIR}/${f}"

				        OUT_DIR ${scylla_gen_build_dir}/${f_dir})

				    list(APPEND antlr3_gen_files "${scylla_antlr3_gen_${grammar_file_name}_files}")

				endforeach()

				# Generate C++ sources from ragel grammar files

				seastar_generate_ragel(

				    TARGET scylla_ragel_gen_protocol_parser

				    VAR scylla_ragel_gen_protocol_parser_file

				    IN_FILE "${CMAKE_SOURCE_DIR}/redis/protocol_parser.rl"

				    OUT_FILE ${scylla_gen_build_dir}/redis/protocol_parser.hh)

				# Generate C++ sources from Swagger definitions

				set(swagger_files

				    api/api-doc/cache_service.json

				    api/api-doc/collectd.json

				    api/api-doc/column_family.json

				    api/api-doc/commitlog.json

				    api/api-doc/compaction_manager.json

				    api/api-doc/config.json

				    api/api-doc/endpoint_snitch_info.json

				    api/api-doc/error_injection.json

				    api/api-doc/failure_detector.json

				    api/api-doc/gossiper.json

				    api/api-doc/hinted_handoff.json

				    api/api-doc/lsa.json

				    api/api-doc/messaging_service.json

				    api/api-doc/storage_proxy.json

				    api/api-doc/storage_service.json

				    api/api-doc/stream_manager.json

				    api/api-doc/system.json

				    api/api-doc/utils.json)

				set(swagger_gen_files)

				foreach(f ${swagger_files})

				    get_filename_component(fname "${f}" NAME_WE)

				    get_filename_component(dir "${f}" DIRECTORY)

				    seastar_generate_swagger(

				        TARGET scylla_swagger_gen_${fname}

				        VAR scylla_swagger_gen_${fname}_files

				        IN_FILE "${CMAKE_SOURCE_DIR}/${f}"

				        OUT_DIR "${scylla_gen_build_dir}/${dir}")

				    list(APPEND swagger_gen_files "${scylla_swagger_gen_${fname}_files}")

				endforeach()

				# Create C++ bindings for IDL serializers

				function(scylla_generate_idl_serializer)

				    set(one_value_args TARGET VAR IN_FILE OUT_FILE)

				    cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})

				    get_filename_component(out_dir ${args_OUT_FILE} DIRECTORY)

				    set(idl_compiler "${CMAKE_SOURCE_DIR}/idl-compiler.py")

				    find_package(Python3 COMPONENTS Interpreter)

				    add_custom_command(

				        DEPENDS

				            ${args_IN_FILE}

				            ${idl_compiler}

				        OUTPUT ${args_OUT_FILE}

				        COMMAND ${CMAKE_COMMAND} -E make_directory ${out_dir}

				        COMMAND Python3::Interpreter ${idl_compiler} --ns ser -f ${args_IN_FILE} -o ${args_OUT_FILE})

				    add_custom_target(${args_TARGET}

				        DEPENDS ${args_OUT_FILE})

				    set(${args_VAR} ${args_OUT_FILE} PARENT_SCOPE)

				endfunction()

				set(idl_serializers

				    idl/cache_temperature.idl.hh

				    idl/commitlog.idl.hh

				    idl/consistency_level.idl.hh

				    idl/frozen_mutation.idl.hh

				    idl/frozen_schema.idl.hh

				    idl/gossip_digest.idl.hh

				    idl/hinted_handoff.idl.hh

				    idl/idl_test.idl.hh

				    idl/keys.idl.hh

				    idl/messaging_service.idl.hh

				    idl/mutation.idl.hh

				    idl/paging_state.idl.hh

				    idl/partition_checksum.idl.hh

				    idl/paxos.idl.hh

				    idl/query.idl.hh

				    idl/raft.idl.hh

				    idl/range.idl.hh

				    idl/read_command.idl.hh

				    idl/reconcilable_result.idl.hh

				    idl/replay_position.idl.hh

				    idl/result.idl.hh

				    idl/ring_position.idl.hh

				    idl/streaming.idl.hh

				    idl/token.idl.hh

				    idl/tracing.idl.hh

				    idl/truncation_record.idl.hh

				    idl/uuid.idl.hh

				    idl/view.idl.hh)

				set(idl_gen_files)

				foreach(f ${idl_serializers})

				    get_filename_component(idl_name "${f}" NAME)

				    get_filename_component(idl_target "${idl_name}" NAME_WE)

				    get_filename_component(idl_dir "${f}" DIRECTORY)

				    string(REPLACE ".idl.hh" ".dist.hh" idl_out_hdr_name "${idl_name}")

				    scylla_generate_idl_serializer(

				        TARGET scylla_idl_gen_${idl_target}

				        VAR scylla_idl_gen_${idl_target}_files

				        IN_FILE "${CMAKE_SOURCE_DIR}/${f}"

				        OUT_FILE ${scylla_gen_build_dir}/${idl_dir}/${idl_out_hdr_name})

				    list(APPEND idl_gen_files "${scylla_idl_gen_${idl_target}_files}")

				endforeach()

				set(scylla_sources

				add_library(scylla-main STATIC)

				target_sources(scylla-main

				  PRIVATE

				    absl-flat_hash_map.cc

				    alternator/auth.cc

				    alternator/conditions.cc

				    alternator/controller.cc

				    alternator/executor.cc

				    alternator/expressions.cc

				    alternator/serialization.cc

				    alternator/server.cc

				    alternator/stats.cc

				    alternator/streams.cc

				    api/api.cc

				    api/cache_service.cc

				    api/collectd.cc

				    api/column_family.cc

				    api/commitlog.cc

				    api/compaction_manager.cc

				    api/config.cc

				    api/endpoint_snitch.cc

				    api/error_injection.cc

				    api/failure_detector.cc

				    api/gossiper.cc

				    api/hinted_handoff.cc

				    api/lsa.cc

				    api/messaging_service.cc

				    api/storage_proxy.cc

				    api/storage_service.cc

				    api/stream_manager.cc

				    api/system.cc

				    atomic_cell.cc

				    auth/allow_all_authenticator.cc

				    auth/allow_all_authorizer.cc

				    auth/authenticated_user.cc

				    auth/authentication_options.cc

				    auth/authenticator.cc

				    auth/common.cc

				    auth/default_authorizer.cc

				    auth/password_authenticator.cc

				    auth/passwords.cc

				    auth/permission.cc

				    auth/permissions_cache.cc

				    auth/resource.cc

				    auth/role_or_anonymous.cc

				    auth/roles-metadata.cc

				    auth/sasl_challenge.cc

				    auth/service.cc

				    auth/standard_role_manager.cc

				    auth/transitional.cc

				    bytes.cc

				    caching_options.cc

				    canonical_mutation.cc

				    cdc/cdc_partitioner.cc

				    cdc/generation.cc

				    cdc/log.cc

				    cdc/metadata.cc

				    cdc/split.cc

				    client_data.cc

				    clocks-impl.cc

				    collection_mutation.cc

				    compaction/compaction.cc

				    compaction/compaction_manager.cc

				    compaction/compaction_strategy.cc

				    compaction/leveled_compaction_strategy.cc

				    compaction/size_tiered_compaction_strategy.cc

				    compaction/time_window_compaction_strategy.cc

				    compress.cc

				    converting_mutation_partition_applier.cc

				    counters.cc

				    cql3/abstract_marker.cc

				    cql3/attributes.cc

				    cql3/cf_name.cc

				    cql3/column_condition.cc

				    cql3/column_identifier.cc

				    cql3/column_specification.cc

				    cql3/constants.cc

				    cql3/cql3_type.cc

				    cql3/expr/expression.cc

				    cql3/expr/prepare_expr.cc

				    cql3/expr/restrictions.cc

				    cql3/functions/aggregate_fcts.cc

				    cql3/functions/castas_fcts.cc

				    cql3/functions/error_injection_fcts.cc

				    cql3/functions/functions.cc

				    cql3/functions/user_function.cc

				    cql3/index_name.cc

				    cql3/keyspace_element_name.cc

				    cql3/lists.cc

				    cql3/maps.cc

				    cql3/operation.cc

				    cql3/prepare_context.cc

				    cql3/query_options.cc

				    cql3/query_processor.cc

				    cql3/restrictions/statement_restrictions.cc

				    cql3/result_set.cc

				    cql3/role_name.cc

				    cql3/selection/abstract_function_selector.cc

				    cql3/selection/selectable.cc

				    cql3/selection/selection.cc

				    cql3/selection/selector.cc

				    cql3/selection/selector_factories.cc

				    cql3/selection/simple_selector.cc

				    cql3/sets.cc

				    cql3/statements/alter_keyspace_statement.cc

				    cql3/statements/alter_service_level_statement.cc

				    cql3/statements/alter_table_statement.cc

				    cql3/statements/alter_type_statement.cc

				    cql3/statements/alter_view_statement.cc

				    cql3/statements/attach_service_level_statement.cc

				    cql3/statements/authentication_statement.cc

				    cql3/statements/authorization_statement.cc

				    cql3/statements/batch_statement.cc

				    cql3/statements/cas_request.cc

				    cql3/statements/cf_prop_defs.cc

				    cql3/statements/cf_statement.cc

				    cql3/statements/create_aggregate_statement.cc

				    cql3/statements/create_function_statement.cc

				    cql3/statements/create_index_statement.cc

				    cql3/statements/create_keyspace_statement.cc

				    cql3/statements/create_service_level_statement.cc

				    cql3/statements/create_table_statement.cc

				    cql3/statements/create_type_statement.cc

				    cql3/statements/create_view_statement.cc

				    cql3/statements/delete_statement.cc

				    cql3/statements/detach_service_level_statement.cc

				    cql3/statements/drop_aggregate_statement.cc

				    cql3/statements/drop_function_statement.cc

				    cql3/statements/drop_index_statement.cc

				    cql3/statements/drop_keyspace_statement.cc

				    cql3/statements/drop_service_level_statement.cc

				    cql3/statements/drop_table_statement.cc

				    cql3/statements/drop_type_statement.cc

				    cql3/statements/drop_view_statement.cc

				    cql3/statements/function_statement.cc

				    cql3/statements/grant_statement.cc

				    cql3/statements/index_prop_defs.cc

				    cql3/statements/index_target.cc

				    cql3/statements/ks_prop_defs.cc

				    cql3/statements/list_permissions_statement.cc

				    cql3/statements/list_service_level_attachments_statement.cc

				    cql3/statements/list_service_level_statement.cc

				    cql3/statements/list_users_statement.cc

				    cql3/statements/modification_statement.cc

				    cql3/statements/permission_altering_statement.cc

				    cql3/statements/property_definitions.cc

				    cql3/statements/raw/parsed_statement.cc

				    cql3/statements/revoke_statement.cc

				    cql3/statements/role-management-statements.cc

				    cql3/statements/schema_altering_statement.cc

				    cql3/statements/select_statement.cc

				    cql3/statements/service_level_statement.cc

				    cql3/statements/sl_prop_defs.cc

				    cql3/statements/truncate_statement.cc

				    cql3/statements/update_statement.cc

				    cql3/statements/use_statement.cc

				    cql3/type_json.cc

				    cql3/untyped_result_set.cc

				    cql3/update_parameters.cc

				    cql3/user_types.cc

				    cql3/util.cc

				    cql3/ut_name.cc

				    cql3/values.cc

				    data_dictionary/data_dictionary.cc

				    db/batchlog_manager.cc

				    db/commitlog/commitlog.cc

				    db/commitlog/commitlog_entry.cc

				    db/commitlog/commitlog_replayer.cc

				    db/config.cc

				    db/consistency_level.cc

				    db/cql_type_parser.cc

				    db/data_listeners.cc

				    db/extensions.cc

				    db/heat_load_balance.cc

				    db/hints/host_filter.cc

				    db/hints/manager.cc

				    db/hints/resource_manager.cc

				    db/hints/sync_point.cc

				    db/large_data_handler.cc

				    db/legacy_schema_migrator.cc

				    db/marshal/type_parser.cc

				    db/rate_limiter.cc

				    db/schema_tables.cc

				    db/size_estimates_virtual_reader.cc

				    db/snapshot-ctl.cc

				    db/sstables-format-selector.cc

				    db/system_distributed_keyspace.cc

				    db/system_keyspace.cc

				    db/view/row_locking.cc

				    db/view/view.cc

				    db/view/view_update_generator.cc

				    db/virtual_table.cc

				    dht/boot_strapper.cc

				    dht/i_partitioner.cc

				    dht/murmur3_partitioner.cc

				    dht/range_streamer.cc

				    dht/token.cc

				    replica/distributed_loader.cc

				    direct_failure_detector/failure_detector.cc

				    duration.cc

				    exceptions/exceptions.cc

				    readers/mutation_readers.cc

				    frozen_mutation.cc

				    frozen_schema.cc

				    generic_server.cc

				    gms/application_state.cc

				    gms/endpoint_state.cc

				    gms/failure_detector.cc

				    gms/feature_service.cc

				    gms/gossip_digest_ack2.cc

				    gms/gossip_digest_ack.cc

				    gms/gossip_digest_syn.cc

				    gms/gossiper.cc

				    gms/inet_address.cc

				    gms/versioned_value.cc

				    gms/version_generator.cc

				    hashers.cc

				    index/secondary_index.cc

				    index/secondary_index_manager.cc

				    debug.cc

				    init.cc

				    keys.cc

				    utils/lister.cc

				    locator/abstract_replication_strategy.cc

				    locator/azure_snitch.cc

				    locator/ec2_multi_region_snitch.cc

				    locator/ec2_snitch.cc

				    locator/everywhere_replication_strategy.cc

				    locator/gce_snitch.cc

				    locator/gossiping_property_file_snitch.cc

				    locator/local_strategy.cc

				    locator/network_topology_strategy.cc

				    locator/production_snitch_base.cc

				    locator/rack_inferring_snitch.cc

				    locator/simple_snitch.cc

				    locator/simple_strategy.cc

				    locator/snitch_base.cc

				    locator/token_metadata.cc

				    lang/lua.cc

				    main.cc

				    replica/memtable.cc

				    message/messaging_service.cc

				    multishard_mutation_query.cc

				    mutation.cc

				    mutation_fragment.cc

				    mutation_partition.cc

				    mutation_partition_serializer.cc

				    mutation_partition_view.cc

				    mutation_query.cc

				    readers/mutation_reader.cc

				    mutation_writer/feed_writers.cc

				    mutation_writer/multishard_writer.cc

				    mutation_writer/partition_based_splitting_writer.cc

				    mutation_writer/shard_based_splitting_writer.cc

				    mutation_writer/timestamp_based_splitting_writer.cc

				    partition_slice_builder.cc

				    partition_version.cc

				    querier.cc

				    query.cc

				    query_ranges_to_vnodes.cc

				    query-result-set.cc

				    raft/fsm.cc

				    raft/log.cc

				    raft/raft.cc

				    raft/server.cc

				    raft/tracker.cc

				    range_tombstone.cc

				    range_tombstone_list.cc

				    tombstone_gc_options.cc

				    tombstone_gc.cc

				    reader_concurrency_semaphore.cc

				    redis/abstract_command.cc

				    redis/command_factory.cc

				    redis/commands.cc

				    redis/keyspace_utils.cc

				    redis/lolwut.cc

				    redis/mutation_utils.cc

				    redis/options.cc

				    redis/query_processor.cc

				    redis/query_utils.cc

				    redis/server.cc

				    redis/service.cc

				    redis/stats.cc

				    release.cc

				    repair/repair.cc

				    repair/row_level.cc

				    replica/database.cc

				    replica/table.cc

				    row_cache.cc

				    schema.cc

				    schema_mutations.cc

				    schema_registry.cc

				    serializer.cc

				    service/client_state.cc

				    service/forward_service.cc

				    service/migration_manager.cc

				    service/misc_services.cc

				    service/pager/paging_state.cc

				    service/pager/query_pagers.cc

				    service/paxos/paxos_state.cc

				    service/paxos/prepare_response.cc

				    service/paxos/prepare_summary.cc

				    service/paxos/proposal.cc

				    service/priority_manager.cc

				    service/qos/qos_common.cc

				    service/qos/service_level_controller.cc

				    service/qos/standard_service_level_distributed_data_accessor.cc

				    service/raft/raft_group_registry.cc

				    service/raft/raft_rpc.cc

				    service/raft/raft_sys_table_storage.cc

				    service/raft/group0_state_machine.cc

				    service/storage_proxy.cc

				    service/storage_service.cc

				    sstables/compress.cc

				    sstables/integrity_checked_file_impl.cc

				    sstables/kl/reader.cc

				    sstables/metadata_collector.cc

				    sstables/m_format_read_helpers.cc

				    sstables/mx/reader.cc

				    sstables/mx/writer.cc

				    sstables/prepended_input_stream.cc

				    sstables/random_access_reader.cc

				    sstables/sstable_directory.cc

				    sstables/sstable_mutation_reader.cc

				    sstables/sstables.cc

				    sstables/sstable_set.cc

				    sstables/sstables_manager.cc

				    sstables/sstable_version.cc

				    sstables/writer.cc

				    streaming/consumer.cc

				    streaming/progress_info.cc

				    streaming/session_info.cc

				    streaming/stream_coordinator.cc

				    streaming/stream_manager.cc

				    streaming/stream_plan.cc

				    streaming/stream_reason.cc

				    streaming/stream_receive_task.cc

				    streaming/stream_request.cc

				    streaming/stream_result_future.cc

				    streaming/stream_session.cc

				    streaming/stream_session_state.cc

				    streaming/stream_summary.cc

				    streaming/stream_task.cc

				    streaming/stream_transfer_task.cc

				    sstables_loader.cc

				    table_helper.cc

				    thrift/controller.cc

				    thrift/handler.cc

				    thrift/server.cc

				    thrift/thrift_validation.cc

				    tasks/task_manager.cc

				    timeout_config.cc

				    tools/scylla-sstable-index.cc

				    tools/scylla-types.cc

				    tracing/traced_file.cc

				    tracing/trace_keyspace_helper.cc

				    tracing/trace_state.cc

				    tracing/tracing_backend_registry.cc

				    tracing/tracing.cc

				    transport/controller.cc

				    transport/cql_protocol_extension.cc

				    transport/event.cc

				    transport/event_notifier.cc

				    transport/messages/result_message.cc

				    transport/server.cc

				    types.cc

				    unimplemented.cc

				    utils/arch/powerpc/crc32-vpmsum/crc32_wrapper.cc

				    utils/array-search.cc

				    utils/ascii.cc

				    utils/base64.cc

				    utils/big_decimal.cc

				    utils/bloom_calculations.cc

				    utils/bloom_filter.cc

				    utils/buffer_input_stream.cc

				    utils/build_id.cc

				    utils/config_file.cc

				    utils/directories.cc

				    utils/disk-error-handler.cc

				    utils/dynamic_bitset.cc

				    utils/error_injection.cc

				    utils/exceptions.cc

				    utils/file_lock.cc

				    utils/generation-number.cc

				    utils/gz/crc_combine.cc

				    utils/gz/gen_crc_combine_table.cc

				    utils/human_readable.cc

				    utils/i_filter.cc

				    utils/large_bitset.cc

				    utils/like_matcher.cc

				    utils/limiting_data_source.cc

				    utils/logalloc.cc

				    utils/managed_bytes.cc

				    utils/multiprecision_int.cc

				    utils/murmur_hash.cc

				    utils/rate_limiter.cc

				    utils/rjson.cc

				    utils/runtime.cc

				    utils/updateable_value.cc

				    utils/utf8.cc

				    utils/uuid.cc

				    utils/UUID_gen.cc

				    validation.cc

				    vint-serialization.cc

				    zstd.cc)

				set(scylla_gen_sources

				    "${scylla_thrift_gen_cassandra_files}"

				    "${scylla_ragel_gen_protocol_parser_file}"

				    "${swagger_gen_files}"

				    "${idl_gen_files}"

				    "${antlr3_gen_files}")

				target_link_libraries(scylla-main

				  PRIVATE

				    db

				    absl::hash

				    absl::raw_hash_set

				    Seastar::seastar

				    Snappy::snappy

				    systemd

				    ZLIB::ZLIB)

				add_subdirectory(api)

				add_subdirectory(alternator)

				add_subdirectory(db)

				add_subdirectory(auth)

				add_subdirectory(cdc)

				add_subdirectory(compaction)

				add_subdirectory(cql3)

				add_subdirectory(data_dictionary)

				add_subdirectory(dht)

				add_subdirectory(gms)

				add_subdirectory(idl)

				add_subdirectory(index)

				add_subdirectory(interface)

				add_subdirectory(lang)

				add_subdirectory(locator)

				add_subdirectory(mutation)

				add_subdirectory(mutation_writer)

				add_subdirectory(readers)

				add_subdirectory(redis)

				add_subdirectory(replica)

				add_subdirectory(raft)

				add_subdirectory(repair)

				add_subdirectory(rust)

				add_subdirectory(schema)

				add_subdirectory(service)

				add_subdirectory(sstables)

				add_subdirectory(streaming)

				add_subdirectory(test)

				add_subdirectory(thrift)

				add_subdirectory(tools)

				add_subdirectory(tracing)

				add_subdirectory(transport)

				add_subdirectory(types)

				add_subdirectory(utils)

				include(add_version_library)

				add_version_library(scylla_version

				    release.cc)

				add_executable(scylla

				    ${scylla_sources}

				    ${scylla_gen_sources})

				  main.cc)

				target_link_libraries(scylla PRIVATE

				    scylla-main

				    api

				    auth

				    alternator

				    db

				    cdc

				    compaction

				    cql3

				    data_dictionary

				    dht

				    gms

				    idl

				    index

				    lang

				    locator

				    mutation

				    mutation_writer

				    raft

				    readers

				    redis

				    repair

				    replica

				    schema

				    scylla_version

				    service

				    sstables

				    streaming

				    test-perf

				    thrift

				    tools

				    tracing

				    transport

				    types

				    utils)

				target_link_libraries(Boost::regex

				  INTERFACE

				    ICU::i18n

				    ICU::uc)

				target_link_libraries(scylla PRIVATE

				    seastar

				    # Boost dependencies

				    Boost::filesystem

				    Boost::program_options

				    Boost::system

				    Boost::thread

				    Boost::regex

				    Boost::headers

				    # Abseil libs

				    absl::hashtablez_sampler

				    absl::raw_hash_set

				    absl::synchronization

				    absl::graphcycles_internal

				    absl::stacktrace

				    absl::symbolize

				    absl::debugging_internal

				    absl::demangle_internal

				    absl::time

				    absl::time_zone

				    absl::int128

				    absl::city

				    absl::hash

				    absl::malloc_internal

				    absl::spinlock_wait

				    absl::base

				    absl::dynamic_annotations

				    absl::raw_logging_internal

				    absl::exponential_biased

				    absl::throw_delegate

				    # System libs

				    ZLIB::ZLIB

				    ICU::uc

				    systemd

				    zstd

				    snappy

				    ${LUA_LIBRARIES}

				    thrift

				    crypt)

				    Boost::program_options)

				# Force SHA1 build-id generation

				set(default_linker_flags "-Wl,--build-id=sha1")

				include(CheckLinkerFlag)

				foreach(linker "lld" "gold")

				    set(linker_flag "-fuse-ld=${linker}")

				    check_linker_flag(CXX ${linker_flag} "CXX_LINKER_HAVE_${linker}")

				    if(CXX_LINKER_HAVE_${linker})

				        string(APPEND default_linker_flags " ${linker_flag}")

				        break()

				    endif()

				endforeach()

				set(CMAKE_EXE_LINKER_FLAGS "${default_linker_flags}" CACHE INTERNAL "")

				target_link_libraries(scylla PRIVATE

				    -Wl,--build-id=sha1 # Force SHA1 build-id generation

				    # TODO: Use lld linker if it's available, otherwise gold, else bfd

				    -fuse-ld=lld)

				# TODO: patch dynamic linker to match configure.py behavior

				target_compile_options(scylla PRIVATE

				    -std=gnu++20

				    ${cxx_coro_flag}

				    ${target_arch_flag})

				# Hacks needed to expose internal APIs for xxhash dependencies

				target_compile_definitions(scylla PRIVATE XXH_PRIVATE_API HAVE_LZ4_COMPRESS_DEFAULT)

				target_include_directories(scylla PRIVATE

				    "${CMAKE_CURRENT_SOURCE_DIR}"

				    libdeflate

				    abseil

				    "${scylla_gen_build_dir}")

				###

				### Create crc_combine_table helper executable.

				### Use it to generate crc_combine_table.cc to be used in scylla at build time.

				###

				add_executable(crc_combine_table utils/gz/gen_crc_combine_table.cc)

				target_link_libraries(crc_combine_table PRIVATE seastar)

				target_include_directories(crc_combine_table PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}")

				target_compile_options(crc_combine_table PRIVATE

				    -std=gnu++20

				    ${cxx_coro_flag}

				    ${target_arch_flag})

				add_dependencies(scylla crc_combine_table)

				# Generate an additional source file at build time that is needed for Scylla compilation

				add_custom_command(OUTPUT "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"

				    COMMAND $<TARGET_FILE:crc_combine_table> > "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"

				    DEPENDS crc_combine_table)

				target_sources(scylla PRIVATE "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc")

				###

				### Generate version file and supply appropriate compile definitions for release.cc

				###

				execute_process(COMMAND ${CMAKE_SOURCE_DIR}/SCYLLA-VERSION-GEN --output-dir "${CMAKE_BINARY_DIR}/gen" RESULT_VARIABLE scylla_version_gen_res)

				if(scylla_version_gen_res)

				    message(SEND_ERROR "Version file generation failed. Return code: ${scylla_version_gen_res}")

				endif()

				file(READ "${CMAKE_BINARY_DIR}/gen/SCYLLA-VERSION-FILE" scylla_version)

				string(STRIP "${scylla_version}" scylla_version)

				file(READ "${CMAKE_BINARY_DIR}/gen/SCYLLA-RELEASE-FILE" scylla_release)

				string(STRIP "${scylla_release}" scylla_release)

				get_property(release_cdefs SOURCE "${CMAKE_SOURCE_DIR}/release.cc" PROPERTY COMPILE_DEFINITIONS)

				list(APPEND release_cdefs "SCYLLA_VERSION=\"${scylla_version}\"" "SCYLLA_RELEASE=\"${scylla_release}\"")

				set_source_files_properties("${CMAKE_SOURCE_DIR}/release.cc" PROPERTIES COMPILE_DEFINITIONS "${release_cdefs}")

				###

				### Custom command for building libdeflate. Link the library to scylla.

				###

				set(libdeflate_lib "${scylla_build_dir}/libdeflate/libdeflate.a")

				add_custom_command(OUTPUT "${libdeflate_lib}"

				    COMMAND make -C "${CMAKE_SOURCE_DIR}/libdeflate"

				        BUILD_DIR=../build/${BUILD_TYPE}/libdeflate/

				        CC=${CMAKE_C_COMPILER}

				        "CFLAGS=${target_arch_flag}"

				        ../build/${BUILD_TYPE}/libdeflate//libdeflate.a) # Two backslashes are important!

				# Hack to force generating custom command to produce libdeflate.a

				add_custom_target(libdeflate DEPENDS "${libdeflate_lib}")

				target_link_libraries(scylla PRIVATE "${libdeflate_lib}")

				# TODO: create cmake/ directory and move utilities (generate functions etc) there

				# TODO: Build tests if BUILD_TESTING=on (using CTest module)

									
										2

CONTRIBUTING.md
									
												
				@@ -2,7 +2,7 @@

				## Asking questions or requesting help

				Use the [Scylla Users mailing list](https://groups.google.com/g/scylladb-users) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.

				Use the [ScyllaDB Community Forum](https://forum.scylladb.com) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.

				Join the [Scylla Developers mailing list](https://groups.google.com/g/scylladb-dev) for deeper technical discussions and to discuss your ideas for contributions.

									
										2

HACKING.md
									
												
				@@ -195,7 +195,7 @@ $ # Edit configuration options as appropriate

				$ SCYLLA_HOME=$HOME/scylla build/release/scylla

				```

				The `scylla.yaml` file in the repository by default writes all database data to `/var/lib/scylla`, which likely requires root access. Change the `data_file_directories` and `commitlog_directory` fields as appropriate.

				The `scylla.yaml` file in the repository by default writes all database data to `/var/lib/scylla`, which likely requires root access. Change the `data_file_directories`, `commitlog_directory` and `schema_commitlog_directory` fields as appropriate.

				Scylla has a number of requirements for the file-system and operating system to operate ideally and at peak performance. However, during development, these requirements can be relaxed with the `--developer-mode` flag.

									
										12

README.md
									
												
				@@ -30,9 +30,9 @@ requirements - you just need to meet the frozen toolchain's prerequisites

				Building Scylla with the frozen toolchain `dbuild` is as easy as:

				```bash

				$ git submodule update --init --force --recursive

				$ ./tools/toolchain/dbuild ./configure.py

				$ ./tools/toolchain/dbuild ninja build/release/scylla

				$ git submodule update --init --force --recursive

				$ ./tools/toolchain/dbuild ./configure.py

				$ ./tools/toolchain/dbuild ninja build/release/scylla

				```

				For further information, please see:

				@@ -60,7 +60,7 @@ Please note that you need to run Scylla with `dbuild` if you built it with the f

				For more run options, run:

				```bash

				$ ./tools/toolchain/dbuild ./build/release/scylla --help

				$ ./tools/toolchain/dbuild ./build/release/scylla --help

				```

				## Testing

				@@ -100,10 +100,10 @@ If you are a developer working on Scylla, please read the [developer guidelines]

				## Contact

				* The [users mailing list] and [Slack channel] are for users to discuss configuration, management, and operations of the ScyllaDB open source.

				* The [community forum] and [Slack channel] are for users to discuss configuration, management, and operations of the ScyllaDB open source.

				* The [developers mailing list] is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

				[Users mailing list]: https://groups.google.com/forum/#!forum/scylladb-users

				[Community forum]: https://forum.scylladb.com/

				[Slack channel]: http://slack.scylladb.com/

39

SCYLLA-VERSION-GEN


@@ -1,11 +1,12 @@
 #!/bin/sh
 USAGE=$(cat <<-END
 Usage: $(basename "$0") [-h|--help] [-o|--output-dir PATH] -- generate Scylla version and build information files.
 Usage: $(basename "$0") [-h|--help] [-o|--output-dir PATH] [--date-stamp DATE] -- generate Scylla version and build information files.
 Options:
   -h|--help show this help message.
   -o|--output-dir PATH specify destination path at which the version files are to be created.
   -d|--date-stamp DATE manually set date for release parameter
 By default, the script will attempt to parse 'version' file
 in the current directory, which should contain a string of
@@ -31,7 +32,9 @@ using '-o PATH' option.
 END
 )
 while [[ $# -gt 0 ]]; do
 DATE=""
 while [ $# -gt 0 ]; do
 	opt="$1"
 	case $opt in
 		-h|--help)
@@ -43,6 +46,11 @@ while [[ $# -gt 0 ]]; do
 			shift
 			shift
 			;;
 		--date-stamp)
 			DATE="$2"
 			shift
 			shift
 			;;
 		*)
 			echo "Unexpected argument found: $1"
 			echo
@@ -58,24 +66,33 @@ if [ -z "$OUTPUT_DIR" ]; then
 	OUTPUT_DIR="$SCRIPT_DIR/build"
 fi
 if [ -z "$DATE" ]; then
   DATE=$(date --utc +%Y%m%d)
 fi
 # Default scylla product/version tags
 PRODUCT=scylla
 VERSION=5.1.0-dev
 VERSION=5.3.0-dev
 if test -f version
 then
 	SCYLLA_VERSION=$(cat version | awk -F'-' '{print $1}')
 	SCYLLA_RELEASE=$(cat version | awk -F'-' '{print $2}')
 else
 	DATE=$(date --utc +%Y%m%d)
 	GIT_COMMIT=$(git -C "$SCRIPT_DIR" log --pretty=format:'%h' -n 1 --abbrev=12)
 	SCYLLA_VERSION=$VERSION
 	# For custom package builds, replace "0" with "counter.your_name",
 	# where counter starts at 1 and increments for successive versions.
 	# This ensures that the package manager will select your custom
 	# package over the standard release.
 	SCYLLA_BUILD=0
 	SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
 	if [ -z "$SCYLLA_RELEASE" ]; then
 		DATE=$(date --utc +%Y%m%d)
 		GIT_COMMIT=$(git -C "$SCRIPT_DIR" log --pretty=format:'%h' -n 1 --abbrev=12)
 		# For custom package builds, replace "0" with "counter.your_name",
 		# where counter starts at 1 and increments for successive versions.
 		# This ensures that the package manager will select your custom
 		# package over the standard release.
 		SCYLLA_BUILD=0
 		SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
 	elif [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then
 		echo "setting SCYLLA_RELEASE only makes sense in clean builds" 1>&2
 		exit 1
 	fi
 fi
 if [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then
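
Reviewer note: the release-string handling in the hunk above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script itself. It assumes the release is composed as `BUILD.DATE.GIT_COMMIT` and that an empty date falls back to the current UTC date, as the diff suggests. The helper names `make_release` and `split_version` and the sample commit hash `0123456789ab` are hypothetical.

```shell
#!/bin/sh
# Sketch only: mirrors the BUILD.DATE.COMMIT composition shown in the diff.
# make_release and split_version are hypothetical helpers; they are not
# part of SCYLLA-VERSION-GEN.

make_release() {
    build="$1"    # "0" by default, or "counter.your_name" for custom packages
    stamp="$2"    # date stamp; may be empty
    commit="$3"   # abbreviated commit hash
    if [ -z "$stamp" ]; then
        # Fall back to today's UTC date when no stamp was supplied.
        stamp=$(date -u +%Y%m%d)
    fi
    echo "$build.$stamp.$commit"
}

split_version() {
    # Split "X.Y.Z-suffix" on '-' the same way the script's awk calls do;
    # $2 selects the field number to print.
    printf '%s\n' "$1" | awk -F'-' -v n="$2" '{print $n}'
}

make_release 0 20230424 0123456789ab
split_version 5.3.0-dev 1
split_version 5.3.0-dev 2
```

Passing a fixed stamp, as `--date-stamp` allows, makes the release string reproducible across rebuilds of the same commit. Leaving it empty ties the string to the build day.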

1

abseil

Submodule abseil deleted from 9e408e050f

									
										30

alternator/CMakeLists.txt
									
										Normal file
									
												
				@@ -0,0 +1,30 @@

				include(generate_cql_grammar)

				generate_cql_grammar(

				  GRAMMAR expressions.g

				  SOURCES cql_grammar_srcs)

				add_library(alternator STATIC)

				target_sources(alternator

				  PRIVATE

				    controller.cc

				    server.cc

				    executor.cc

				    stats.cc

				    serialization.cc

				    expressions.cc

				    conditions.cc

				    auth.cc

				    streams.cc

				    ttl.cc

				    ${cql_grammar_srcs})

				target_include_directories(alternator

				  PUBLIC

				    ${CMAKE_SOURCE_DIR}

				    ${CMAKE_BINARY_DIR}

				  PRIVATE

				    ${RAPIDJSON_INCLUDE_DIRS})

				target_link_libraries(alternator

				  cql3

				  idl

				  Seastar::seastar

				  xxHash::xxhash)

									
										100

alternator/auth.cc
									
												
				@@ -10,8 +10,6 @@

				#include "log.hh"

				#include <string>

				#include <string_view>

				#include <gnutls/crypto.h>

				#include "hashers.hh"

				#include "bytes.hh"

				#include "alternator/auth.hh"

				#include <fmt/format.h>

				@@ -29,99 +27,6 @@ namespace alternator {

				static logging::logger alogger("alternator-auth");

				static hmac_sha256_digest hmac_sha256(std::string_view key, std::string_view msg) {

				    hmac_sha256_digest digest;

				    int ret = gnutls_hmac_fast(GNUTLS_MAC_SHA256, key.data(), key.size(), msg.data(), msg.size(), digest.data());

				    if (ret) {

				        throw std::runtime_error(fmt::format("Computing HMAC failed ({}): {}", ret, gnutls_strerror(ret)));

				    }

				    return digest;

				}

				static hmac_sha256_digest get_signature_key(std::string_view key, std::string_view date_stamp, std::string_view region_name, std::string_view service_name) {

				    auto date = hmac_sha256("AWS4" + std::string(key), date_stamp);

				    auto region = hmac_sha256(std::string_view(date.data(), date.size()), region_name);

				    auto service = hmac_sha256(std::string_view(region.data(), region.size()), service_name);

				    auto signing = hmac_sha256(std::string_view(service.data(), service.size()), "aws4_request");

				    return signing;

				}

				static std::string apply_sha256(std::string_view msg) {

				    sha256_hasher hasher;

				    hasher.update(msg.data(), msg.size());

				    return to_hex(hasher.finalize());

				}

				static std::string apply_sha256(const std::vector<temporary_buffer<char>>& msg) {

				    sha256_hasher hasher;

				    for (const temporary_buffer<char>& buf : msg) {

				        hasher.update(buf.get(), buf.size());

				    }

				    return to_hex(hasher.finalize());

				}

				static std::string format_time_point(db_clock::time_point tp) {

				    time_t time_point_repr = db_clock::to_time_t(tp);

				    std::string time_point_str;

				    time_point_str.resize(17);

				    ::tm time_buf;

				    // strftime prints the terminating null character as well

				    std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", ::gmtime_r(&time_point_repr, &time_buf));

				    time_point_str.resize(16);

				    return time_point_str;

				}

				void check_expiry(std::string_view signature_date) {

				    //FIXME: The default 15min can be changed with X-Amz-Expires header - we should honor it

				    std::string expiration_str = format_time_point(db_clock::now() - 15min);

				    std::string validity_str = format_time_point(db_clock::now() + 15min);

				    if (signature_date < expiration_str) {

				        throw api_error::invalid_signature(

				                fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",

				                signature_date, expiration_str));

				    }

				    if (signature_date > validity_str) {

				        throw api_error::invalid_signature(

				                fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",

				                signature_date, validity_str));

				    }

				}

				std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,

				        std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,

				        const std::vector<temporary_buffer<char>>& body_content, std::string_view region, std::string_view service, std::string_view query_string) {

				    auto amz_date_it = signed_headers_map.find("x-amz-date");

				    if (amz_date_it == signed_headers_map.end()) {

				        throw api_error::invalid_signature("X-Amz-Date header is mandatory for signature verification");

				    }

				    std::string_view amz_date = amz_date_it->second;

				    check_expiry(amz_date);

				    std::string_view datestamp = amz_date.substr(0, 8);

				    if (datestamp != orig_datestamp) {

				        throw api_error::invalid_signature(

				                format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",

				                        orig_datestamp, datestamp));

				    }

				    std::string_view canonical_uri = "/";

				    std::stringstream canonical_headers;

				    for (const auto& header : signed_headers_map) {

				        canonical_headers << fmt::format("{}:{}", header.first, header.second) << '\n';

				    }

				    std::string payload_hash = apply_sha256(body_content);

				    std::string canonical_request = fmt::format("{}\n{}\n{}\n{}\n{}\n{}", method, canonical_uri, query_string, canonical_headers.str(), signed_headers_str, payload_hash);

				    std::string_view algorithm = "AWS4-HMAC-SHA256";

				    std::string credential_scope = fmt::format("{}/{}/{}/aws4_request", datestamp, region, service);

				    std::string string_to_sign = fmt::format("{}\n{}\n{}\n{}", algorithm, amz_date, credential_scope,  apply_sha256(canonical_request));

				    hmac_sha256_digest signing_key = get_signature_key(secret_access_key, datestamp, region, service);

				    hmac_sha256_digest signature = hmac_sha256(std::string_view(signing_key.data(), signing_key.size()), string_to_sign);

				    return to_hex(bytes_view(reinterpret_cast<const int8_t*>(signature.data()), signature.size()));

				}

				future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username) {

				    schema_ptr schema = proxy.data_dictionary().find_schema("system_auth", "roles");

				    partition_key pk = partition_key::from_single_value(*schema, utf8_type->decompose(username));

				@@ -133,14 +38,15 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::strin

				    }

				    auto selection = cql3::selection::selection::for_columns(schema, {salted_hash_col});

				    auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id}, selection->get_query_options());

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, proxy.get_max_result_size(partition_slice));

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice,

				            proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));

				    auto cl = auth::password_authenticator::consistency_for_user(username);

				    service::client_state client_state{service::client_state::internal_tag()};

				    service::storage_proxy::coordinator_query_result qr = co_await proxy.query(schema, std::move(command), std::move(partition_ranges), cl,

				            service::storage_proxy::coordinator_query_options(executor::default_timeout(), empty_service_permit(), client_state));

				    cql3::selection::result_set_builder builder(*selection, gc_clock::now(), cql_serialization_format::latest());

				    cql3::selection::result_set_builder builder(*selection, gc_clock::now());

				    query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *schema, *selection));

				    auto result_set = builder.build();

									
alternator/auth.hh (6 lines changed)
												
				@@ -20,14 +20,8 @@ class storage_proxy;

				namespace alternator {

				using hmac_sha256_digest = std::array<char, 32>;

				using key_cache = utils::loading_cache<std::string, std::string, 1>;

				std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,

				        std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,

				        const std::vector<temporary_buffer<char>>& body_content, std::string_view region, std::string_view service, std::string_view query_string);

				future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username);

				}

									
alternator/conditions.cc (41 lines changed)
												
				@@ -232,7 +232,14 @@ bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2,

				    if (it2->name == "S") {

				        return rjson::to_string_view(it1->value).starts_with(rjson::to_string_view(it2->value));

				    } else /* it2->name == "B" */ {

				        return base64_begins_with(rjson::to_string_view(it1->value), rjson::to_string_view(it2->value));

				        try {

				            return base64_begins_with(rjson::to_string_view(it1->value), rjson::to_string_view(it2->value));

				        } catch(std::invalid_argument&) {

				            // determine if any of the malformed values is from query and raise an exception if so

				            unwrap_bytes(it1->value, v1_from_query);

				            unwrap_bytes(it2->value, v2_from_query);

				            return false;

				        }

				    }

				}

				@@ -241,7 +248,7 @@ static bool is_set_of(const rjson::value& type1, const rjson::value& type2) {

				}

				// Check if two JSON-encoded values match with the CONTAINS relation

				bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {

				bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query) {

				    if (!v1) {

				        return false;

				    }

				@@ -250,7 +257,12 @@ bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {

				    if (kv1.name == "S" && kv2.name == "S") {

				        return rjson::to_string_view(kv1.value).find(rjson::to_string_view(kv2.value)) != std::string_view::npos;

				    } else if (kv1.name == "B" && kv2.name == "B") {

				        return rjson::base64_decode(kv1.value).find(rjson::base64_decode(kv2.value)) != bytes::npos;

				        auto d_kv1 = unwrap_bytes(kv1.value, v1_from_query);

				        auto d_kv2 = unwrap_bytes(kv2.value, v2_from_query);

				        if (!d_kv1 || !d_kv2) {

				            return false;

				        }

				        return d_kv1->find(*d_kv2) != bytes::npos;

				    } else if (is_set_of(kv1.name, kv2.name)) {

				        for (auto i = kv1.value.Begin(); i != kv1.value.End(); ++i) {

				            if (*i == kv2.value) {

				@@ -273,11 +285,11 @@ bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2) {

				}

				// Check if two JSON-encoded values match with the NOT_CONTAINS relation

				static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2) {

				static bool check_NOT_CONTAINS(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query) {

				    if (!v1) {

				        return false;

				    }

				    return !check_CONTAINS(v1, v2);

				    return !check_CONTAINS(v1, v2, v1_from_query, v2_from_query);

				}

				// Check if a JSON-encoded value equals any element of an array, which must have at least one element.

				@@ -374,7 +386,12 @@ bool check_compare(const rjson::value* v1, const rjson::value& v2, const Compara

				                   std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));

				    }

				    if (kv1.name == "B") {

				        return cmp(rjson::base64_decode(kv1.value), rjson::base64_decode(kv2.value));

				        auto d_kv1 = unwrap_bytes(kv1.value, v1_from_query);

				        auto d_kv2 = unwrap_bytes(kv2.value, v2_from_query);

				        if(!d_kv1 || !d_kv2) {

				            return false;

				        }

				        return cmp(*d_kv1, *d_kv2);

				    }

				    // cannot reach here, as check_comparable_type() verifies the type is one

				    // of the above options.

				@@ -464,7 +481,13 @@ static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const r

				                             bounds_from_query);

				    }

				    if (kv_v.name == "B") {

				        return check_BETWEEN(rjson::base64_decode(kv_v.value), rjson::base64_decode(kv_lb.value), rjson::base64_decode(kv_ub.value), bounds_from_query);

				        auto d_kv_v = unwrap_bytes(kv_v.value, v_from_query);

				        auto d_kv_lb = unwrap_bytes(kv_lb.value, lb_from_query);

				        auto d_kv_ub = unwrap_bytes(kv_ub.value, ub_from_query);

				        if(!d_kv_v || !d_kv_lb || !d_kv_ub) {

				            return false;

				        }

				        return check_BETWEEN(*d_kv_v, *d_kv_lb, *d_kv_ub, bounds_from_query);

				    }

				    if (v_from_query) {

				        throw api_error::validation(

				@@ -557,7 +580,7 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu

				                            format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "

				                                    "got {} instead", argtype));

				                }

				                return check_CONTAINS(got, arg);

				                return check_CONTAINS(got, arg, false, true);

				            }

				        case comparison_operator_type::NOT_CONTAINS:

				            {

				@@ -571,7 +594,7 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu

				                            format("CONTAINS operator requires a single AttributeValue of type String, Number, or Binary, "

				                                    "got {} instead", argtype));

				                }

				                return check_NOT_CONTAINS(got, arg);

				                return check_NOT_CONTAINS(got, arg, false, true);

				            }

				        }

				        throw std::logic_error(format("Internal error: corrupted operator enum: {}", int(op)));
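The conditions.cc hunks above replace direct `rjson::base64_decode` calls with `unwrap_bytes(value, from_query)` and treat an empty result as "condition not met". The sketch below illustrates that pattern in isolation, assuming the behaviour suggested by the comment in `check_BEGINS_WITH`: a malformed operand that came from the request raises an error, while a malformed stored value only makes the comparison fail. All names ending in `_sketch`, the `try_base64_decode` helper and the use of `std::invalid_argument` are stand-ins, not the Alternator implementation.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical strict base64 decoder: returns nullopt on malformed input.
std::optional<std::string> try_base64_decode(std::string_view in) {
    static constexpr std::string_view alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    if (in.size() % 4 != 0) {
        return std::nullopt;
    }
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 4) {
        unsigned vals[4] = {0, 0, 0, 0};
        int pad = 0;
        for (int j = 0; j < 4; ++j) {
            char c = in[i + j];
            if (c == '=') {
                // Padding may only occupy the last two slots of the last group.
                if (i + 4 != in.size() || j < 2) {
                    return std::nullopt;
                }
                ++pad;
            } else {
                auto pos = alphabet.find(c);
                // Reject unknown characters and data following padding.
                if (pos == std::string_view::npos || pad > 0) {
                    return std::nullopt;
                }
                vals[j] = static_cast<unsigned>(pos);
            }
        }
        unsigned triple = (vals[0] << 18) | (vals[1] << 12) | (vals[2] << 6) | vals[3];
        out.push_back(static_cast<char>((triple >> 16) & 0xff));
        if (pad < 2) out.push_back(static_cast<char>((triple >> 8) & 0xff));
        if (pad < 1) out.push_back(static_cast<char>(triple & 0xff));
    }
    return out;
}

// Stand-in for unwrap_bytes(): throw only when the bad value came from the
// request, otherwise report "no value" to the caller.
std::optional<std::string> unwrap_bytes_sketch(std::string_view b64, bool from_query) {
    auto decoded = try_base64_decode(b64);
    if (!decoded && from_query) {
        throw std::invalid_argument("malformed base64 value in request");
    }
    return decoded;
}

// Stand-in for the "B" branch of check_CONTAINS().
bool contains_bytes_sketch(std::string_view v1, std::string_view v2,
                           bool v1_from_query, bool v2_from_query) {
    auto d1 = unwrap_bytes_sketch(v1, v1_from_query);
    auto d2 = unwrap_bytes_sketch(v2, v2_from_query);
    if (!d1 || !d2) {
        return false;
    }
    return d1->find(*d2) != std::string::npos;
}
```

With `v1_from_query = false` and `v2_from_query = true`, as in the `verify_expected_one` call sites above, a corrupt stored item yields `false` while a corrupt request operand is reported to the client.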

									
alternator/conditions.hh (2 lines changed)
												
				@@ -38,7 +38,7 @@ conditional_operator_type get_conditional_operator(const rjson::value& req);

				bool verify_expected(const rjson::value& req, const rjson::value* previous_item);

				bool verify_condition(const rjson::value& condition, bool require_all, const rjson::value* previous_item);

				bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2);

				bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query);

				bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query);

				bool verify_condition_expression(

									
alternator/controller.cc (8 lines changed)
												
				@@ -14,6 +14,8 @@

				#include "db/config.hh"

				#include "cdc/generation_service.hh"

				#include "service/memory_limiter.hh"

				#include "auth/service.hh"

				#include "service/qos/service_level_controller.hh"

				using namespace seastar;

				@@ -28,6 +30,8 @@ controller::controller(

				        sharded<db::system_distributed_keyspace>& sys_dist_ks,

				        sharded<cdc::generation_service>& cdc_gen_svc,

				        sharded<service::memory_limiter>& memory_limiter,

				        sharded<auth::service>& auth_service,

				        sharded<qos::service_level_controller>& sl_controller,

				        const db::config& config)

				    : _gossiper(gossiper)

				    , _proxy(proxy)

				@@ -35,6 +39,8 @@ controller::controller(

				    , _sys_dist_ks(sys_dist_ks)

				    , _cdc_gen_svc(cdc_gen_svc)

				    , _memory_limiter(memory_limiter)

				    , _auth_service(auth_service)

				    , _sl_controller(sl_controller)

				    , _config(config)

				{

				}

				@@ -77,7 +83,7 @@ future<> controller::start_server() {

				        auto get_cdc_metadata = [] (cdc::generation_service& svc) { return std::ref(svc.get_cdc_metadata()); };

				        _executor.start(std::ref(_gossiper), std::ref(_proxy), std::ref(_mm), std::ref(_sys_dist_ks), sharded_parameter(get_cdc_metadata, std::ref(_cdc_gen_svc)), _ssg.value()).get();

				        _server.start(std::ref(_executor), std::ref(_proxy), std::ref(_gossiper)).get();

				        _server.start(std::ref(_executor), std::ref(_proxy), std::ref(_gossiper), std::ref(_auth_service), std::ref(_sl_controller)).get();

				        // Note: from this point on, if start_server() throws for any reason,

				        // it must first call stop_server() to stop the executor and server

				        // services we just started - or Scylla will cause an assertion

									
alternator/controller.hh (12 lines changed)
												
				@@ -34,6 +34,14 @@ class gossiper;

				}

				namespace auth {

				class service;

				}

				namespace qos {

				class service_level_controller;

				}

				namespace alternator {

				// This is the official DynamoDB API version.

				@@ -53,6 +61,8 @@ class controller : public protocol_server {

				    sharded<db::system_distributed_keyspace>& _sys_dist_ks;

				    sharded<cdc::generation_service>& _cdc_gen_svc;

				    sharded<service::memory_limiter>& _memory_limiter;

				    sharded<auth::service>& _auth_service;

				    sharded<qos::service_level_controller>& _sl_controller;

				    const db::config& _config;

				    std::vector<socket_address> _listen_addresses;

				@@ -68,6 +78,8 @@ public:

				        sharded<db::system_distributed_keyspace>& sys_dist_ks,

				        sharded<cdc::generation_service>& cdc_gen_svc,

				        sharded<service::memory_limiter>& memory_limiter,

				        sharded<auth::service>& auth_service,

				        sharded<qos::service_level_controller>& sl_controller,

				        const db::config& config);

				    virtual sstring name() const override;

									
alternator/error.hh (4 lines changed)
												
				@@ -23,7 +23,7 @@ namespace alternator {

				// api_error into a JSON object, and that is returned to the user.

				class api_error final : public std::exception {

				public:

				    using status_type = httpd::reply::status_type;

				    using status_type = http::reply::status_type;

				    status_type _http_code;

				    std::string _type;

				    std::string _msg;

				@@ -77,7 +77,7 @@ public:

				        return api_error("TableNotFoundException", std::move(msg));

				    }

				    static api_error internal(std::string msg) {

				        return api_error("InternalServerError", std::move(msg), reply::status_type::internal_server_error);

				        return api_error("InternalServerError", std::move(msg), http::reply::status_type::internal_server_error);

				    }

				    // Provide the "std::exception" interface, to make it easier to print this

									
alternator/executor.cc (138 lines changed)
												
				@@ -13,12 +13,12 @@

				#include <seastar/core/sleep.hh>

				#include "alternator/executor.hh"

				#include "log.hh"

				#include "schema_builder.hh"

				#include "schema/schema_builder.hh"

				#include "data_dictionary/keyspace_metadata.hh"

				#include "exceptions/exceptions.hh"

				#include "timestamp.hh"

				#include "types/map.hh"

				#include "schema.hh"

				#include "schema/schema.hh"

				#include "query-request.hh"

				#include "query-result-reader.hh"

				#include "cql3/selection/selection.hh"

				@@ -34,13 +34,14 @@

				#include "expressions.hh"

				#include "conditions.hh"

				#include "cql3/constants.hh"

				#include "cql3/util.hh"

				#include <optional>

				#include "utils/overloaded_functor.hh"

				#include <seastar/json/json_elements.hh>

				#include <boost/algorithm/cxx11/any_of.hpp>

				#include "collection_mutation.hh"

				#include "db/query_context.hh"

				#include "schema.hh"

				#include "schema/schema.hh"

				#include "db/tags/extension.hh"

				#include "db/tags/utils.hh"

				#include "alternator/rmw_operation.hh"

				@@ -50,11 +51,13 @@

				#include <unordered_set>

				#include "service/storage_proxy.hh"

				#include "gms/gossiper.hh"

				#include "schema_registry.hh"

				#include "schema/schema_registry.hh"

				#include "utils/error_injection.hh"

				#include "db/schema_tables.hh"

				#include "utils/rjson.hh"

				using namespace std::chrono_literals;

				logging::logger elogger("alternator-executor");

				namespace alternator {

				@@ -114,8 +117,7 @@ std::string json_string::to_json() const {

				void executor::supplement_table_info(rjson::value& descr, const schema& schema, service::storage_proxy& sp) {

				    rjson::add(descr, "CreationDateTime", rjson::value(std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch()).count()));

				    rjson::add(descr, "TableStatus", "ACTIVE");

				    auto schema_id_str = schema.id().to_sstring();

				    rjson::add(descr, "TableId", rjson::from_string(schema_id_str));

				    rjson::add(descr, "TableId", rjson::from_string(schema.id().to_sstring()));

				    executor::supplement_table_stream_info(descr, schema, sp);

				}

				@@ -127,6 +129,20 @@ void executor::supplement_table_info(rjson::value& descr, const schema& schema,

				// See https://github.com/scylladb/scylla/issues/4480

				static constexpr int max_table_name_length = 222;

				static bool valid_table_name_chars(std::string_view name) {

				    for (auto c : name) {

				        if ((c < 'a' || c > 'z') &&

				            (c < 'A' || c > 'Z') &&

				            (c < '0' || c > '9') &&

				            c != '_' &&

				            c != '-' &&

				            c != '.') {

				            return false;

				        }

				    }

				    return true;

				}

				// The DynamoDB developer guide, https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.NamingRules

				// specifies that table names "names must be between 3 and 255 characters long

				// and can contain only the following characters: a-z, A-Z, 0-9, _ (underscore), - (dash), . (dot)

				@@ -136,8 +152,7 @@ static void validate_table_name(const std::string& name) {

				        throw api_error::validation(

				                format("TableName must be at least 3 characters long and at most {} characters long", max_table_name_length));

				    }

				    static const std::regex valid_table_name_chars ("[a-zA-Z0-9_.-]*");

				    if (!std::regex_match(name.c_str(), valid_table_name_chars)) {

				    if (!valid_table_name_chars(name)) {

				        throw api_error::validation(

				                "TableName must satisfy regular expression pattern: [a-zA-Z0-9_.-]+");

				    }

				@@ -153,11 +168,10 @@ static void validate_table_name(const std::string& name) {

				// The view_name() function assumes the table_name has already been validated

				// but validates the legality of index_name and the combination of both.

				static std::string view_name(const std::string& table_name, std::string_view index_name, const std::string& delim = ":") {

				    static const std::regex valid_index_name_chars ("[a-zA-Z0-9_.-]*");

				    if (index_name.length() < 3) {

				        throw api_error::validation("IndexName must be at least 3 characters long");

				    }

				    if (!std::regex_match(index_name.data(), valid_index_name_chars)) {

				    if (!valid_table_name_chars(index_name)) {

				        throw api_error::validation(

				                format("IndexName '{}' must satisfy regular expression pattern: [a-zA-Z0-9_.-]+", index_name));

				    }
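The two hunks above swap `std::regex_match` for a hand-written character loop. Whatever the motivation for the change, one property of the loop is easy to demonstrate: it honours the length of a `std::string_view`, whereas `std::regex_match(index_name.data(), ...)` takes a C string and reads up to a null terminator that a view does not guarantee. The sketch below is a standalone approximation; `valid_name_chars` and `valid_table_name_sketch` are hypothetical names, and the 3 and 222 limits are taken from the surrounding diff.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Accepts only a-z, A-Z, 0-9, '_', '-' and '.', like the loop in the diff.
bool valid_name_chars(std::string_view name) {
    for (char c : name) {
        bool ok = (c >= 'a' && c <= 'z') ||
                  (c >= 'A' && c <= 'Z') ||
                  (c >= '0' && c <= '9') ||
                  c == '_' || c == '-' || c == '.';
        if (!ok) {
            return false;
        }
    }
    return true;
}

// Combines the length check and the character check; returns a bool instead
// of throwing so the sketch stays independent of api_error.
bool valid_table_name_sketch(std::string_view name, std::size_t max_len = 222) {
    return name.size() >= 3 && name.size() <= max_len && valid_name_chars(name);
}
```

For example, a view over the first three characters of "abc!def" passes the character check because only the viewed range is inspected.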

				@@ -438,6 +452,11 @@ future<executor::request_return_type> executor::describe_table(client_state& cli

				    rjson::add(table_description, "BillingModeSummary", rjson::empty_object());

				    rjson::add(table_description["BillingModeSummary"], "BillingMode", "PAY_PER_REQUEST");

				    rjson::add(table_description["BillingModeSummary"], "LastUpdateToPayPerRequestDateTime", rjson::value(creation_date_seconds));

				    // In PAY_PER_REQUEST billing mode, provisioned capacity should return 0

				    rjson::add(table_description, "ProvisionedThroughput", rjson::empty_object());

				    rjson::add(table_description["ProvisionedThroughput"], "ReadCapacityUnits", 0);

				    rjson::add(table_description["ProvisionedThroughput"], "WriteCapacityUnits", 0);

				    rjson::add(table_description["ProvisionedThroughput"], "NumberOfDecreasesToday", 0);

				    std::unordered_map<std::string,std::string> key_attribute_types;

				    // Add base table's KeySchema and collect types for AttributeDefinitions:

				@@ -460,6 +479,11 @@ future<executor::request_return_type> executor::describe_table(client_state& cli

				            rjson::add(view_entry, "IndexArn", generate_arn_for_index(*schema, index_name));

				            // Add indexes's KeySchema and collect types for AttributeDefinitions:

				            describe_key_schema(view_entry, *vptr, key_attribute_types);

				            // Add projection type

				            rjson::value projection = rjson::empty_object();

				            rjson::add(projection, "ProjectionType", "ALL");

				            // FIXME: we have to get ProjectionType from the schema when it is added

				            rjson::add(view_entry, "Projection", std::move(projection));

				            // Local secondary indexes are marked by an extra '!' sign occurring before the ':' delimiter

				            rjson::value& index_array = (delim_it > 1 && cf_name[delim_it-1] == '!') ? lsi_array : gsi_array;

				            rjson::push_back(index_array, std::move(view_entry));

				@@ -750,7 +774,6 @@ future<executor::request_return_type> executor::tag_resource(client_state& clien

				        co_return api_error::access_denied("Incorrect resource identifier");

				    }

				    schema_ptr schema = get_table_from_arn(_proxy, rjson::to_string_view(*arn));

				    std::map<sstring, sstring> tags_map = get_tags_of_table_or_throw(schema);

				    const rjson::value* tags = rjson::find(request, "Tags");

				    if (!tags || !tags->IsArray()) {

				        co_return api_error::validation("Cannot parse tags");

				@@ -758,8 +781,9 @@ future<executor::request_return_type> executor::tag_resource(client_state& clien

				    if (tags->Size() < 1) {

				        co_return api_error::validation("The number of tags must be at least 1") ;

				    }

				    update_tags_map(*tags, tags_map,  update_tags_action::add_tags);

				    co_await db::update_tags(_mm, schema, std::move(tags_map));

				    co_await db::modify_tags(_mm, schema->ks_name(), schema->cf_name(), [tags](std::map<sstring, sstring>& tags_map) {

				        update_tags_map(*tags, tags_map, update_tags_action::add_tags);

				    });

				    co_return json_string("");

				}

				@@ -777,9 +801,9 @@ future<executor::request_return_type> executor::untag_resource(client_state& cli

				    schema_ptr schema = get_table_from_arn(_proxy, rjson::to_string_view(*arn));

				    std::map<sstring, sstring> tags_map = get_tags_of_table_or_throw(schema);

				    update_tags_map(*tags, tags_map, update_tags_action::delete_tags);

				    co_await db::update_tags(_mm, schema, std::move(tags_map));

				    co_await db::modify_tags(_mm, schema->ks_name(), schema->cf_name(), [tags](std::map<sstring, sstring>& tags_map) {

				        update_tags_map(*tags, tags_map, update_tags_action::delete_tags);

				    });

				    co_return json_string("");

				}

				@@ -917,9 +941,10 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra

				            if  (!range_key.empty() && range_key != view_hash_key && range_key != view_range_key) {

				                add_column(view_builder, range_key, attribute_definitions, column_kind::clustering_key);

				            }

				            sstring where_clause = "\"" + view_hash_key + "\" IS NOT NULL";

				            sstring where_clause = format("{} IS NOT NULL", cql3::util::maybe_quote(view_hash_key));

				            if (!view_range_key.empty()) {

				                where_clause = where_clause + " AND \"" + view_hash_key + "\" IS NOT NULL";

				                where_clause = format("{} AND {} IS NOT NULL", where_clause,

				                    cql3::util::maybe_quote(view_range_key));

				            }

				            where_clauses.push_back(std::move(where_clause));

				            view_builders.emplace_back(std::move(view_builder));

				@@ -974,9 +999,10 @@ static future<executor::request_return_type> create_table_on_shard0(tracing::tra

				            // Note above we don't need to add virtual columns, as all

				            // base columns were copied to view. TODO: reconsider the need

				            // for virtual columns when we support Projection.

				            sstring where_clause = "\"" + view_hash_key + "\" IS NOT NULL";

				            sstring where_clause = format("{} IS NOT NULL", cql3::util::maybe_quote(view_hash_key));

				            if (!view_range_key.empty()) {

				                where_clause = where_clause + " AND \"" + view_range_key + "\" IS NOT NULL";

				                where_clause = format("{} AND {} IS NOT NULL", where_clause,

				                    cql3::util::maybe_quote(view_range_key));

				            }

				            where_clauses.push_back(std::move(where_clause));

				            view_builders.emplace_back(std::move(view_builder));

				@@ -1082,7 +1108,6 @@ future<executor::request_return_type> executor::update_table(client_state& clien

				    elogger.trace("Updating table {}", request);

				    static const std::vector<sstring> unsupported = {

				        "AttributeDefinitions", 

				        "GlobalSecondaryIndexUpdates", 

				        "ProvisionedThroughput",

				        "ReplicaUpdates",

				@@ -1255,6 +1280,22 @@ put_or_delete_item::put_or_delete_item(const rjson::value& key, schema_ptr schem

				    check_key(key, schema);

				}

				// find_attribute() checks whether the named attribute is stored in the

				// schema as a real column (we do this for key attribute, and for a GSI key)

				// and if so, returns that column. If not, the function returns nullptr,

				// telling the caller that the attribute is stored serialized in the

				// ATTRS_COLUMN_NAME map - not in a stand-alone column in the schema.

				static inline const column_definition* find_attribute(const schema& schema, const bytes& attribute_name) {

				    const column_definition* cdef = schema.get_column_definition(attribute_name);

				    // Although ATTRS_COLUMN_NAME exists as an actual column, when used as an

				    // attribute name it should refer to an attribute inside ATTRS_COLUMN_NAME

				    // not to ATTRS_COLUMN_NAME itself. This if() is needed for #5009.

				    if (cdef && cdef->name() == executor::ATTRS_COLUMN_NAME) {

				        return nullptr;

				    }

				    return cdef;

				}
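The comment on `find_attribute()` describes a lookup with one special case: the attributes map column exists in the schema but must never be returned when a user attribute happens to share its name. A toy version of that rule is sketched below. The `schema_sketch` and `column_sketch` types are hypothetical simplifications, and the column name ":attrs" is an assumption about the value of `executor::ATTRS_COLUMN_NAME`, which is not shown in this diff.

```cpp
#include <map>
#include <string>

// Hypothetical, heavily simplified stand-ins for schema and column_definition.
struct column_sketch {
    std::string name;
};
using schema_sketch = std::map<std::string, column_sketch>;

// Assumption: the name of the map column that stores non-key attributes.
const std::string attrs_column_name = ":attrs";

// Returns the real column for key attributes, or nullptr when the attribute
// lives serialized inside the attributes map, including the case where the
// attribute is literally named like the map column itself.
const column_sketch* find_attribute_sketch(const schema_sketch& schema,
                                           const std::string& attribute_name) {
    auto it = schema.find(attribute_name);
    if (it == schema.end() || it->second.name == attrs_column_name) {
        return nullptr;
    }
    return &it->second;
}
```

Callers such as the `put_or_delete_item` constructor in the diff then take the `nullptr` branch and serialize the value into the map rather than writing to a stand-alone column.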

				put_or_delete_item::put_or_delete_item(const rjson::value& item, schema_ptr schema, put_item)

				        : _pk(pk_from_json(item, schema)), _ck(ck_from_json(item, schema)) {

				    _cells = std::vector<cell>();

				@@ -1262,7 +1303,7 @@ put_or_delete_item::put_or_delete_item(const rjson::value& item, schema_ptr sche

				    for (auto it = item.MemberBegin(); it != item.MemberEnd(); ++it) {

				        bytes column_name = to_bytes(it->name.GetString());

				        validate_value(it->value, "PutItem");

				        const column_definition* cdef = schema->get_column_definition(column_name);

				        const column_definition* cdef = find_attribute(*schema, column_name);

				        if (!cdef) {

				            bytes value = serialize_item(it->value);

				            _cells->push_back({std::move(column_name), serialize_item(it->value)});

				@@ -1294,7 +1335,7 @@ mutation put_or_delete_item::build(schema_ptr schema, api::timestamp_type ts) co

				    auto& row = m.partition().clustered_row(*schema, _ck);

				    attribute_collector attrs_collector;

				    for (auto& c : *_cells) {

				        const column_definition* cdef = schema->get_column_definition(c.column_name);

				        const column_definition* cdef = find_attribute(*schema, c.column_name);

				        if (!cdef) {

				            attrs_collector.put(c.column_name, c.value, ts);

				        } else {

				@@ -1359,7 +1400,8 @@ static lw_shared_ptr<query::read_command> previous_item_read_command(service::st

				    auto regular_columns = boost::copy_range<query::column_id_vector>(

				            schema->regular_columns() | boost::adaptors::transformed([] (const column_definition& cdef) { return cdef.id; }));

				    auto partition_slice = query::partition_slice(std::move(bounds), {}, std::move(regular_columns), selection->get_query_options());

				    return ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, proxy.get_max_result_size(partition_slice));

				    return ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, proxy.get_max_result_size(partition_slice),

				            query::tombstone_limit(proxy.get_tombstone_limit()));

				}

				static dht::partition_range_vector to_partition_ranges(const schema& schema, const partition_key& pk) {

				@@ -1503,7 +1545,7 @@ future<executor::request_return_type> rmw_operation::execute(service::storage_pr

				            // This is the old, unsafe, read before write which does first

				            // a read, then a write. TODO: remove this mode entirely.

				            return get_previous_item(proxy, client_state, schema(), _pk, _ck, permit, stats).then(

				                    [this, &client_state, &proxy, trace_state, permit = std::move(permit)] (std::unique_ptr<rjson::value> previous_item) mutable {

				                    [this, &proxy, trace_state, permit = std::move(permit)] (std::unique_ptr<rjson::value> previous_item) mutable {

				                std::optional<mutation> m = apply(std::move(previous_item), api::new_timestamp());

				                if (!m) {

				                    return make_ready_future<executor::request_return_type>(api_error::conditional_check_failed("Failed condition."));

				@@ -2276,7 +2318,7 @@ void executor::describe_single_item(const cql3::selection::selection& selection,

				                rjson::add_with_string_name(field, type_to_string((*column_it)->type), json_key_column_value(*cell, **column_it));

				            }

				        } else if (cell) {

				            auto deserialized = attrs_type()->deserialize(*cell, cql_serialization_format::latest());

				            auto deserialized = attrs_type()->deserialize(*cell);

				            auto keys_and_values = value_cast<map_type_impl::native_type>(deserialized);

				            for (auto entry : keys_and_values) {

				                std::string attr_name = value_cast<sstring>(entry.first);

				@@ -2311,7 +2353,7 @@ std::optional<rjson::value> executor::describe_single_item(schema_ptr schema,

				        const std::optional<attrs_to_get>& attrs_to_get) {

				    rjson::value item = rjson::empty_object();

				    cql3::selection::result_set_builder builder(selection, gc_clock::now(), cql_serialization_format::latest());

				    cql3::selection::result_set_builder builder(selection, gc_clock::now());

				    query::result_view::consume(query_result, slice, cql3::selection::result_set_builder::visitor(builder, *schema, selection));

				    auto result_set = builder.build();

				@@ -2334,7 +2376,7 @@ std::vector<rjson::value> executor::describe_multi_item(schema_ptr schema,

				        const cql3::selection::selection& selection,

				        const query::result& query_result,

				        const std::optional<attrs_to_get>& attrs_to_get) {

				    cql3::selection::result_set_builder builder(selection, gc_clock::now(), cql_serialization_format::latest());

				    cql3::selection::result_set_builder builder(selection, gc_clock::now());

				    query::result_view::consume(query_result, slice, cql3::selection::result_set_builder::visitor(builder, *schema, selection));

				    auto result_set = builder.build();

				    std::vector<rjson::value> ret;

				@@ -2748,7 +2790,7 @@ update_item_operation::apply(std::unique_ptr<rjson::value> previous_item, api::t

				                }

				            }

				        }

				        const column_definition* cdef = _schema->get_column_definition(column_name);

				        const column_definition* cdef = find_attribute(*_schema, column_name);

				        if (cdef) {

				            bytes column_value = get_key_from_typed_value(json_value, *cdef);

				            row.cells().apply(*cdef, atomic_cell::make_live(*cdef->type, ts, column_value));

				@@ -2770,7 +2812,7 @@ update_item_operation::apply(std::unique_ptr<rjson::value> previous_item, api::t

				                rjson::add_with_string_name(_return_attributes, cn, rjson::copy(*col));

				            }

				        }

				        const column_definition* cdef = _schema->get_column_definition(column_name);

				        const column_definition* cdef = find_attribute(*_schema, column_name);

				        if (cdef) {

				            row.cells().apply(*cdef, atomic_cell::make_dead(ts, gc_clock::now()));

				        } else {

				@@ -3063,7 +3105,8 @@ future<executor::request_return_type> executor::get_item(client_state& client_st

				    auto selection = cql3::selection::selection::wildcard(schema);

				    auto partition_slice = query::partition_slice(std::move(bounds), {}, std::move(regular_columns), selection->get_query_options());

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice));

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice),

				            query::tombstone_limit(_proxy.get_tombstone_limit()));

				    std::unordered_set<std::string> used_attribute_names;

				    auto attrs_to_get = calculate_attrs_to_get(request, used_attribute_names);

				@@ -3077,20 +3120,10 @@ future<executor::request_return_type> executor::get_item(client_state& client_st

				    });

				}

				// is_big() checks approximately if the given JSON value is "bigger" than

				// the given big_size number of bytes. The goal is to *quickly* detect

				// oversized JSON that, for example, is too large to be serialized to a

				// contiguous string - we don't need an accurate size for that. Moreover,

				// as soon as we detect that the JSON is indeed "big", we can return true

				// and don't need to continue calculating its exact size.

				// For simplicity, we use a recursive implementation. This is fine because

				// Alternator limits the depth of JSONs it reads from inputs, and doesn't

				// add more than a couple of levels in its own output construction.

				static void check_big_object(const rjson::value& val, int& size_left);

				static void check_big_array(const rjson::value& val, int& size_left);

				static bool is_big(const rjson::value& val, int big_size = 100'000) {

				bool is_big(const rjson::value& val, int big_size) {

				    if (val.IsString()) {

				        return ssize_t(val.GetStringLength()) > big_size;

				    } else if (val.IsObject()) {

				@@ -3217,7 +3250,8 @@ future<executor::request_return_type> executor::batch_get_item(client_state& cli

				                    rs.schema->regular_columns() | boost::adaptors::transformed([] (const column_definition& cdef) { return cdef.id; }));

				            auto selection = cql3::selection::selection::wildcard(rs.schema);

				            auto partition_slice = query::partition_slice(std::move(bounds), {}, std::move(regular_columns), selection->get_query_options());

				            auto command = ::make_lw_shared<query::read_command>(rs.schema->id(), rs.schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice));

				            auto command = ::make_lw_shared<query::read_command>(rs.schema->id(), rs.schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice),

				                    query::tombstone_limit(_proxy.get_tombstone_limit()));

				            command->allow_limit = db::allow_per_partition_rate_limit::yes;

				            future<std::vector<rjson::value>> f = _proxy.query(rs.schema, std::move(command), std::move(partition_ranges), rs.cl,

				                    service::storage_proxy::coordinator_query_options(executor::default_timeout(), permit, client_state, trace_state)).then(

				@@ -3480,7 +3514,7 @@ public:

				                    rjson::add_with_string_name(field, type_to_string((*_column_it)->type), json_key_column_value(bv, **_column_it));

				                }

				            } else {

				                auto deserialized = attrs_type()->deserialize(bv, cql_serialization_format::latest());

				                auto deserialized = attrs_type()->deserialize(bv);

				                auto keys_and_values = value_cast<map_type_impl::native_type>(deserialized);

				                for (auto entry : keys_and_values) {

				                    std::string attr_name = value_cast<sstring>(entry.first);

				@@ -3537,7 +3571,7 @@ public:

				    }

				};

				static std::tuple<rjson::value, size_t> describe_items(schema_ptr schema, const query::partition_slice& slice, const cql3::selection::selection& selection, std::unique_ptr<cql3::result_set> result_set, std::optional<attrs_to_get>&& attrs_to_get, filter&& filter) {

				static std::tuple<rjson::value, size_t> describe_items(const cql3::selection::selection& selection, std::unique_ptr<cql3::result_set> result_set, std::optional<attrs_to_get>&& attrs_to_get, filter&& filter) {

				    describe_items_visitor visitor(selection.get_columns(), attrs_to_get, filter);

				    result_set->visit(visitor);

				    auto scanned_count = visitor.get_scanned_count();

				@@ -3587,7 +3621,7 @@ static rjson::value encode_paging_state(const schema& schema, const service::pag

				    // We conditionally include these fields when reading CQL tables through alternator.

				    if (!is_alternator_keyspace(schema.ks_name()) && (!pos.has_key() || pos.get_bound_weight() != bound_weight::equal)) {

				        rjson::add_with_string_name(last_evaluated_key, scylla_paging_region, rjson::empty_object());

				        rjson::add(last_evaluated_key[scylla_paging_region.data()], "S", rjson::from_string(to_string(pos.region())));

				        rjson::add(last_evaluated_key[scylla_paging_region.data()], "S", rjson::from_string(fmt::to_string(pos.region())));

				        rjson::add_with_string_name(last_evaluated_key, scylla_paging_weight, rjson::empty_object());

				        rjson::add(last_evaluated_key[scylla_paging_weight.data()], "N", static_cast<int>(pos.get_bound_weight()));

				    }

				@@ -3614,11 +3648,11 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr

				    if (exclusive_start_key) {

				        partition_key pk = pk_from_json(*exclusive_start_key, schema);

				        auto pos = position_in_partition(position_in_partition::partition_start_tag_t());

				        auto pos = position_in_partition::for_partition_start();

				        if (schema->clustering_key_size() > 0) {

				            pos = pos_from_json(*exclusive_start_key, schema);

				        }

				        paging_state = make_lw_shared<service::pager::paging_state>(pk, pos, query::max_partitions, utils::UUID(), service::pager::paging_state::replicas_per_token_range{}, std::nullopt, 0);

				        paging_state = make_lw_shared<service::pager::paging_state>(pk, pos, query::max_partitions, query_id::create_null_id(), service::pager::paging_state::replicas_per_token_range{}, std::nullopt, 0);

				    }

				    auto regular_columns = boost::copy_range<query::column_id_vector>(

				@@ -3629,7 +3663,8 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr

				    query::partition_slice::option_set opts = selection->get_query_options();

				    opts.add(custom_opts);

				    auto partition_slice = query::partition_slice(std::move(ck_bounds), std::move(static_columns), std::move(regular_columns), opts);

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, proxy.get_max_result_size(partition_slice));

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, proxy.get_max_result_size(partition_slice),

				        query::tombstone_limit(proxy.get_tombstone_limit()));

				    auto query_state_ptr = std::make_unique<service::query_state>(client_state, trace_state, std::move(permit));

				@@ -3650,7 +3685,7 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr

				        }

				        auto paging_state = rs->get_metadata().paging_state();

				        bool has_filter = filter;

				        auto [items, size] = describe_items(schema, partition_slice, *selection, std::move(rs), std::move(attrs_to_get), std::move(filter));

				        auto [items, size] = describe_items(*selection, std::move(rs), std::move(attrs_to_get), std::move(filter));

				        if (paging_state) {

				            rjson::add(items, "LastEvaluatedKey", encode_paging_state(*schema, *paging_state));

				        }

				@@ -3659,8 +3694,7 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr

				            // update our "filtered_row_matched_total" for all the rows matched, despite the filter

				            cql_stats.filtered_rows_matched_total += size;

				        }

				        // TODO: better threshold

				        if (size > 10) {

				        if (is_big(items)) {

				            return make_ready_future<executor::request_return_type>(make_streamed(std::move(items)));

				        }

				        return make_ready_future<executor::request_return_type>(make_jsonable(std::move(items)));

									
										11

alternator/executor.hh
				@@ -239,4 +239,15 @@ public:

				    static void supplement_table_stream_info(rjson::value& descr, const schema& schema, service::storage_proxy& sp);

				};

				// is_big() checks approximately if the given JSON value is "bigger" than

				// the given big_size number of bytes. The goal is to *quickly* detect

				// oversized JSON that, for example, is too large to be serialized to a

				// contiguous string - we don't need an accurate size for that. Moreover,

				// as soon as we detect that the JSON is indeed "big", we can return true

				// and don't need to continue calculating its exact size.

				// For simplicity, we use a recursive implementation. This is fine because

				// Alternator limits the depth of JSONs it reads from inputs, and doesn't

				// add more than a couple of levels in its own output construction.

				bool is_big(const rjson::value& val, int big_size = 100'000);

				}
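
The early-exit idea described in the `is_big()` comment above can be sketched in isolation. This is a minimal illustration, not the Alternator implementation: `toy_json`, `consume()` and `is_big_sketch()` are hypothetical names, and the toy tree stands in for `rjson::value`. Only the 100'000 default and the "stop as soon as the budget is exhausted" behaviour are taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for rjson::value: every node may carry a member
// name, a string payload, and nested children (array elements or members).
struct toy_json {
    std::string name;
    std::string text;
    std::vector<toy_json> children;
};

// Subtracts this node's approximate size from the remaining budget and
// recurses into children. Returns false as soon as the budget goes negative,
// so the rest of the tree is never visited.
static bool consume(const toy_json& v, long& size_left) {
    size_left -= static_cast<long>(v.name.size() + v.text.size());
    if (size_left < 0) {
        return false;
    }
    for (const auto& child : v.children) {
        if (!consume(child, size_left)) {
            return false;
        }
    }
    return true;
}

// Approximate check: true when the tree holds more than big_size bytes of
// names and string payloads. The size is not exact, by design.
bool is_big_sketch(const toy_json& v, long big_size = 100'000) {
    assert(big_size >= 0);
    long size_left = big_size;
    return !consume(v, size_left);
}
```

Recursion keeps the sketch short; as the original comment notes, that is acceptable only because the nesting depth of the input is bounded elsewhere.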

									
										3

alternator/expressions.cc
				@@ -634,7 +634,8 @@ std::unordered_map<std::string_view, function_handler_type*> function_handlers {

				            }

				            rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);

				            rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);

				            return to_bool_json(check_CONTAINS(v1.IsNull() ? nullptr : &v1,  v2));

				            return to_bool_json(check_CONTAINS(v1.IsNull() ? nullptr : &v1,  v2,

				                                    f._parameters[0].is_constant(), f._parameters[1].is_constant()));

				        }

				    },

				};

									
										2

alternator/expressions_types.hh
				@@ -19,7 +19,7 @@

				/*

				 * Parsed representation of expressions and their components.

				 *

				 * Types in alternator::parse namespace are used for holding the parse

				 * Types in alternator::parsed namespace are used for holding the parse

				 * tree - objects generated by the Antlr rules after parsing an expression.

				 * Because of the way Antlr works, all these objects are default-constructed

				 * first, and then assigned when the rule is completed, so all these types

									
										32

alternator/serialization.cc
				@@ -14,7 +14,7 @@

				#include "rapidjson/writer.h"

				#include "concrete_types.hh"

				#include "cql3/type_json.hh"

				#include "position_in_partition.hh"

				#include "mutation/position_in_partition.hh"

				static logging::logger slogger("alternator-serialization");

				@@ -59,7 +59,9 @@ struct from_json_visitor {

				        bo.write(t.from_string(rjson::to_string_view(v)));

				    }

				    void operator()(const bytes_type_impl& t) const {

				        bo.write(rjson::base64_decode(v));

				        // FIXME: it's difficult at this point to get information if value was provided

				        // in request or comes from the storage, for now we assume it's user's fault.

				        bo.write(*unwrap_bytes(v, true));

				    }

				    void operator()(const boolean_type_impl& t) const {

				        bo.write(boolean_type->decompose(v.GetBool()));

				@@ -73,7 +75,7 @@ struct from_json_visitor {

				    }

				    // default

				    void operator()(const abstract_type& t) const {

				        bo.write(from_json_object(t, v, cql_serialization_format::internal()));

				        bo.write(from_json_object(t, v));

				    }

				};

				@@ -198,7 +200,9 @@ bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column

				                format("The AttributeValue for a key attribute cannot contain an empty string value. Key: {}", column.name_as_text()));

				    }

				    if (column.type == bytes_type) {

				        return rjson::base64_decode(value);

				        // FIXME: it's difficult at this point to get information if value was provided

				        // in request or comes from the storage, for now we assume it's user's fault.

				        return *unwrap_bytes(value, true);

				    } else {

				        return column.type->from_string(value_view);

				    }

				@@ -210,7 +214,7 @@ rjson::value json_key_column_value(bytes_view cell, const column_definition& col

				        std::string b64 = base64_encode(cell);

				        return rjson::from_string(b64);

				    } if (column.type == utf8_type) {

				        return rjson::from_string(std::string(reinterpret_cast<const char*>(cell.data()), cell.size()));

				        return rjson::from_string(reinterpret_cast<const char*>(cell.data()), cell.size());

				    } else if (column.type == decimal_type) {

				        // FIXME: use specialized Alternator number type, not the more

				        // general "decimal_type". A dedicated type can be more efficient

				@@ -261,7 +265,6 @@ position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema)

				    if (bool(region_item) != bool(weight_item)) {

				        throw api_error::validation("Malformed value object: region and weight has to be either both missing or both present");

				    }

				    partition_region region;

				    bound_weight weight;

				    if (region_item) {

				        auto region_view = rjson::to_string_view(get_typed_value(*region_item, "S", scylla_paging_region, "key region"));

				@@ -279,7 +282,7 @@ position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema)

				        return position_in_partition(region, weight, region == partition_region::clustered ? std::optional(std::move(ck)) : std::nullopt);

				    }

				    if (ck.is_empty()) {

				        return position_in_partition(position_in_partition::partition_start_tag_t());

				        return position_in_partition::for_partition_start();

				    }

				    return position_in_partition::for_key(std::move(ck));

				}

				@@ -319,6 +322,17 @@ std::optional<big_decimal> try_unwrap_number(const rjson::value& v) {

				    }

				}

				std::optional<bytes> unwrap_bytes(const rjson::value& value, bool from_query) {

				    try {

				        return rjson::base64_decode(value);

				    } catch (...) {

				        if (from_query) {

				            throw api_error::serialization(format("Invalid base64 data"));

				        }

				        return std::nullopt;

				    }

				}

				const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value& v) {

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        return {"", nullptr};

				@@ -348,7 +362,7 @@ rjson::value number_add(const rjson::value& v1, const rjson::value& v2) {

				    auto n1 = unwrap_number(v1, "UpdateExpression");

				    auto n2 = unwrap_number(v2, "UpdateExpression");

				    rjson::value ret = rjson::empty_object();

				    std::string str_ret = std::string((n1 + n2).to_string());

				    sstring str_ret = (n1 + n2).to_string();

				    rjson::add(ret, "N", rjson::from_string(str_ret));

				    return ret;

				}

				@@ -357,7 +371,7 @@ rjson::value number_subtract(const rjson::value& v1, const rjson::value& v2) {

				    auto n1 = unwrap_number(v1, "UpdateExpression");

				    auto n2 = unwrap_number(v2, "UpdateExpression");

				    rjson::value ret = rjson::empty_object();

				    std::string str_ret = std::string((n1 - n2).to_string());

				    sstring str_ret = (n1 - n2).to_string();

				    rjson::add(ret, "N", rjson::from_string(str_ret));

				    return ret;

				}

									
										9

alternator/serialization.hh
				@@ -11,8 +11,8 @@

				#include <string>

				#include <string_view>

				#include <optional>

				#include "types.hh"

				#include "schema_fwd.hh"

				#include "types/types.hh"

				#include "schema/schema_fwd.hh"

				#include "keys.hh"

				#include "utils/rjson.hh"

				#include "utils/big_decimal.hh"

				@@ -62,6 +62,11 @@ big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);

				// when the given v does not encode a number.

				std::optional<big_decimal> try_unwrap_number(const rjson::value& v);

				// unwrap_bytes decodes byte value, on decoding failure it either raises api_error::serialization

				// iff from_query is true or returns unset optional iff from_query is false.

				// Therefore it's safe to dereference returned optional when called with from_query equal true.

				std::optional<bytes> unwrap_bytes(const rjson::value& value, bool from_query);

				// Check if a given JSON object encodes a set (i.e., it is a {"SS": [...]}, or "NS", "BS"

				// and returns set's type and a pointer to that set. If the object does not encode a set,

				// returned value is {"", nullptr}
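
The contract documented for `unwrap_bytes()` above (throw for request data, return an unset optional for stored data) can be sketched without the Scylla types. Everything named here is an assumption for illustration: `decode_base64_or_throw()` is a hypothetical strict decoder standing in for `rjson::base64_decode()`, `serialization_error` stands in for `api_error::serialization`, and the input is a plain `std::string_view` rather than an `rjson::value`.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical strict base64 decoder; throws std::invalid_argument on
// malformed input. Padding ('=') is accepted only at the end of the input.
std::string decode_base64_or_throw(std::string_view in) {
    static constexpr std::string_view alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    if (in.size() % 4 != 0) {
        throw std::invalid_argument("length is not a multiple of 4");
    }
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 4) {
        unsigned chunk = 0;
        int pad = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            const char c = in[i + j];
            chunk <<= 6;
            if (c == '=' && i + 4 == in.size() && j >= 2) {
                ++pad;
                continue;
            }
            const auto pos = alphabet.find(c);
            if (pos == std::string_view::npos || pad > 0) {
                throw std::invalid_argument("invalid base64 character");
            }
            chunk |= static_cast<unsigned>(pos);
        }
        out.push_back(static_cast<char>((chunk >> 16) & 0xff));
        if (pad < 2) out.push_back(static_cast<char>((chunk >> 8) & 0xff));
        if (pad < 1) out.push_back(static_cast<char>(chunk & 0xff));
    }
    return out;
}

// Hypothetical stand-in for api_error::serialization.
struct serialization_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// from_query == true: malformed input is the caller's fault, so throw.
// from_query == false: malformed input came from storage, so report "no value".
std::optional<std::string> unwrap_bytes_sketch(std::string_view value, bool from_query) {
    try {
        return decode_base64_or_throw(value);
    } catch (const std::invalid_argument&) {
        if (from_query) {
            throw serialization_error("Invalid base64 data");
        }
        return std::nullopt;
    }
}
```

With `from_query` set to true the function either returns a value or throws, which is why the call sites in the diff can dereference the result directly.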

									
										31

alternator/server.cc
				@@ -16,6 +16,7 @@

				#include <seastar/util/short_streams.hh>

				#include "seastarx.hh"

				#include "error.hh"

				#include "service/qos/service_level_controller.hh"

				#include "utils/rjson.hh"

				#include "auth.hh"

				#include <cctype>

				@@ -23,10 +24,13 @@

				#include "gms/gossiper.hh"

				#include "utils/overloaded_functor.hh"

				#include "utils/fb_utilities.hh"

				#include "utils/aws_sigv4.hh"

				static logging::logger slogger("alternator-server");

				using namespace httpd;

				using request = http::request;

				using reply = http::reply;

				namespace alternator {

				@@ -142,7 +146,7 @@ public:

				            std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {

				        handle_CORS(*req, *rep, false);

				        return _f_handle(std::move(req), std::move(rep)).then(

				                [this](std::unique_ptr<reply> rep) {

				                [](std::unique_ptr<reply> rep) {

				                    rep->set_mime_type("application/x-amz-json-1.0");

				                    rep->done();

				                    return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

				@@ -234,7 +238,7 @@ protected:

				future<std::string> server::verify_signature(const request& req, const chunked_content& content) {

				    if (!_enforce_authorization) {

				        slogger.debug("Skipping authorization");

				        return make_ready_future<std::string>("<unauthenticated request>");

				        return make_ready_future<std::string>();

				    }

				    auto host_it = req._headers.find("Host");

				    if (host_it == req._headers.end()) {

				@@ -316,8 +320,13 @@ future<std::string> server::verify_signature(const request& req, const chunked_c

				                                                    region = std::move(region),

				                                                    service = std::move(service),

				                                                    user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {

				        std::string signature = get_signature(user, *key_ptr, std::string_view(host), req._method,

				                datestamp, signed_headers_str, signed_headers_map, content, region, service, "");

				        std::string signature;

				        try {

				            signature = utils::aws::get_signature(user, *key_ptr, std::string_view(host), "/", req._method,

				                datestamp, signed_headers_str, signed_headers_map, &content, region, service, "");

				        } catch (const std::exception& e) {

				            throw api_error::invalid_signature(e.what());

				        }

				        if (signature != std::string_view(user_signature)) {

				            _key_cache.remove(user);

				@@ -364,7 +373,9 @@ static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_

				        tracing::add_session_param(trace_state, "alternator_op", op);

				        tracing::add_query(trace_state, truncated_content_view(query, buf));

				        tracing::begin(trace_state, format("Alternator {}", op), client_state.get_client_address());

				        tracing::set_username(trace_state, auth::authenticated_user(username));

				        if (!username.empty()) {

				            tracing::set_username(trace_state, auth::authenticated_user(username));

				        }

				    }

				    return trace_state;

				}

				@@ -407,7 +418,11 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr

				    auto leave = defer([this] () noexcept { _pending_requests.leave(); });

				    //FIXME: Client state can provide more context, e.g. client's endpoint address

				    // We use unique_ptr because client_state cannot be moved or copied

				    executor::client_state client_state{executor::client_state::internal_tag()};

				    executor::client_state client_state = username.empty()

				        ? service::client_state{service::client_state::internal_tag()}

				        : service::client_state{service::client_state::internal_tag(), _auth_service, _sl_controller, username};

				    co_await client_state.maybe_update_per_service_level_params();

				    tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content);

				    tracing::trace(trace_state, op);

				    rjson::value json_request = co_await _json_parser.parse(std::move(content));

				@@ -440,12 +455,14 @@ void server::set_routes(routes& r) {

				//FIXME: A way to immediately invalidate the cache should be considered,

				// e.g. when the system table which stores the keys is changed.

				// For now, this propagation may take up to 1 minute.

				server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gossiper)

				server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gossiper, auth::service& auth_service, qos::service_level_controller& sl_controller)

				        : _http_server("http-alternator")

				        , _https_server("https-alternator")

				        , _executor(exec)

				        , _proxy(proxy)

				        , _gossiper(gossiper)

				        , _auth_service(auth_service)

				        , _sl_controller(sl_controller)

				        , _key_cache(1024, 1min, slogger)

				        , _enforce_authorization(false)

				        , _enabled_servers{}

									
										15

alternator/server.hh
				@@ -15,6 +15,7 @@

				#include <seastar/net/tls.hh>

				#include <optional>

				#include "alternator/auth.hh"

				#include "service/qos/service_level_controller.hh"

				#include "utils/small_vector.hh"

				#include "utils/updateable_value.hh"

				#include <seastar/core/units.hh>

				@@ -26,14 +27,16 @@ using chunked_content = rjson::chunked_content;

				class server {

				    static constexpr size_t content_length_limit = 16*MB;

				    using alternator_callback = std::function<future<executor::request_return_type>(executor&, executor::client_state&,

				            tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<request>)>;

				            tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<http::request>)>;

				    using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;

				    http_server _http_server;

				    http_server _https_server;

				    httpd::http_server _http_server;

				    httpd::http_server _https_server;

				    executor& _executor;

				    service::storage_proxy& _proxy;

				    gms::gossiper& _gossiper;

				    auth::service& _auth_service;

				    qos::service_level_controller& _sl_controller;

				    key_cache _key_cache;

				    bool _enforce_authorization;

				@@ -65,7 +68,7 @@ class server {

				    json_parser _json_parser;

				public:

				    server(executor& executor, service::storage_proxy& proxy, gms::gossiper& gossiper);

				    server(executor& executor, service::storage_proxy& proxy, gms::gossiper& gossiper, auth::service& service, qos::service_level_controller& sl_controller);

				    future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,

				            bool enforce_authorization, semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests);

				@@ -73,8 +76,8 @@ public:

				private:

				    void set_routes(seastar::httpd::routes& r);

				    // If verification succeeds, returns the authenticated user's username

				    future<std::string> verify_signature(const seastar::httpd::request&, const chunked_content&);

				    future<executor::request_return_type> handle_api_request(std::unique_ptr<request> req);

				    future<std::string> verify_signature(const seastar::http::request&, const chunked_content&);

				    future<executor::request_return_type> handle_api_request(std::unique_ptr<http::request> req);

				};

				}

									
										77

alternator/streams.cc
				@@ -27,13 +27,14 @@

				#include "cql3/result_set.hh"

				#include "cql3/type_json.hh"

				#include "cql3/column_identifier.hh"

				#include "schema_builder.hh"

				#include "schema/schema_builder.hh"

				#include "service/storage_proxy.hh"

				#include "gms/feature.hh"

				#include "gms/feature_service.hh"

				#include "executor.hh"

				#include "rmw_operation.hh"

				#include "data_dictionary/data_dictionary.hh"

				/**

				 * Base template type to implement  rapidjson::internal::TypeHelper<...>:s

				@@ -74,8 +75,8 @@ struct rapidjson::internal::TypeHelper<ValueType, utils::UUID>

				    : public from_string_helper<ValueType, utils::UUID>

				{};

				static db_clock::time_point as_timepoint(const utils::UUID& uuid) {

				    return db_clock::time_point{utils::UUID_gen::unix_timestamp(uuid)};

				static db_clock::time_point as_timepoint(const table_id& tid) {

				    return db_clock::time_point{utils::UUID_gen::unix_timestamp(tid.uuid())};

				}

				/**

				@@ -106,6 +107,9 @@ public:

				    stream_arn(const UUID& uuid)

				        : UUID(uuid)

				    {}

				    stream_arn(const table_id& tid)

				        : UUID(tid.uuid())

				    {}

				    stream_arn(std::string_view v)

				        : UUID(v.substr(1))

				    {

				@@ -137,25 +141,44 @@ namespace alternator {

				future<alternator::executor::request_return_type> alternator::executor::list_streams(client_state& client_state, service_permit permit, rjson::value request) {

				    _stats.api_operations.list_streams++;

				    auto limit = rjson::get_opt<int>(request, "Limit").value_or(std::numeric_limits<int>::max());

				    auto limit = rjson::get_opt<int>(request, "Limit").value_or(100);

				    auto streams_start = rjson::get_opt<stream_arn>(request, "ExclusiveStartStreamArn");

				    auto table = find_table(_proxy, request);

				    auto db = _proxy.data_dictionary();

				    auto cfs = db.get_tables();

				    auto i = cfs.begin();

				    auto e = cfs.end();

				    if (limit < 1) {

				        throw api_error::validation("Limit must be 1 or more");

				    }

				    // TODO: the unordered_map here is not really well suited for partial

				    // querying - we're sorting on local hash order, and creating a table

				    // between queries may or may not miss info. But that should be rare,

				    // and we can probably expect this to be a single call.

				    std::vector<data_dictionary::table> cfs;

				    if (table) {

				        auto log_name = cdc::log_name(table->cf_name());

				        try {

				            cfs.emplace_back(db.find_table(table->ks_name(), log_name));

				        } catch (data_dictionary::no_such_column_family&) {

				            cfs.clear();

				        }

				    } else {

				        cfs = db.get_tables();

				    }

				    // # 12601 (maybe?) - sort the set of tables on ID. This should ensure we never

				    // generate duplicates in a paged listing here. Can obviously miss things if they 

				    // are added between paged calls and end up with a "smaller" UUID/ARN, but that 

				    // is to be expected.

				    if (std::cmp_less(limit, cfs.size()) || streams_start) {

				        std::sort(cfs.begin(), cfs.end(), [](const data_dictionary::table& t1, const data_dictionary::table& t2) {

				            return t1.schema()->id().uuid() < t2.schema()->id().uuid();

				        });

				    }

				    auto i = cfs.begin();

				    auto e = cfs.end();

				    if (streams_start) {

				        i = std::find_if(i, e, [&](data_dictionary::table t) {

				            return t.schema()->id() == streams_start 

				        i = std::find_if(i, e, [&](const data_dictionary::table& t) {

				            return t.schema()->id().uuid() == streams_start

				                && cdc::get_base_table(db.real_database(), *t.schema())

				                && is_alternator_keyspace(t.schema()->ks_name())

				                ;
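
The comment in the hunk above argues that sorting the tables by ID keeps a paged listing free of duplicates. A minimal sketch of that argument follows, under stated assumptions: plain `int` values stand in for table UUIDs, `page` and `list_page()` are hypothetical names, and `std::upper_bound` is used to resume after the start key, whereas the diff locates the start entry with `std::find_if`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One page of results plus the key to pass as the next exclusive start.
struct page {
    std::vector<int> ids;
    std::optional<int> last_evaluated; // set only when entries remain
};

// Returns up to `limit` ids that sort strictly after `exclusive_start`.
// Because every call sorts the same way, consecutive pages never overlap,
// even if the input arrives in a different order each time.
page list_page(std::vector<int> ids, std::size_t limit, std::optional<int> exclusive_start) {
    assert(limit >= 1); // mirrors the "Limit must be 1 or more" validation
    std::sort(ids.begin(), ids.end());
    auto it = ids.begin();
    if (exclusive_start) {
        it = std::upper_bound(ids.begin(), ids.end(), *exclusive_start);
    }
    page out;
    for (; it != ids.end() && out.ids.size() < limit; ++it) {
        out.ids.push_back(*it);
    }
    if (it != ids.end() && !out.ids.empty()) {
        out.last_evaluated = out.ids.back();
    }
    return out;
}
```

As the original comment concedes, an entry created between two calls with an ID smaller than the last returned key is still missed; sorting only rules out duplicates.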

				@@ -178,14 +201,7 @@ future<alternator::executor::request_return_type> alternator::executor::list_str

				        if (!is_alternator_keyspace(ks_name)) {

				            continue;

				        }

				        if (table && ks_name != table->ks_name()) {

				            continue;

				        }

				        if (cdc::is_log_for_some_table(db.real_database(), ks_name, cf_name)) {

				            if (table && table != cdc::get_base_table(db.real_database(), *s)) {

				                continue;

				            }

				            rjson::value new_entry = rjson::empty_object();

				            last = i->schema()->id();

				@@ -413,6 +429,8 @@ static std::chrono::seconds confidence_interval(data_dictionary::database db) {

				    return std::chrono::seconds(db.get_config().alternator_streams_time_window_s());

				}

				using namespace std::chrono_literals;

				// Dynamo docs says no data shall live longer than 24h.

				static constexpr auto dynamodb_streams_max_window = 24h;

				@@ -430,7 +448,7 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl

				    auto db = _proxy.data_dictionary();

				    try {

				        auto cf = db.find_column_family(stream_arn);

				        auto cf = db.find_column_family(table_id(stream_arn));

				        schema = cf.schema();

				        bs = cdc::get_base_table(db.real_database(), *schema);

				    } catch (...) {        

				@@ -490,7 +508,7 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl

				    // filter out cdc generations older than the table or now() - cdc::ttl (typically dynamodb_streams_max_window - 24h)

				    auto low_ts = std::max(as_timepoint(schema->id()), db_clock::now() - ttl);

				    return _sdks.cdc_get_versioned_streams(low_ts, { normal_token_owners }).then([this, db, shard_start, limit, ret = std::move(ret), stream_desc = std::move(stream_desc)] (std::map<db_clock::time_point, cdc::streams_version> topologies) mutable {

				    return _sdks.cdc_get_versioned_streams(low_ts, { normal_token_owners }).then([db, shard_start, limit, ret = std::move(ret), stream_desc = std::move(stream_desc)] (std::map<db_clock::time_point, cdc::streams_version> topologies) mutable {

				        auto e = topologies.end();

				        auto prev = e;

				@@ -717,7 +735,7 @@ future<executor::request_return_type> executor::get_shard_iterator(client_state&

				    std::optional<shard_id> sid;

				    try {

				        auto cf = db.find_column_family(stream_arn);

				        auto cf = db.find_column_family(table_id(stream_arn));

				        schema = cf.schema();

				        sid = rjson::get<shard_id>(request, "ShardId");

				    } catch (...) {

				@@ -802,14 +820,14 @@ future<executor::request_return_type> executor::get_records(client_state& client

				    auto db = _proxy.data_dictionary();

				    schema_ptr schema, base;

				    try {

				        auto log_table = db.find_column_family(iter.table);

				        auto log_table = db.find_column_family(table_id(iter.table));

				        schema = log_table.schema();

				        base = cdc::get_base_table(db.real_database(), *schema);

				    } catch (...) {        

				    }

				    if (!schema || !base || !is_alternator_keyspace(schema->ks_name())) {

				        throw api_error::resource_not_found(boost::lexical_cast<std::string>(iter.table));

				        throw api_error::resource_not_found(fmt::to_string(iter.table));

				    }

				    tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());

				@@ -876,11 +894,11 @@ future<executor::request_return_type> executor::get_records(client_state& client

				        ++mul;

				    }

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice),

				            query::row_limit(limit * mul));

				            query::tombstone_limit(_proxy.get_tombstone_limit()), query::row_limit(limit * mul));

				    return _proxy.query(schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), std::move(permit), client_state)).then(

				            [this, schema, partition_slice = std::move(partition_slice), selection = std::move(selection), start_time = std::move(start_time), limit, key_names = std::move(key_names), attr_names = std::move(attr_names), type, iter, high_ts] (service::storage_proxy::coordinator_query_result qr) mutable {       

				        cql3::selection::result_set_builder builder(*selection, gc_clock::now(), cql_serialization_format::latest());

				        cql3::selection::result_set_builder builder(*selection, gc_clock::now());

				        query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *schema, *selection));

				        auto result_set = builder.build();

				@@ -1009,7 +1027,7 @@ future<executor::request_return_type> executor::get_records(client_state& client

				        // ugh. figure out if we are at end-of-shard

				        auto normal_token_owners = _proxy.get_token_metadata_ptr()->count_normal_token_owners();

				        return _sdks.cdc_current_generation_timestamp({ normal_token_owners }).then([this, iter, high_ts, start_time, ret = std::move(ret), nrecords](db_clock::time_point ts) mutable {

				        return _sdks.cdc_current_generation_timestamp({ normal_token_owners }).then([this, iter, high_ts, start_time, ret = std::move(ret)](db_clock::time_point ts) mutable {

				            auto& shard = iter.shard;            

				            if (shard.time < ts && ts < high_ts) {

				@@ -1026,8 +1044,7 @@ future<executor::request_return_type> executor::get_records(client_state& client

				                rjson::add(ret, "NextShardIterator", iter);

				            }

				            _stats.api_operations.get_records_latency.add(std::chrono::steady_clock::now() - start_time);

				            // TODO: determine a better threshold...

				            if (nrecords > 10) {

				            if (is_big(ret)) {

				                return make_ready_future<executor::request_return_type>(make_streamed(std::move(ret)));

				            }

				            return make_ready_future<executor::request_return_type>(make_jsonable(std::move(ret)));

									
										115

alternator/ttl.cc
									
												
				@@ -8,6 +8,7 @@

				#include <chrono>

				#include <cstdint>

				#include <exception>

				#include <optional>

				#include <seastar/core/sstring.hh>

				#include <seastar/core/coroutine.hh>

				@@ -17,6 +18,7 @@

				#include <seastar/coroutine/maybe_yield.hh>

				#include <boost/multiprecision/cpp_int.hpp>

				#include "exceptions/exceptions.hh"

				#include "gms/gossiper.hh"

				#include "gms/inet_address.hh"

				#include "inet_address_vectors.hh"

				@@ -31,8 +33,8 @@

				#include "service/pager/query_pagers.hh"

				#include "gms/feature_service.hh"

				#include "sstables/types.hh"

				#include "mutation.hh"

				#include "types.hh"

				#include "mutation/mutation.hh"

				#include "types/types.hh"

				#include "types/map.hh"

				#include "utils/rjson.hh"

				#include "utils/big_decimal.hh"

				@@ -92,24 +94,25 @@ future<executor::request_return_type> executor::update_time_to_live(client_state

				    }

				    sstring attribute_name(v->GetString(), v->GetStringLength());

				    std::map<sstring, sstring> tags_map = get_tags_of_table_or_throw(schema);

				    if (enabled) {

				        if (tags_map.contains(TTL_TAG_KEY)) {

				            co_return api_error::validation("TTL is already enabled");

				    co_await db::modify_tags(_mm, schema->ks_name(), schema->cf_name(), [&](std::map<sstring, sstring>& tags_map) {

				        if (enabled) {

				            if (tags_map.contains(TTL_TAG_KEY)) {

				                throw api_error::validation("TTL is already enabled");

				            }

				            tags_map[TTL_TAG_KEY] = attribute_name;

				        } else {

				            auto i = tags_map.find(TTL_TAG_KEY);

				            if (i == tags_map.end()) {

				                throw api_error::validation("TTL is already disabled");

				            } else if (i->second != attribute_name) {

				                throw api_error::validation(format(

				                    "Requested to disable TTL on attribute {}, but a different attribute {} is enabled.",

				                    attribute_name, i->second));

				            }

				            tags_map.erase(TTL_TAG_KEY);

				        }

				        tags_map[TTL_TAG_KEY] = attribute_name;

				    } else {

				        auto i = tags_map.find(TTL_TAG_KEY);

				        if (i == tags_map.end()) {

				            co_return api_error::validation("TTL is already disabled");

				        } else if (i->second != attribute_name) {

				            co_return api_error::validation(format(

				                "Requested to disable TTL on attribute {}, but a different attribute {} is enabled.",

				                attribute_name, i->second));

				        }

				        tags_map.erase(TTL_TAG_KEY);

				    }

				    co_await db::update_tags(_mm, schema, std::move(tags_map));

				    });

				    // Prepare the response, which contains a TimeToLiveSpecification

				    // basically identical to the request's

				    rjson::value response = rjson::empty_object();
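The hunk above moves the enable/disable validation into the callback passed to `db::modify_tags`, presumably so that the check and the update act on the same copy of the tag map. A minimal sketch of that callback's logic on a plain `std::map` follows. It assumes nothing about the real `modify_tags` signature; `apply_ttl_request`, `ttl_tag_key` and its value are hypothetical names for illustration, and `std::invalid_argument` stands in for `api_error::validation`.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the TTL_TAG_KEY constant used in the diff.
// The value is a placeholder, not the real tag name.
static const std::string ttl_tag_key = "ttl_tag_placeholder";

// Applies an enable/disable TTL request to a tag map in place.
// Throws std::invalid_argument where the diff throws api_error::validation.
void apply_ttl_request(std::map<std::string, std::string>& tags,
                       bool enabled,
                       const std::string& attribute_name) {
    if (enabled) {
        if (tags.count(ttl_tag_key)) {
            throw std::invalid_argument("TTL is already enabled");
        }
        tags[ttl_tag_key] = attribute_name;
        return;
    }
    auto it = tags.find(ttl_tag_key);
    if (it == tags.end()) {
        throw std::invalid_argument("TTL is already disabled");
    }
    if (it->second != attribute_name) {
        // Disabling must name the attribute that is currently enabled.
        throw std::invalid_argument(
            "a different attribute is enabled: " + it->second);
    }
    tags.erase(it);
}
```

Enabling twice, disabling when nothing is enabled, or disabling a different attribute all throw and leave the map unchanged, which matches the three validation branches in the hunk.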

				@@ -136,7 +139,7 @@ future<executor::request_return_type> executor::describe_time_to_live(client_sta

				// expiration_service is a sharded service responsible for cleaning up expired

				// items in all tables with per-item expiration enabled. Currently, this means

				// Alternator tables with TTL configured via a UpdateTimeToLeave request.

				// Alternator tables with TTL configured via a UpdateTimeToLive request.

				//

				// Here is a brief overview of how the expiration service works:

				//

				@@ -150,25 +153,25 @@ future<executor::request_return_type> executor::describe_time_to_live(client_sta

				// To avoid scanning the same items RF times in RF replicas, only one node is

				// responsible for scanning a token range at a time. Normally, this is the

				// node owning this range as a "primary range" (the first node in the ring

				// with this range), but when this node is down, other nodes may take over

				// (FIXME: this is not implemented yet).

				// with this range), but when this node is down, the secondary owner (the

				// second in the ring) may take over.

				// An expiration thread is responsible for all tables which need expiration

				// scans. FIXME: explain how this is done with multiple tables - parallel,

				// staggered, or what?

				// scans. Currently, the different tables are scanned sequentially (not in

				// parallel).

				// The expiration thread scans items using CL=QUORUM to ensure that it reads

				// a consistent expiration-time attribute. This means that the items are read

				// locally and in addition QUORUM-1 additional nodes (one additional node

				// when RF=3) need to read the data and send digests.

				// FIXME: explain if we can read the exact attribute or the entire map.

				// When the expiration thread decides that an item has expired and wants

				// to delete it, it does it using a CL=QUORUM write. This allows this

				// deletion to be visible for consistent (quorum) reads. The deletion,

				// like user deletions, will also appear on the CDC log and therefore

				// Alternator Streams if enabled (FIXME: explain how we mark the

				// deletion different from user deletes. We don't do it yet.).

				expiration_service::expiration_service(data_dictionary::database db, service::storage_proxy& proxy)

				// Alternator Streams if enabled - currently as ordinary deletes (the

				// userIdentity flag is currently missing; this is issue #11523).

				expiration_service::expiration_service(data_dictionary::database db, service::storage_proxy& proxy, gms::gossiper& g)

				        : _db(db)

				        , _proxy(proxy)

				        , _gossiper(g)

				{

				}

				@@ -282,7 +285,9 @@ static future<> expire_item(service::storage_proxy& proxy,

				        auto ck = clustering_key::from_exploded(exploded_ck);

				        m.partition().clustered_row(*schema, ck).apply(tombstone(ts, gc_clock::now()));

				    }

				    return proxy.mutate(std::vector<mutation>{std::move(m)},

				    std::vector<mutation> mutations;

				    mutations.push_back(std::move(m));

				    return proxy.mutate(std::move(mutations),

				        db::consistency_level::LOCAL_QUORUM,

				        executor::default_timeout(), // FIXME - which timeout?

				        qs.get_trace_state(), qs.get_permit(),

				@@ -365,7 +370,7 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary

				// 2. The primary replica for this token is currently marked down.

				// 3. In this node, this shard is responsible for this token.

				// We use the <secondary> case to handle the possibility that some of the

				// nodes in the system are down. A dead node will not be expiring expiring

				// nodes in the system are down. A dead node will not be expiring

				// the tokens owned by it, so we want the secondary owner to take over its

				// primary ranges.

				//
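The comments in this file describe a takeover rule: a node scans its primary ranges, and also its secondary ranges when the primary owner is marked down. The sketch below restates only that decision as a pure function. `range_role` and `should_scan` are hypothetical names, and the per-shard ownership check mentioned in the comment is deliberately left out.

```cpp
#include <cassert>

// Hypothetical roles a node can have for one token range.
enum class range_role { primary, secondary, other };

// Returns true if this node should run the expiration scan for a range.
// primary_is_up is the liveness of the range's primary owner as seen locally.
bool should_scan(range_role role, bool primary_is_up) {
    switch (role) {
    case range_role::primary:
        return true;            // the primary owner always scans its range
    case range_role::secondary:
        return !primary_is_up;  // take over only while the primary is down
    case range_role::other:
        return false;
    }
    return false;
}
```

Because each node decides from its own view of liveness, two nodes may briefly scan the same range if they disagree on whether the primary is up; the sketch does not try to prevent that.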

				@@ -511,7 +516,7 @@ struct scan_ranges_context {

				        opts.set<query::partition_slice::option::bypass_cache>();

				        std::vector<query::clustering_range> ck_bounds{query::clustering_range::make_open_ended_both_sides()};

				        auto partition_slice = query::partition_slice(std::move(ck_bounds), {}, std::move(regular_columns), opts);

				        command = ::make_lw_shared<query::read_command>(s->id(), s->version(), partition_slice, proxy.get_max_result_size(partition_slice));

				        command = ::make_lw_shared<query::read_command>(s->id(), s->version(), partition_slice, proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));

				        executor::client_state client_state{executor::client_state::internal_tag()};

				        tracing::trace_state_ptr trace_state;

				        // NOTICE: empty_service_permit is used because the TTL service has fixed parallelism

				@@ -546,13 +551,34 @@ static future<> scan_table_ranges(

				            co_return;

				        }

				        auto units = co_await get_units(page_sem, 1);

				        // We don't to limit page size in number of rows because there is a

				        // builtin limit of the page's size in bytes. Setting this limit to 1

				        // is useful for debugging the paging code with moderate-size data.

				        // We don't need to limit page size in number of rows because there is

				        // a builtin limit of the page's size in bytes. Setting this limit to

				        // 1 is useful for debugging the paging code with moderate-size data.

				        uint32_t limit = std::numeric_limits<uint32_t>::max();

				        // FIXME: which timeout?

				        // FIXME: if read times out, need to retry it.

				        std::unique_ptr<cql3::result_set> rs = co_await p->fetch_page(limit, gc_clock::now(), executor::default_timeout());

				        // Read a page, and if that times out, try again after a small sleep.

				        // If we didn't catch the timeout exception, it would cause the scan

				        // to be aborted and only be restarted at the next scanning period.

				        // If we retry too many times, give up and restart the scan later.

				        std::unique_ptr<cql3::result_set> rs;

				        for (int retries=0; ; retries++) {

				            try {

				                // FIXME: which timeout?

				                rs = co_await p->fetch_page(limit, gc_clock::now(), executor::default_timeout());

				                break;

				            } catch(exceptions::read_timeout_exception&) {

				                tlogger.warn("expiration scanner read timed out, will retry: {}",

				                    std::current_exception());

				            }

				            // If we didn't break out of this loop, add a minimal sleep

				            if (retries >= 10) {

				                // Don't get stuck forever asking the same page, maybe there's

				                // a bug or a real problem in several replicas. Give up on

				                // this scan and retry the scan from a random position later,

				                // in the next scan period.

				                throw runtime_exception("scanner thread failed after too many timeouts for the same page");

				            }

				            co_await sleep_abortable(std::chrono::seconds(1), abort_source);

				        }

				        auto rows = rs->rows();

				        auto meta = rs->get_metadata().get_names();

				        std::optional<unsigned> expiration_column;
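The retry loop added in this hunk can be illustrated without Seastar. The following is a synchronous sketch, not the coroutine code from the commit: `read_timeout` stands in for `exceptions::read_timeout_exception`, `pause` replaces `sleep_abortable`, and `fetch_with_retry` is a hypothetical helper name. It keeps the same control flow, including the `retries >= 10` check placed after the failed attempt.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for exceptions::read_timeout_exception.
struct read_timeout : std::runtime_error {
    read_timeout() : std::runtime_error("read timed out") {}
};

// Calls fetch() until it succeeds. A timeout triggers pause() and a retry.
// Once the retry counter reaches max_retries, the function gives up by
// throwing, like the "too many timeouts" branch in the diff.
template <typename T>
T fetch_with_retry(const std::function<T()>& fetch,
                   const std::function<void()>& pause,
                   int max_retries = 10) {
    for (int retries = 0; ; retries++) {
        try {
            return fetch();
        } catch (const read_timeout&) {
            // Swallow the timeout and fall through to the bookkeeping below.
        }
        if (retries >= max_retries) {
            throw std::runtime_error("too many timeouts for the same page");
        }
        pause();
    }
}
```

With the default of 10, a page that always times out is attempted 11 times with 10 pauses in between before the error propagates, so the caller's outer handler can restart the scan later.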

				@@ -637,6 +663,7 @@ static future<> scan_table_ranges(

				static future<bool> scan_table(

				    service::storage_proxy& proxy,

				    data_dictionary::database db,

				    gms::gossiper& gossiper,

				    schema_ptr s,

				    abort_source& abort_source,

				    named_semaphore& page_sem,

				@@ -689,7 +716,7 @@ static future<bool> scan_table(

				    expiration_stats.scan_table++;

				    // FIXME: need to pace the scan, not do it all at once.

				    scan_ranges_context scan_ctx{s, proxy, std::move(column_name), std::move(member)};

				    token_ranges_owned_by_this_shard<primary> my_ranges(db.real_database(), proxy.gossiper(), s);

				    token_ranges_owned_by_this_shard<primary> my_ranges(db.real_database(), gossiper, s);

				    while (std::optional<dht::partition_range> range = my_ranges.next_partition_range()) {

				        // Note that because of issue #9167 we need to run a separate

				        // query on each partition range, and can't pass several of

				@@ -709,7 +736,7 @@ static future<bool> scan_table(

				    // by tasking another node to take over scanning of the dead node's primary

				    // ranges. What we do here is that this node will also check expiration

				    // on its *secondary* ranges - but only those whose primary owner is down.

				    token_ranges_owned_by_this_shard<secondary> my_secondary_ranges(db.real_database(), proxy.gossiper(), s);

				    token_ranges_owned_by_this_shard<secondary> my_secondary_ranges(db.real_database(), gossiper, s);

				    while (std::optional<dht::partition_range> range = my_secondary_ranges.next_partition_range()) {

				        expiration_stats.secondary_ranges_scanned++;

				        dht::partition_range_vector partition_ranges;

				@@ -741,7 +768,7 @@ future<> expiration_service::run() {

				                co_return;

				            }

				            try {

				                co_await scan_table(_proxy, _db, s, _abort_source, _page_sem, _expiration_stats);

				                co_await scan_table(_proxy, _db, _gossiper, s, _abort_source, _page_sem, _expiration_stats);

				            } catch (...) {

				                // The scan of a table may fail in the middle for many

				                // reasons, including network failure and even the table

				@@ -767,13 +794,15 @@ future<> expiration_service::run() {

				        // in the next iteration by reducing the scanner's scheduling-group

				        // share (if using a separate scheduling group), or introduce

				        // finer-grain sleeps into the scanning code.

				        std::chrono::seconds scan_duration(std::chrono::duration_cast<std::chrono::seconds>(lowres_clock::now() - start));

				        std::chrono::seconds period(_db.get_config().alternator_ttl_period_in_seconds());

				        std::chrono::milliseconds scan_duration(std::chrono::duration_cast<std::chrono::milliseconds>(lowres_clock::now() - start));

				        std::chrono::milliseconds period(long(_db.get_config().alternator_ttl_period_in_seconds() * 1000));

				        if (scan_duration < period) {

				            try {

				                tlogger.info("sleeping {} seconds until next period", (period - scan_duration).count());

				                tlogger.info("sleeping {} seconds until next period", (period - scan_duration).count()/1000.0);

				                co_await seastar::sleep_abortable(period - scan_duration, _abort_source);

				            } catch(seastar::sleep_aborted&) {}

				        } else {

				                tlogger.warn("scan took {} seconds, longer than period - not sleeping", scan_duration.count()/1000.0);

				        }

				    }

				}
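The last hunk switches the period arithmetic from `std::chrono::seconds` to `std::chrono::milliseconds`. A small sketch of the same computation shows the effect. It assumes the configured period may be fractional, which the `* 1000` conversion in the diff suggests but does not prove; `remaining_sleep` is a hypothetical helper name.

```cpp
#include <cassert>
#include <chrono>

// Time left to sleep before the next scan, or zero if the scan overran the
// period. period_seconds may be fractional (an assumption, see the lead-in).
std::chrono::milliseconds remaining_sleep(double period_seconds,
                                          std::chrono::milliseconds scan_duration) {
    std::chrono::milliseconds period(static_cast<long>(period_seconds * 1000));
    if (scan_duration < period) {
        return period - scan_duration;
    }
    return std::chrono::milliseconds::zero();
}
```

For a period of 0.5 s and a 200 ms scan this yields 300 ms. With whole-second resolution, `std::chrono::duration_cast<std::chrono::seconds>` would truncate both values to 0 and no sleep would happen at all.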

									
										7

alternator/ttl.hh
									
												
				@@ -14,6 +14,10 @@

				#include <seastar/core/semaphore.hh>

				#include "data_dictionary/data_dictionary.hh"

				namespace gms {

				class gossiper;

				}

				namespace replica {

				class database;

				}

				@@ -47,6 +51,7 @@ public:

				private:

				    data_dictionary::database _db;

				    service::storage_proxy& _proxy;

				    gms::gossiper& _gossiper;

				    // _end is set by start(), and resolves when the background service

				    // started by it ends. To ask the background service to end, _abort_source

				    // should be triggered. stop() below uses both _abort_source and _end.

				@@ -60,7 +65,7 @@ public:

				    // sharded_service<expiration_service>::start() creates this object on

				    // all shards, so calls this constructor on each shard. Later, the

				    // additional start() function should be invoked on all shards.

				    expiration_service(data_dictionary::database, service::storage_proxy&);

				    expiration_service(data_dictionary::database, service::storage_proxy&, gms::gossiper&);

				    future<> start();

				    future<> run();

				    // sharded_service<expiration_service>::stop() calls the following stop()
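The header change relies on a forward declaration: `gms::gossiper` is only named, not defined, because the class stores a reference to it. The reduced sketch below shows why that compiles. `expiration_service_sketch` and the `live_nodes` member are hypothetical and exist only for this example; the real `gms::gossiper` is not reproduced here.

```cpp
#include <cassert>

namespace gms {
class gossiper;  // forward declaration: enough to declare references
}

// Hypothetical, reduced version of a service that only stores a reference.
class expiration_service_sketch {
    gms::gossiper& _gossiper;
public:
    explicit expiration_service_sketch(gms::gossiper& g) : _gossiper(g) {}
    gms::gossiper& gossiper() const { return _gossiper; }
};

// The complete type is only needed where members are actually used,
// which would normally be the .cc file.
namespace gms {
class gossiper {
public:
    int live_nodes = 3;  // placeholder member for this example only
};
}
```

Keeping the full include out of the header means that files including it do not need to be recompiled when the gossiper's definition changes.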

									
										15

amplify.yml
									
										Normal file
									
												
				@@ -0,0 +1,15 @@

				version: 1

				applications:

				  - frontend:

				      phases:

				        build:

				          commands:

				            - make setupenv

				            - make dirhtml

				      artifacts:

				        baseDirectory: _build/dirhtml

				        files:

				          - '**/*'

				      cache:

				        paths: []

				    appRoot: docs

									
										70

api/CMakeLists.txt
									
										Normal file
									
												
				@@ -0,0 +1,70 @@

				# Generate C++ sources from Swagger definitions

				set(swagger_files

				  api-doc/authorization_cache.json

				  api-doc/cache_service.json

				  api-doc/collectd.json

				  api-doc/column_family.json

				  api-doc/commitlog.json

				  api-doc/compaction_manager.json

				  api-doc/config.json

				  api-doc/endpoint_snitch_info.json

				  api-doc/error_injection.json

				  api-doc/failure_detector.json

				  api-doc/gossiper.json

				  api-doc/hinted_handoff.json

				  api-doc/lsa.json

				  api-doc/messaging_service.json

				  api-doc/storage_proxy.json

				  api-doc/storage_service.json

				  api-doc/stream_manager.json

				  api-doc/system.json

				  api-doc/task_manager.json

				  api-doc/task_manager_test.json

				  api-doc/utils.json)

				foreach(f ${swagger_files})

				  get_filename_component(fname "${f}" NAME_WE)

				  get_filename_component(dir "${f}" DIRECTORY)

				  seastar_generate_swagger(

				    TARGET scylla_swagger_gen_${fname}

				    VAR scylla_swagger_gen_${fname}_files

				    IN_FILE "${CMAKE_CURRENT_SOURCE_DIR}/${f}"

				    OUT_DIR "${scylla_gen_build_dir}/api/${dir}")

				  list(APPEND swagger_gen_files "${scylla_swagger_gen_${fname}_files}")

				endforeach()

				add_library(api)

				target_sources(api

				  PRIVATE

				    api.cc

				    cache_service.cc

				    collectd.cc

				    column_family.cc

				    commitlog.cc

				    compaction_manager.cc

				    config.cc

				    endpoint_snitch.cc

				    error_injection.cc

				    authorization_cache.cc

				    failure_detector.cc

				    gossiper.cc

				    hinted_handoff.cc

				    lsa.cc

				    messaging_service.cc

				    storage_proxy.cc

				    storage_service.cc

				    stream_manager.cc

				    system.cc

				    task_manager.cc

				    task_manager_test.cc

				    ${swagger_gen_files})

				target_include_directories(api

				  PUBLIC

				    ${CMAKE_SOURCE_DIR}

				    ${scylla_gen_build_dir})

				target_link_libraries(api

				  idl

				  wasmtime_bindings

				  Seastar::seastar

				  xxHash::xxhash)

									
										4

api/api-doc/storage_service.json
									
												
				@@ -1228,7 +1228,7 @@

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Removes token (and all data associated with enpoint that had it) from the ring",

				               "summary":"Removes a node from the cluster. Replicated data that logically belonged to this node is redistributed among the remaining nodes.",

				               "type":"void",

				               "nickname":"remove_node",

				               "produces":[

				@@ -1245,7 +1245,7 @@

				                  },

				                  {

				                     "name":"ignore_nodes",

				                     "description":"List of dead nodes to ingore in removenode operation",

				                     "description":"Comma-separated list of dead nodes to ignore in removenode operation. Use the same method for all nodes to ignore: either Host IDs or ip addresses.",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

									
										39

api/api-doc/system.json
									
												
				@@ -52,6 +52,45 @@

				            }

				         ]

				      },

				      {

				         "path":"/system/log",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Write a message to the Scylla log",

				               "type":"void",

				               "nickname":"write_log_message",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"message",

				                     "description":"The message to write to the log",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"level",

				                     "description":"The logging level to use",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "enum":[

				                        "error",

				                        "warn",

				                        "info",

				                        "debug",

				                        "trace"

				                     ],

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				      },
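The new `/system/log` operation takes two required query parameters, `message` and `level`, with `level` restricted to the listed enum. A client-side sketch of building the request target for that POST follows. The helper names `percent_encode` and `system_log_request_target` are hypothetical, and host, port and HTTP transport are out of scope.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <stdexcept>
#include <string>

// Percent-encodes every byte except the RFC 3986 unreserved characters.
std::string percent_encode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Builds the path and query string for POST /system/log.
// Rejects levels outside the enum listed in the Swagger definition.
std::string system_log_request_target(const std::string& message,
                                      const std::string& level) {
    static const std::array<const char*, 5> levels = {
        "error", "warn", "info", "debug", "trace"};
    bool known = std::any_of(levels.begin(), levels.end(),
                             [&](const char* l) { return level == l; });
    if (!known) {
        throw std::invalid_argument("unknown log level: " + level);
    }
    return "/system/log?message=" + percent_encode(message) + "&level=" + level;
}
```

Validating the level on the client side is optional; it only avoids a round trip for a value the server is expected to reject.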

				      {

				         "path":"/system/drop_sstable_caches",

				         "operations":[

									
										329

api/api-doc/task_manager.json
									
										Normal file
									
												
				@@ -0,0 +1,329 @@

				{

				    "apiVersion":"0.0.1",

				    "swaggerVersion":"1.2",

				    "basePath":"{{Protocol}}://{{Host}}",

				    "resourcePath":"/task_manager",

				    "produces":[

				       "application/json"

				    ],

				    "apis":[

				       {

				          "path":"/task_manager/list_modules",

				          "operations":[

				             {

				                "method":"GET",

				                "summary":"Get all modules names",

				                "type":"array",

				                "items":{

				                   "type":"string"

				                },

				                "nickname":"get_modules",

				                "produces":[

				                   "application/json"

				                ],

				                "parameters":[

				                ]

				             }

				          ]

				       },

				       {

				          "path":"/task_manager/list_module_tasks/{module}",

				          "operations":[

				             {

				                "method":"GET",

				                "summary":"Get a list of tasks",

				                "type":"array",

				                "items":{

				                    "type":"task_stats"

				                },

				                "nickname":"get_tasks",

				                "produces":[

				                   "application/json"

				                ],

				                "parameters":[

				                    {

				                        "name":"module",

				                        "description":"The module to query about",

				                        "required":true,

				                        "allowMultiple":false,

				                        "type":"string",

				                        "paramType":"path"

				                    },

				                    {

				                        "name":"internal",

				                        "description":"Boolean flag indicating whether internal tasks should be shown (false by default)",

				                        "required":false,

				                        "allowMultiple":false,

				                        "type":"boolean",

				                        "paramType":"query"

				                    },

				                    {

				                        "name":"keyspace",

				                        "description":"The keyspace to query about",

				                        "required":false,

				                        "allowMultiple":false,

				                        "type":"string",

				                        "paramType":"query"

				                    },

				                    {

				                        "name":"table",

				                        "description":"The table to query about",

				                        "required":false,

				                        "allowMultiple":false,

				                        "type":"string",

				                        "paramType":"query"

				                    }

				                ]

				             }

				          ]

				       },

				       {

				          "path":"/task_manager/task_status/{task_id}",

				          "operations":[

				             {

				                "method":"GET",

				                "summary":"Get task status",

				                "type":"task_status",

				                "nickname":"get_task_status",

				                "produces":[

				                   "application/json"

				                ],

				                "parameters":[

				                    {

				                        "name":"task_id",

				                        "description":"The uuid of a task to query about",

				                        "required":true,

				                        "allowMultiple":false,

				                        "type":"string",

				                        "paramType":"path"

				                    }

				                ]

				             }

				          ]

				       },

				       {

				          "path":"/task_manager/abort_task/{task_id}",

				          "operations":[

				             {

				                "method":"POST",

				                "summary":"Abort running task and its descendants",

				                "type":"void",

				                "nickname":"abort_task",

				                "produces":[

				                   "application/json"

				                ],

				                "parameters":[

				                   {

				                      "name":"task_id",

				                      "description":"The uuid of a task to abort",

				                      "required":true,

				                      "allowMultiple":false,

				                      "type":"string",

				                      "paramType":"path"

				                   }

				                ]

				             }

				          ]

				       },

				       {

				        "path":"/task_manager/wait_task/{task_id}",

				        "operations":[

				           {

				              "method":"GET",

				              "summary":"Wait for a task to complete",

				              "type":"task_status",

				              "nickname":"wait_task",

				              "produces":[

				                 "application/json"

				              ],

				              "parameters":[

				                 {

				                    "name":"task_id",

				                    "description":"The uuid of a task to wait for",

				                    "required":true,

				                    "allowMultiple":false,

				                    "type":"string",

				                    "paramType":"path"

				                 }

				              ]

				           }

				        ]

				     },

				     {

				      "path":"/task_manager/task_status_recursive/{task_id}",

				      "operations":[

				         {

				            "method":"GET",

				            "summary":"Get statuses of the task and all its descendants",

				            "type":"array",

				            "items":{

				               "type":"task_status"

				            },

				            "nickname":"get_task_status_recursively",

				            "produces":[

				               "application/json"

				            ],

				            "parameters":[

				                {

				                    "name":"task_id",

				                    "description":"The uuid of a task to query about",

				                    "required":true,

				                    "allowMultiple":false,

				                    "type":"string",

				                    "paramType":"path"

				                }

				            ]

				         }

				      ]

				     },

				     {

				         "path":"/task_manager/ttl",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Set ttl in seconds and get last value",

				               "type":"long",

				               "nickname":"get_and_update_ttl",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"ttl",

				                     "description":"The number of seconds for which the tasks will be kept in memory after it finishes",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"long",

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				     }

				    ],

				    "models":{

				       "task_stats" :{

				           "id": "task_stats",

				           "description":"A task statistics object",

				           "properties":{

				             "task_id":{

				                "type":"string",

				                "description":"The uuid of a task"

				             },

				             "state":{

				                "type":"string",

				                "enum":[

				                  "created",

				                  "running",

				                  "done",

				                  "failed"

				                ],

				                "description":"The state of a task"

				             },

				             "type":{

				                "type":"string",

				                "description":"The description of the task"

				             },

				             "keyspace":{

				                "type":"string",

				                "description":"The keyspace the task is working on (if applicable)"

				             },

				             "table":{

				                "type":"string",

				                "description":"The table the task is working on (if applicable)"

				             },

				             "entity":{

				                "type":"string",

				                "description":"Task-specific entity description"

				             },

				             "sequence_number":{

				                "type":"long",

				                "description":"The running sequence number of the task"

				             }

				           }

				       },

				       "task_status":{

				          "id":"task_status",

				          "description":"A task status object",

				          "properties":{

				             "id":{

				                "type":"string",

				                "description":"The uuid of the task"

				             },

				             "type":{

				                "type":"string",

				                "description":"The description of the task"

				             },

				             "state":{

				               "type":"string",

				               "enum":[

				                 "created",

				                 "running",

				                 "done",

				                 "failed"

				               ],

				                "description":"The state of the task"

				             },

				             "is_abortable":{

				                "type":"boolean",

				                "description":"Boolean flag indicating whether the task can be aborted"

				             },

				             "start_time":{

				                "type":"datetime",

				                "description":"The start time of the task"

				             },

				             "end_time":{

				                "type":"datetime",

				                "description":"The end time of the task (unspecified when the task is not completed)"

				             },

				             "error":{

				                "type":"string",

				                "description":"Error string, if the task failed"

				             },

				             "parent_id":{

				               "type":"string",

				               "description":"The uuid of the parent task"

				            },

				            "sequence_number":{

				               "type":"long",

				               "description":"The running sequence number of the task"

				            },

				            "shard":{

				               "type":"long",

				               "description":"The number of a shard the task is running on"

				            },

				            "keyspace":{

				               "type":"string",

				               "description":"The keyspace the task is working on (if applicable)"

				            },

				            "table":{

				               "type":"string",

				               "description":"The table the task is working on (if applicable)"

				            },

				            "entity":{

				               "type":"string",

				               "description":"Task-specific entity description"

				            },

				            "progress_units":{

				               "type":"string",

				               "description":"A description of the progress units"

				            },

				            "progress_total":{

				               "type":"double",

				               "description":"The total number of units to complete for the task"

				            },

				            "progress_completed":{

				               "type":"double",

				               "description":"The number of units completed so far"

				            },

				            "children_ids":{

				               "type":"array",

				                "items":{

				                    "type":"string"

				                },

				               "description":"Task IDs of children of this task"

				            }

				          }

				       }

				    }

				 }
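
The `get_and_update_ttl` operation in the API description above is summarized as "Set ttl in seconds and get last value". A minimal C++ sketch of that get-and-update behaviour follows. The `task_ttl` class and its member names are hypothetical stand-ins for illustration; they are not taken from this change and say nothing about how the task manager actually stores the value.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical holder for the value behind POST /task_manager/ttl.
// Assumption: the TTL is a non-negative number of seconds ("long" in the API doc).
class task_ttl {
    std::int64_t _seconds;
public:
    explicit task_ttl(std::int64_t seconds) : _seconds(seconds) {
        assert(seconds >= 0);
    }

    // Stores the new TTL and returns the one that was in effect before the call.
    std::int64_t get_and_update(std::int64_t new_seconds) {
        return std::exchange(_seconds, new_seconds);
    }

    // Returns the TTL currently in effect.
    std::int64_t get() const {
        return _seconds;
    }
};
```

Returning the previous value lets a caller, such as a test, restore the original TTL once it is done.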

									

api/api-doc/task_manager_test.json
									
										Normal file
									
												
				@@ -0,0 +1,153 @@

				{

				    "apiVersion":"0.0.1",

				    "swaggerVersion":"1.2",

				    "basePath":"{{Protocol}}://{{Host}}",

				    "resourcePath":"/task_manager_test",

				    "produces":[

				       "application/json"

				    ],

				    "apis":[

				       {

				          "path":"/task_manager_test/test_module",

				          "operations":[

				             {

				                "method":"POST",

				                "summary":"Register test module in task manager",

				                "type":"void",

				                "nickname":"register_test_module",

				                "produces":[

				                   "application/json"

				                ],

				                "parameters":[

				                ]

				             },

				             {

				                "method":"DELETE",

				                "summary":"Unregister test module in task manager",

				                "type":"void",

				                "nickname":"unregister_test_module",

				                "produces":[

				                   "application/json"

				                ],

				                "parameters":[

				                ]

				             }

				          ]

				       },

				       {

				          "path":"/task_manager_test/test_task",

				          "operations":[

				             {

				                "method":"POST",

				                "summary":"Register test task",

				                "type":"string",

				                "nickname":"register_test_task",

				                "produces":[

				                   "application/json"

				                ],

				                "parameters":[

				                    {

				                        "name":"task_id",

				                        "description":"The uuid of a task to register",

				                        "required":false,

				                        "allowMultiple":false,

				                        "type":"string",

				                        "paramType":"query"

				                    },

				                    {

				                        "name":"shard",

				                        "description":"The shard of the task",

				                        "required":false,

				                        "allowMultiple":false,

				                        "type":"long",

				                        "paramType":"query"

				                    },

				                    {

				                        "name":"parent_id",

				                        "description":"The uuid of a parent task",

				                        "required":false,

				                        "allowMultiple":false,

				                        "type":"string",

				                        "paramType":"query"

				                    },

				                    {

				                        "name":"keyspace",

				                        "description":"The keyspace the task is working on",

				                        "required":false,

				                        "allowMultiple":false,

				                        "type":"string",

				                        "paramType":"query"

				                    },

				                    {

				                        "name":"table",

				                        "description":"The table the task is working on",

				                        "required":false,

				                        "allowMultiple":false,

				                        "type":"string",

				                        "paramType":"query"

				                    },

				                    {

				                        "name":"entity",

				                        "description":"Task-specific entity description",

				                        "required":false,

				                        "allowMultiple":false,

				                        "type":"string",

				                        "paramType":"query"

				                    }

				                ]

				             },

				             {

				                "method":"DELETE",

				                "summary":"Unregister test task",

				                "type":"void",

				                "nickname":"unregister_test_task",

				                "produces":[

				                   "application/json"

				                ],

				                "parameters":[

				                    {

				                        "name":"task_id",

				                        "description":"The uuid of a task to unregister",

				                        "required":true,

				                        "allowMultiple":false,

				                        "type":"string",

				                        "paramType":"query"

				                    }

				                ]

				             }

				          ]

				       },

				       {

				          "path":"/task_manager_test/finish_test_task/{task_id}",

				          "operations":[

				             {

				                "method":"POST",

				                "summary":"Finish test task",

				                "type":"void",

				                "nickname":"finish_test_task",

				                "produces":[

				                   "application/json"

				                ],

				                "parameters":[

				                   {

				                      "name":"task_id",

				                      "description":"The uuid of a task to finish",

				                      "required":true,

				                      "allowMultiple":false,

				                      "type":"string",

				                      "paramType":"path"

				                   },

				                   {

				                      "name":"error",

				                      "description":"The error with which task fails (if it does)",

				                      "required":false,

				                      "allowMultiple":false,

				                      "type":"string",

				                      "paramType":"query"

				                   }

				                ]

				             }

				          ]

				       }

				    ]

				 }

									

api/api.cc
									
												
				@@ -29,10 +29,13 @@

				#include "stream_manager.hh"

				#include "system.hh"

				#include "api/config.hh"

				#include "task_manager.hh"

				#include "task_manager_test.hh"

				logging::logger apilog("api");

				namespace api {

				using namespace seastar::httpd;

				static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {

				    try {

				@@ -146,8 +149,14 @@ future<> unset_server_snapshot(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_snapshot(ctx, r); });

				}

				future<> set_server_snitch(http_context& ctx) {

				    return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", set_endpoint_snitch);

				future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch) {

				    return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", [&snitch] (http_context& ctx, routes& r) {

				        set_endpoint_snitch(ctx, r, snitch);

				    });

				}

				future<> unset_server_snitch(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_endpoint_snitch(ctx, r); });

				}

				future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g) {

				@@ -157,9 +166,15 @@ future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g) {

				                });

				}

				future<> set_server_load_sstable(http_context& ctx) {

				future<> set_server_load_sstable(http_context& ctx, sharded<db::system_keyspace>& sys_ks) {

				    return register_api(ctx, "column_family",

				                "The column family API", set_column_family);

				                "The column family API", [&sys_ks] (http_context& ctx, routes& r) {

				                    set_column_family(ctx, r, sys_ks);

				                });

				}

				future<> unset_server_load_sstable(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_column_family(ctx, r); });

				}

				future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms) {

				@@ -179,6 +194,10 @@ future<> set_server_storage_proxy(http_context& ctx, sharded<service::storage_se

				                });

				}

				future<> unset_server_storage_proxy(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_storage_proxy(ctx, r); });

				}

				future<> set_server_stream_manager(http_context& ctx, sharded<streaming::stream_manager>& sm) {

				    return register_api(ctx, "stream_manager",

				                "The stream manager API", [&sm] (http_context& ctx, routes& r) {

				@@ -245,6 +264,30 @@ future<> set_server_done(http_context& ctx) {

				    });

				}

				future<> set_server_task_manager(http_context& ctx, lw_shared_ptr<db::config> cfg) {

				    auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);

				    return ctx.http_server.set_routes([rb, &ctx, &cfg = *cfg](routes& r) {

				        rb->register_function(r, "task_manager",

				                "The task manager API");

				        set_task_manager(ctx, r, cfg);

				    });

				}

				#ifndef SCYLLA_BUILD_MODE_RELEASE

				future<> set_server_task_manager_test(http_context& ctx) {

				    auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);

				    return ctx.http_server.set_routes([rb, &ctx](routes& r) mutable {

				        rb->register_function(r, "task_manager_test",

				                "The task manager test API");

				        set_task_manager_test(ctx, r);

				    });

				}

				#endif

				void req_params::process(const request& req) {

				    // Process mandatory parameters

				    for (auto& [name, ent] : params) {
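
In the api.cc hunks above, `set_server_snitch` and `set_server_load_sstable` gain an extra service parameter, while the callback passed to `register_api` still takes only `(http_context&, routes&)`. The change bridges the two with a lambda that captures the service by reference. The sketch below shows that pattern in isolation. Every type in it (`http_context`, `routes`, `snitch`, `route_setter`) is a simplified, hypothetical stand-in, and this `register_api` takes `routes` directly instead of going through the HTTP server, so it should be read as an illustration rather than the real signatures.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real types.
struct http_context {};
struct routes { std::vector<std::string> registered; };
struct snitch { std::string name; };

// The registration entry point only knows about two-argument setters.
using route_setter = std::function<void(http_context&, routes&)>;

void register_api(http_context& ctx, routes& r, const route_setter& f) {
    assert(f);  // an empty callback would be a programming error
    f(ctx, r);
}

// The setter now needs a third argument: the service it exposes.
void set_endpoint_snitch(http_context&, routes& r, snitch& s) {
    r.registered.push_back("endpoint_snitch_info:" + s.name);
}

// Adapter: capture the service by reference so the lambda matches route_setter.
void set_server_snitch(http_context& ctx, routes& r, snitch& s) {
    register_api(ctx, r, [&s](http_context& c, routes& rt) {
        set_endpoint_snitch(c, rt, s);
    });
}
```

Because the lambda holds a reference, the captured service has to outlive every invocation of the callback.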

									

api/api.hh
									
												
				@@ -27,7 +27,7 @@ template<class T>

				std::vector<sstring> container_to_vec(const T& container) {

				    std::vector<sstring> res;

				    for (auto i : container) {

				        res.push_back(boost::lexical_cast<std::string>(i));

				        res.push_back(fmt::to_string(i));

				    }

				    return res;

				}

				@@ -47,8 +47,8 @@ template<class T, class MAP>

				std::vector<T>& map_to_key_value(const MAP& map, std::vector<T>& res) {

				    for (auto i : map) {

				        T val;

				        val.key = boost::lexical_cast<std::string>(i.first);

				        val.value = boost::lexical_cast<std::string>(i.second);

				        val.key = fmt::to_string(i.first);

				        val.value = fmt::to_string(i.second);

				        res.push_back(val);

				    }

				    return res;

				@@ -65,7 +65,7 @@ template <typename MAP>

				std::vector<sstring> map_keys(const MAP& map) {

				    std::vector<sstring> res;

				    for (const auto& i : map) {

				        res.push_back(boost::lexical_cast<std::string>(i.first));

				        res.push_back(fmt::to_string(i.first));

				    }

				    return res;

				}

				@@ -137,6 +137,14 @@ future<json::json_return_type>  sum_timer_stats(distributed<T>& d, utils::timed_

				    });

				}

				template<class T, class F>

				future<json::json_return_type>  sum_timer_stats(distributed<T>& d, utils::timed_rate_moving_average_summary_and_histogram F::*f) {

				    return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average_and_histogram(),

				            std::plus<utils::rate_moving_average_and_histogram>()).then([](const utils::rate_moving_average_and_histogram& val) {

				        return make_ready_future<json::json_return_type>(timer_to_json(val));

				    });

				}

				inline int64_t min_int64(int64_t a, int64_t b) {

				    return std::min(a,b);

				}

				@@ -181,7 +189,7 @@ struct basic_ratio_holder : public json::jsonable {

				typedef basic_ratio_holder<double>  ratio_holder;

				typedef basic_ratio_holder<int64_t> integral_ratio_holder;

				class unimplemented_exception : public base_exception {

				class unimplemented_exception : public httpd::base_exception {

				public:

				    unimplemented_exception()

				            : base_exception("API call is not supported yet", reply::status_type::internal_server_error) {

				@@ -230,7 +238,7 @@ public:

				                value = T{boost::lexical_cast<Base>(param)};

				            }

				        } catch (boost::bad_lexical_cast&) {

				            throw bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));

				            throw httpd::bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));

				        }

				    }

				@@ -298,6 +306,6 @@ public:

				    }

				};

				utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val);

				httpd::utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val);

				}

									

api/api_init.hh
									
												
				@@ -11,8 +11,12 @@

				#include <seastar/core/future.hh>

				#include "replica/database_fwd.hh"

				#include "tasks/task_manager.hh"

				#include "seastarx.hh"

				using request = http::request;

				using reply = http::reply;

				namespace service {

				class load_meter;

				@@ -31,6 +35,7 @@ namespace locator {

				class token_metadata;

				class shared_token_metadata;

				class snitch_ptr;

				} // namespace locator

				@@ -66,11 +71,12 @@ struct http_context {

				    distributed<service::storage_proxy>& sp;

				    service::load_meter& lmeter;

				    const sharded<locator::shared_token_metadata>& shared_token_metadata;

				    sharded<tasks::task_manager>& tm;

				    http_context(distributed<replica::database>& _db,

				            distributed<service::storage_proxy>& _sp,

				            service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm)

				            : db(_db), sp(_sp), lmeter(_lm), shared_token_metadata(_stm) {

				            service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm, sharded<tasks::task_manager>& _tm)

				            : db(_db), sp(_sp), lmeter(_lm), shared_token_metadata(_stm), tm(_tm) {

				    }

				    const locator::token_metadata& get_token_metadata();

				@@ -78,7 +84,8 @@ struct http_context {

				future<> set_server_init(http_context& ctx);

				future<> set_server_config(http_context& ctx, const db::config& cfg);

				future<> set_server_snitch(http_context& ctx);

				future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch);

				future<> unset_server_snitch(http_context& ctx);

				future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, sharded<gms::gossiper>& g, sharded<cdc::generation_service>& cdc_gs, sharded<db::system_keyspace>& sys_ks);

				future<> set_server_sstables_loader(http_context& ctx, sharded<sstables_loader>& sst_loader);

				future<> unset_server_sstables_loader(http_context& ctx);

				@@ -95,10 +102,12 @@ future<> unset_server_authorization_cache(http_context& ctx);

				future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl);

				future<> unset_server_snapshot(http_context& ctx);

				future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g);

				future<> set_server_load_sstable(http_context& ctx);

				future<> set_server_load_sstable(http_context& ctx, sharded<db::system_keyspace>& sys_ks);

				future<> unset_server_load_sstable(http_context& ctx);

				future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);

				future<> unset_server_messaging_service(http_context& ctx);

				future<> set_server_storage_proxy(http_context& ctx, sharded<service::storage_service>& ss);

				future<> unset_server_storage_proxy(http_context& ctx);

				future<> set_server_stream_manager(http_context& ctx, sharded<streaming::stream_manager>& sm);

				future<> unset_server_stream_manager(http_context& ctx);

				future<> set_hinted_handoff(http_context& ctx, sharded<gms::gossiper>& g);

				@@ -107,5 +116,7 @@ future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g);

				future<> set_server_cache(http_context& ctx);

				future<> set_server_compaction_manager(http_context& ctx);

				future<> set_server_done(http_context& ctx);

				future<> set_server_task_manager(http_context& ctx, lw_shared_ptr<db::config> cfg);

				future<> set_server_task_manager_test(http_context& ctx);

				}

									

api/authorization_cache.cc
									
												
				@@ -14,9 +14,10 @@

				namespace api {

				using namespace json;

				using namespace seastar::httpd;

				void set_authorization_cache(http_context& ctx, routes& r, sharded<auth::service> &auth_service) {

				    httpd::authorization_cache_json::authorization_cache_reset.set(r, [&auth_service] (std::unique_ptr<request> req) -> future<json::json_return_type> {

				    httpd::authorization_cache_json::authorization_cache_reset.set(r, [&auth_service] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        co_await auth_service.invoke_on_all([] (auth::service& auth) -> future<>  {

				            auth.reset_authorization_cache();

				            return make_ready_future<>();

									

api/authorization_cache.hh
									
												
				@@ -12,7 +12,7 @@

				namespace api {

				void set_authorization_cache(http_context& ctx, routes& r, sharded<auth::service> &auth_service);

				void unset_authorization_cache(http_context& ctx, routes& r);

				void set_authorization_cache(http_context& ctx, httpd::routes& r, sharded<auth::service> &auth_service);

				void unset_authorization_cache(http_context& ctx, httpd::routes& r);

				}

									

api/cache_service.cc
									
												
				@@ -12,127 +12,128 @@

				namespace api {

				using namespace json;

				using namespace seastar::httpd;

				namespace cs = httpd::cache_service_json;

				void set_cache_service(http_context& ctx, routes& r) {

				    cs::get_row_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {

				    cs::get_row_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {

				        // We never save the cache

				        // Origin uses 0 for never

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::set_row_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {

				    cs::set_row_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        auto period = req->get_query_param("period");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::get_key_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {

				    cs::get_key_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {

				        // We never save the cache

				        // Origin uses 0 for never

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::set_key_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {

				    cs::set_key_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        auto period = req->get_query_param("period");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::get_counter_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {

				    cs::get_counter_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {

				        // We never save the cache

				        // Origin uses 0 for never

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::set_counter_cache_save_period_in_seconds.set(r, [](std::unique_ptr<request> req) {

				    cs::set_counter_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        auto ccspis = req->get_query_param("ccspis");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::get_row_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {

				    cs::get_row_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::set_row_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {

				    cs::set_row_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        auto rckts = req->get_query_param("rckts");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::get_key_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {

				    cs::get_key_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::set_key_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {

				    cs::set_key_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        auto kckts = req->get_query_param("kckts");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::get_counter_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {

				    cs::get_counter_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::set_counter_cache_keys_to_save.set(r, [](std::unique_ptr<request> req) {

				    cs::set_counter_cache_keys_to_save.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        auto cckts = req->get_query_param("cckts");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::invalidate_key_cache.set(r, [](std::unique_ptr<request> req) {

				    cs::invalidate_key_cache.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::invalidate_counter_cache.set(r, [](std::unique_ptr<request> req) {

				    cs::invalidate_counter_cache.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::set_row_cache_capacity_in_mb.set(r, [](std::unique_ptr<request> req) {

				    cs::set_row_cache_capacity_in_mb.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        auto capacity = req->get_query_param("capacity");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::set_key_cache_capacity_in_mb.set(r, [](std::unique_ptr<request> req) {

				    cs::set_key_cache_capacity_in_mb.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        auto period = req->get_query_param("period");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::set_counter_cache_capacity_in_mb.set(r, [](std::unique_ptr<request> req) {

				    cs::set_counter_cache_capacity_in_mb.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        auto capacity = req->get_query_param("capacity");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::save_caches.set(r, [](std::unique_ptr<request> req) {

				    cs::save_caches.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cs::get_key_capacity.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_key_capacity.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support keys cache,

				@@ -140,7 +141,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_key_hits.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_key_hits.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support keys cache,

				@@ -148,7 +149,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_key_requests.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_key_requests.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support keys cache,

				@@ -156,7 +157,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_key_hit_rate.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_key_hit_rate.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support keys cache,

				@@ -164,21 +165,21 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_key_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_key_hits_moving_avrage.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // See above

				        return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));

				    });

				    cs::get_key_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_key_requests_moving_avrage.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // See above

				        return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));

				    });

				    cs::get_key_size.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_key_size.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support keys cache,

				@@ -186,7 +187,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_key_entries.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_key_entries.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support keys cache,

				@@ -194,7 +195,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {

				            return db.row_cache_tracker().region().occupancy().used_space();

				        }, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {

				@@ -202,26 +203,26 @@ void set_cache_service(http_context& ctx, routes& r) {

				        });

				    });

				    cs::get_row_hits.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_row_hits.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [](const replica::column_family& cf) {

				            return cf.get_row_cache().stats().hits.count();

				        }, std::plus<uint64_t>());

				    });

				    cs::get_row_requests.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_row_requests.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [](const replica::column_family& cf) {

				            return cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count();

				        }, std::plus<uint64_t>());

				    });

				    cs::get_row_hit_rate.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_row_hit_rate.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, ratio_holder(), [](const replica::column_family& cf) {

				            return ratio_holder(cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count(),

				                    cf.get_row_cache().stats().hits.count());

				        }, std::plus<ratio_holder>());

				    });

				    cs::get_row_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_row_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {

				            return cf.get_row_cache().stats().hits.rate();

				        }, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {

				@@ -229,7 +230,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        });

				    });

				    cs::get_row_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_row_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {

				            return cf.get_row_cache().stats().hits.rate() + cf.get_row_cache().stats().misses.rate();

				        }, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {

				@@ -237,7 +238,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        });

				    });

				    cs::get_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        // In origin row size is the weighted size.

				        // We currently do not support weights, so we use num entries instead

				        return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {

				@@ -247,7 +248,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        });

				    });

				    cs::get_row_entries.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_row_entries.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {

				            return db.row_cache_tracker().partitions();

				        }, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {

				@@ -255,7 +256,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        });

				    });

				    cs::get_counter_capacity.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_counter_capacity.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support counter cache,

				@@ -263,7 +264,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_counter_hits.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_counter_hits.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support counter cache,

				@@ -271,7 +272,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_counter_requests.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_counter_requests.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support counter cache,

				@@ -279,7 +280,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_counter_hit_rate.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_counter_hit_rate.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support counter cache,

				@@ -287,21 +288,21 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_counter_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_counter_hits_moving_avrage.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // See above

				        return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));

				    });

				    cs::get_counter_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cs::get_counter_requests_moving_avrage.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // See above

				        return make_ready_future<json::json_return_type>(meter_to_json(utils::rate_moving_average()));

				    });

				    cs::get_counter_size.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_counter_size.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support counter cache,

				@@ -309,7 +310,7 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_counter_entries.set(r, [] (std::unique_ptr<request> req) {

				    cs::get_counter_entries.set(r, [] (std::unique_ptr<http::request> req) {

				        // TBD

				        // FIXME

				        // we don't support counter cache,

									
										2

api/cache_service.hh
									
				@@ -12,6 +12,6 @@

				namespace api {

				void set_cache_service(http_context& ctx, routes& r);

				void set_cache_service(http_context& ctx, httpd::routes& r);

				}

									
										2

api/collectd.cc
									
				@@ -52,7 +52,7 @@ static const char* str_to_regex(const sstring& v) {

				}

				void set_collectd(http_context& ctx, routes& r) {

				    cd::get_collectd.set(r, [&ctx](std::unique_ptr<request> req) {

				    cd::get_collectd.set(r, [](std::unique_ptr<request> req) {

				        auto id = ::make_shared<scollectd::type_instance_id>(req->param["pluginid"],

				                req->get_query_param("instance"), req->get_query_param("type"),

									
										2

api/collectd.hh
									
				@@ -12,6 +12,6 @@

				namespace api {

				void set_collectd(http_context& ctx, routes& r);

				void set_collectd(http_context& ctx, httpd::routes& r);

				}

									
										368

api/column_family.cc
									
				@@ -14,9 +14,10 @@

				#include "sstables/metadata_collector.hh"

				#include "utils/estimated_histogram.hh"

				#include <algorithm>

				#include "db/system_keyspace_view_types.hh"

				#include "db/system_keyspace.hh"

				#include "db/data_listeners.hh"

				#include "storage_service.hh"

				#include "compaction/compaction_manager.hh"

				#include "unimplemented.hh"

				extern logging::logger apilog;

				@@ -24,7 +25,6 @@ extern logging::logger apilog;

				namespace api {

				using namespace httpd;

				using namespace std;

				using namespace json;

				namespace cf = httpd::column_family_json;

				@@ -43,7 +43,7 @@ std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {

				    return std::make_tuple(name.substr(0, pos), name.substr(end));

				}

				const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const replica::database& db) {

				const table_id& get_uuid(const sstring& ks, const sstring& cf, const replica::database& db) {

				    try {

				        return db.find_uuid(ks, cf);

				    } catch (replica::no_such_column_family& e) {

				@@ -51,12 +51,12 @@ const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const replica:

				    }

				}

				const utils::UUID& get_uuid(const sstring& name, const replica::database& db) {

				const table_id& get_uuid(const sstring& name, const replica::database& db) {

				    auto [ks, cf] = parse_fully_qualified_cf_name(name);

				    return get_uuid(ks, cf, db);

				}

				future<> foreach_column_family(http_context& ctx, const sstring& name, function<void(replica::column_family&)> f) {

				future<> foreach_column_family(http_context& ctx, const sstring& name, std::function<void(replica::column_family&)> f) {

				    auto uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.invoke_on_all([f, uuid](replica::database& db) {

				@@ -110,7 +110,7 @@ static future<json::json_return_type>  get_cf_stats_count(http_context& ctx,

				static future<json::json_return_type>  get_cf_histogram(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_and_histogram replica::column_family_stats::*f) {

				    utils::UUID uuid = get_uuid(name, ctx.db.local());

				    auto uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.map_reduce0([f, uuid](const replica::database& p) {

				        return (p.find_column_family(uuid).get_stats().*f).hist;},

				            utils::ihistogram(),

				@@ -122,7 +122,7 @@ static future<json::json_return_type>  get_cf_histogram(http_context& ctx, const

				static future<json::json_return_type>  get_cf_histogram(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_summary_and_histogram replica::column_family_stats::*f) {

				    utils::UUID uuid = get_uuid(name, ctx.db.local());

				    auto uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.map_reduce0([f, uuid](const replica::database& p) {

				        return (p.find_column_family(uuid).get_stats().*f).hist;},

				            utils::ihistogram(),

				@@ -149,7 +149,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:

				static future<json::json_return_type>  get_cf_rate_and_histogram(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_summary_and_histogram replica::column_family_stats::*f) {

				    utils::UUID uuid = get_uuid(name, ctx.db.local());

				    auto uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.map_reduce0([f, uuid](const replica::database& p) {

				        return (p.find_column_family(uuid).get_stats().*f).rate();},

				            utils::rate_moving_average_and_histogram(),

				@@ -303,16 +303,16 @@ ratio_holder filter_recent_false_positive_as_ratio_holder(const sstables::shared

				    return ratio_holder(f + sst->filter_get_recent_true_positive(), f);

				}

				void set_column_family(http_context& ctx, routes& r) {

				void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace>& sys_ks) {

				    cf::get_column_family_name.set(r, [&ctx] (const_req req){

				        vector<sstring> res;

				        std::vector<sstring> res;

				        for (auto i: ctx.db.local().get_column_families_mapping()) {

				            res.push_back(i.first.first + ":" + i.first.second);

				        }

				        return res;

				    });

				    cf::get_column_family.set(r, [&ctx] (std::unique_ptr<request> req){

				    cf::get_column_family.set(r, [&ctx] (std::unique_ptr<http::request> req){

				            std::list<cf::column_family_info> res;

				            for (auto i: ctx.db.local().get_column_families_mapping()) {

				                cf::column_family_info info;

				@@ -325,22 +325,22 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    cf::get_column_family_name_keyspace.set(r, [&ctx] (const_req req){

				        vector<sstring> res;

				        std::vector<sstring> res;

				        for (auto i = ctx.db.local().get_keyspaces().cbegin(); i!=  ctx.db.local().get_keyspaces().cend(); i++) {

				            res.push_back(i->first);

				        }

				        return res;

				    });

				    cf::get_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t{0}, [](replica::column_family& cf) {

				            return cf.active_memtable().partition_count();

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));

				        }, std::plus<>());

				    });

				    cf::get_all_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, uint64_t{0}, [](replica::column_family& cf) {

				            return cf.active_memtable().partition_count();

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));

				        }, std::plus<>());

				    });

				@@ -352,27 +352,35 @@ void set_column_family(http_context& ctx, routes& r) {

				        return 0;

				    });

				    cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				            return cf.active_memtable().region().occupancy().total_space();

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().total_space();

				            }), uint64_t(0));

				        }, std::plus<int64_t>());

				    });

				    cf::get_all_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return cf.active_memtable().region().occupancy().total_space();

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().total_space();

				            }), uint64_t(0));

				        }, std::plus<int64_t>());

				    });

				    cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				            return cf.active_memtable().region().occupancy().used_space();

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().used_space();

				            }), uint64_t(0));

				        }, std::plus<int64_t>());

				    });

				    cf::get_all_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return cf.active_memtable().region().occupancy().used_space();

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().used_space();

				            }), uint64_t(0));

				        }, std::plus<int64_t>());

				    });

				@@ -384,46 +392,48 @@ void set_column_family(http_context& ctx, routes& r) {

				        return 0;

				    });

				    cf::get_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        warn(unimplemented::cause::INDEXES);

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				            return cf.occupancy().total_space();

				        }, std::plus<int64_t>());

				    });

				    cf::get_all_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        warn(unimplemented::cause::INDEXES);

				        return ctx.db.map_reduce0([](const replica::database& db){

				            return db.dirty_memory_region_group().memory_used();

				            return db.dirty_memory_region_group().real_memory_used();

				        }, int64_t(0), std::plus<int64_t>()).then([](int res) {

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    cf::get_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        warn(unimplemented::cause::INDEXES);

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				            return cf.occupancy().used_space();

				        }, std::plus<int64_t>());

				    });

				    cf::get_all_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        warn(unimplemented::cause::INDEXES);

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return cf.active_memtable().region().occupancy().used_space();

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().used_space();

				            }), uint64_t(0));

				        }, std::plus<int64_t>());

				    });

				    cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats(ctx,req->param["name"] ,&replica::column_family_stats::memtable_switch_count);

				    });

				    cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats(ctx, &replica::column_family_stats::memtable_switch_count);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {

				            utils::estimated_histogram res(0);

				            for (auto sstables = cf.get_sstables(); auto& i : *sstables) {

				@@ -435,7 +445,7 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				            uint64_t res = 0;

				            for (auto sstables = cf.get_sstables(); auto& i : *sstables) {

				@@ -446,7 +456,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        std::plus<uint64_t>());

				    });

				    cf::get_estimated_column_count_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_estimated_column_count_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {

				            utils::estimated_histogram res(0);

				            for (auto sstables = cf.get_sstables(); auto& i : *sstables) {

				@@ -457,149 +467,149 @@ void set_column_family(http_context& ctx, routes& r) {

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    cf::get_all_compression_ratio.set(r, [] (std::unique_ptr<request> req) {

				    cf::get_all_compression_ratio.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats(ctx,req->param["name"] ,&replica::column_family_stats::pending_flushes);

				    });

				    cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats(ctx, &replica::column_family_stats::pending_flushes);

				    });

				    cf::get_read.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_read.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats_count(ctx,req->param["name"] ,&replica::column_family_stats::reads);

				    });

				    cf::get_all_read.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_read.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats_count(ctx, &replica::column_family_stats::reads);

				    });

				    cf::get_write.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_write.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats_count(ctx, req->param["name"] ,&replica::column_family_stats::writes);

				    });

				    cf::get_all_write.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_write.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats_count(ctx, &replica::column_family_stats::writes);

				    });

				    cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::reads);

				    });

				    cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &replica::column_family_stats::reads);

				    });

				    cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats_sum(ctx,req->param["name"] ,&replica::column_family_stats::reads);

				    });

				    cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats_sum(ctx, req->param["name"] ,&replica::column_family_stats::writes);

				    });

				    cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_histogram(ctx, &replica::column_family_stats::writes);

				    });

				    cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_rate_and_histogram(ctx, &replica::column_family_stats::writes);

				    });

				    cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::writes);

				    });

				    cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &replica::column_family_stats::writes);

				    });

				    cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_histogram(ctx, &replica::column_family_stats::writes);

				    });

				    cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_rate_and_histogram(ctx, &replica::column_family_stats::writes);

				    });

				    cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				            return cf.get_compaction_strategy().estimated_pending_compactions(cf.as_table_state());

				            return cf.estimate_pending_compactions();

				        }, std::plus<int64_t>());

				    });

				    cf::get_all_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_pending_compactions.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return cf.get_compaction_strategy().estimated_pending_compactions(cf.as_table_state());

				            return cf.estimate_pending_compactions();

				        }, std::plus<int64_t>());

				    });

				    cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats(ctx, req->param["name"], &replica::column_family_stats::live_sstable_count);

				    });

				    cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats(ctx, &replica::column_family_stats::live_sstable_count);

				    });

				    cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_unleveled_sstables(ctx, req->param["name"]);

				    });

				    cf::get_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return sum_sstable(ctx, req->param["name"], false);

				    });

				    cf::get_all_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return sum_sstable(ctx, false);

				    });

				    cf::get_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return sum_sstable(ctx, req->param["name"], true);

				    });

				    cf::get_all_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return sum_sstable(ctx, true);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_partition_size, min_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_all_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_min_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, INT64_MAX, min_partition_size, min_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_partition_size, max_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, int64_t(0), max_partition_size, max_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        // Cassandra 3.x mean values are truncated as integrals.

				        return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());

				    });

				    // FIXME: this refers to partitions, not rows.

				    cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        // Cassandra 3.x mean values are truncated as integrals.

				        return map_reduce_cf(ctx, integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());

				    });

				    cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				@@ -608,7 +618,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        }, std::plus<uint64_t>());

				    });

				    cf::get_all_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				@@ -617,7 +627,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        }, std::plus<uint64_t>());

				    });

				    cf::get_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				@@ -626,7 +636,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        }, std::plus<uint64_t>());

				    });

				    cf::get_all_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				@@ -635,31 +645,31 @@ void set_column_family(http_context& ctx, routes& r) {

				        }, std::plus<uint64_t>());

				    });

				    cf::get_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (replica::column_family& cf) {

				            return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());

				        }, std::plus<>());

				    });

				    cf::get_all_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, ratio_holder(), [] (replica::column_family& cf) {

				            return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());

				        }, std::plus<>());

				    });

				    cf::get_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (replica::column_family& cf) {

				            return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());

				        }, std::plus<>());

				    });

				    cf::get_all_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, ratio_holder(), [] (replica::column_family& cf) {

				            return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());

				        }, std::plus<>());

				    });

				    cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				@@ -668,7 +678,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        }, std::plus<uint64_t>());

				    });

				    cf::get_all_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				@@ -677,7 +687,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        }, std::plus<uint64_t>());

				    });

				    cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				@@ -686,7 +696,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        }, std::plus<uint64_t>());

				    });

				    cf::get_all_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				@@ -695,7 +705,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        }, std::plus<uint64_t>());

				    });

				    cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				@@ -704,7 +714,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        }, std::plus<uint64_t>());

				    });

				    cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				@@ -713,7 +723,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        }, std::plus<uint64_t>());

				    });

				    cf::get_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {

				    cf::get_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        // FIXME

				        // We are missing the off heap memory calculation

				@@ -723,33 +733,33 @@ void set_column_family(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_all_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {

				    cf::get_all_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_speculative_retries.set(r, [] (std::unique_ptr<request> req) {

				    cf::get_speculative_retries.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_all_speculative_retries.set(r, [] (std::unique_ptr<request> req) {

				    cf::get_all_speculative_retries.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_key_cache_hit_rate.set(r, [] (std::unique_ptr<request> req) {

				    cf::get_key_cache_hit_rate.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_true_snapshots_size.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_true_snapshots_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        auto uuid = get_uuid(req->param["name"], ctx.db.local());

				        return ctx.db.local().find_column_family(uuid).get_snapshot_details().then([](

				                const std::unordered_map<sstring, replica::column_family::snapshot_details>& sd) {

				@@ -761,26 +771,26 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    });

				    cf::get_all_true_snapshots_size.set(r, [] (std::unique_ptr<request> req) {

				    cf::get_all_true_snapshots_size.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<request> req) {

				    cf::get_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_all_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<request> req) {

				    cf::get_all_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const replica::column_family& cf) {

				            return cf.get_row_cache().stats().hits.rate();

				        }, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {

				@@ -788,7 +798,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    });

				    cf::get_all_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_row_cache_hit.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {

				            return cf.get_row_cache().stats().hits.rate();

				        }, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {

				@@ -796,7 +806,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    });

				    cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const replica::column_family& cf) {

				            return cf.get_row_cache().stats().misses.rate();

				        }, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {

				@@ -804,7 +814,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    });

				    cf::get_all_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_all_row_cache_miss.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {

				            return cf.get_row_cache().stats().misses.rate();

				        }, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {

				@@ -813,40 +823,40 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {

				            return cf.get_stats().cas_prepare.histogram();

				        });

				    });

				    cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {

				            return cf.get_stats().cas_accept.histogram();

				        });

				    });

				    cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {

				            return cf.get_stats().cas_learn.histogram();

				        });

				    });

				    cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {

				            return cf.get_stats().estimated_sstable_per_read;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::tombstone_scanned);

				    });

				    cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::live_scanned);

				    });

				    cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {

				    cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				@@ -855,12 +865,12 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::get_auto_compaction.set(r, [&ctx] (const_req req) {

				        const utils::UUID& uuid = get_uuid(req.param["name"], ctx.db.local());

				        auto uuid = get_uuid(req.param["name"], ctx.db.local());

				        replica::column_family& cf = ctx.db.local().find_column_family(uuid);

				        return !cf.is_auto_compaction_disabled_by_user();

				    });

				    cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {

				            auto g = replica::database::autocompaction_toggle_guard(db);

				            return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {

				@@ -871,7 +881,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    });

				    cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {

				            auto g = replica::database::autocompaction_toggle_guard(db);

				            return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {

				@@ -882,11 +892,11 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    });

				    cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::get_built_indexes.set(r, [&ctx, &sys_ks](std::unique_ptr<http::request> req) {

				        auto ks_cf = parse_fully_qualified_cf_name(req->param["name"]);

				        auto&& ks = std::get<0>(ks_cf);

				        auto&& cf_name = std::get<1>(ks_cf);

				        return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace_view_build_progress>& vb) mutable {

				        return sys_ks.local().load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace_view_build_progress>& vb) mutable {

				            std::set<sstring> vp;

				            for (auto b : vb) {

				                if (b.view.first == ks) {

				@@ -920,7 +930,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        return std::vector<sstring>();

				    });

				    cf::get_compression_ratio.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::get_compression_ratio.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto uuid = get_uuid(req->param["name"], ctx.db.local());

				        return ctx.db.map_reduce(sum_ratio<double>(), [uuid](replica::database& db) {

				@@ -931,19 +941,19 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    });

				    cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {

				            return cf.get_stats().reads.histogram();

				        });

				    });

				    cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {

				            return cf.get_stats().writes.histogram();

				        });

				    });

				    cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        sstring strategy = req->get_query_param("class_name");

				        return foreach_column_family(ctx, req->param["name"], [strategy](replica::column_family& cf) {

				            cf.set_compaction_strategy(sstables::compaction_strategy::type(strategy));

				@@ -956,19 +966,19 @@ void set_column_family(http_context& ctx, routes& r) {

				        return ctx.db.local().find_column_family(get_uuid(req.param["name"], ctx.db.local())).get_compaction_strategy().name();

				    });

				    cf::set_compression_parameters.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::set_compression_parameters.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cf::set_crc_check_chance.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::set_crc_check_chance.set(r, [](std::unique_ptr<http::request> req) {

				        // TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cf::get_sstable_count_per_level.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::get_sstable_count_per_level.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return map_reduce_cf_raw(ctx, req->param["name"], std::vector<uint64_t>(), [](const replica::column_family& cf) {

				            return cf.sstable_count_per_level();

				        }, concat_sstable_count_per_level).then([](const std::vector<uint64_t>& res) {

				@@ -976,7 +986,7 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    });

				    cf::get_sstables_for_key.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::get_sstables_for_key.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto key = req->get_query_param("key");

				        auto uuid = get_uuid(req->param["name"], ctx.db.local());

				@@ -992,7 +1002,7 @@ void set_column_family(http_context& ctx, routes& r) {

				    });

				    cf::toppartitions.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cf::toppartitions.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        auto name = req->param["name"];

				        auto [ks, cf] = parse_fully_qualified_cf_name(name);

				@@ -1008,15 +1018,127 @@ void set_column_family(http_context& ctx, routes& r) {

				        });

				    });

				    cf::force_major_compaction.set(r, [&ctx](std::unique_ptr<request> req) {

				    cf::force_major_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        if (req->get_query_param("split_output") != "") {

				            fail(unimplemented::cause::API);

				        }

				        return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {

				            return cf.compact_all_sstables();

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				        auto [ks, cf] = parse_fully_qualified_cf_name(req->param["name"]);

				        auto keyspace = validate_keyspace(ctx, ks);

				        std::vector<table_id> table_infos = {ctx.db.local().find_uuid(ks, cf)};

				        auto& compaction_module = ctx.db.local().get_compaction_manager().get_task_manager_module();

				        auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), ctx.db, std::move(table_infos));

				        co_await task->done();

				        co_return json_void();

				    });

				}

				void unset_column_family(http_context& ctx, routes& r) {

				    cf::get_column_family_name.unset(r);

				    cf::get_column_family.unset(r);

				    cf::get_column_family_name_keyspace.unset(r);

				    cf::get_memtable_columns_count.unset(r);

				    cf::get_all_memtable_columns_count.unset(r);

				    cf::get_memtable_on_heap_size.unset(r);

				    cf::get_all_memtable_on_heap_size.unset(r);

				    cf::get_memtable_off_heap_size.unset(r);

				    cf::get_all_memtable_off_heap_size.unset(r);

				    cf::get_memtable_live_data_size.unset(r);

				    cf::get_all_memtable_live_data_size.unset(r);

				    cf::get_cf_all_memtables_on_heap_size.unset(r);

				    cf::get_all_cf_all_memtables_on_heap_size.unset(r);

				    cf::get_cf_all_memtables_off_heap_size.unset(r);

				    cf::get_all_cf_all_memtables_off_heap_size.unset(r);

				    cf::get_cf_all_memtables_live_data_size.unset(r);

				    cf::get_all_cf_all_memtables_live_data_size.unset(r);

				    cf::get_memtable_switch_count.unset(r);

				    cf::get_all_memtable_switch_count.unset(r);

				    cf::get_estimated_row_size_histogram.unset(r);

				    cf::get_estimated_row_count.unset(r);

				    cf::get_estimated_column_count_histogram.unset(r);

				    cf::get_all_compression_ratio.unset(r);

				    cf::get_pending_flushes.unset(r);

				    cf::get_all_pending_flushes.unset(r);

				    cf::get_read.unset(r);

				    cf::get_all_read.unset(r);

				    cf::get_write.unset(r);

				    cf::get_all_write.unset(r);

				    cf::get_read_latency_histogram_depricated.unset(r);

				    cf::get_read_latency_histogram.unset(r);

				    cf::get_read_latency.unset(r);

				    cf::get_write_latency.unset(r);

				    cf::get_all_read_latency_histogram_depricated.unset(r);

				    cf::get_all_read_latency_histogram.unset(r);

				    cf::get_write_latency_histogram_depricated.unset(r);

				    cf::get_write_latency_histogram.unset(r);

				    cf::get_all_write_latency_histogram_depricated.unset(r);

				    cf::get_all_write_latency_histogram.unset(r);

				    cf::get_pending_compactions.unset(r);

				    cf::get_all_pending_compactions.unset(r);

				    cf::get_live_ss_table_count.unset(r);

				    cf::get_all_live_ss_table_count.unset(r);

				    cf::get_unleveled_sstables.unset(r);

				    cf::get_live_disk_space_used.unset(r);

				    cf::get_all_live_disk_space_used.unset(r);

				    cf::get_total_disk_space_used.unset(r);

				    cf::get_all_total_disk_space_used.unset(r);

				    cf::get_min_row_size.unset(r);

				    cf::get_all_min_row_size.unset(r);

				    cf::get_max_row_size.unset(r);

				    cf::get_all_max_row_size.unset(r);

				    cf::get_mean_row_size.unset(r);

				    cf::get_all_mean_row_size.unset(r);

				    cf::get_bloom_filter_false_positives.unset(r);

				    cf::get_all_bloom_filter_false_positives.unset(r);

				    cf::get_recent_bloom_filter_false_positives.unset(r);

				    cf::get_all_recent_bloom_filter_false_positives.unset(r);

				    cf::get_bloom_filter_false_ratio.unset(r);

				    cf::get_all_bloom_filter_false_ratio.unset(r);

				    cf::get_recent_bloom_filter_false_ratio.unset(r);

				    cf::get_all_recent_bloom_filter_false_ratio.unset(r);

				    cf::get_bloom_filter_disk_space_used.unset(r);

				    cf::get_all_bloom_filter_disk_space_used.unset(r);

				    cf::get_bloom_filter_off_heap_memory_used.unset(r);

				    cf::get_all_bloom_filter_off_heap_memory_used.unset(r);

				    cf::get_index_summary_off_heap_memory_used.unset(r);

				    cf::get_all_index_summary_off_heap_memory_used.unset(r);

				    cf::get_compression_metadata_off_heap_memory_used.unset(r);

				    cf::get_all_compression_metadata_off_heap_memory_used.unset(r);

				    cf::get_speculative_retries.unset(r);

				    cf::get_all_speculative_retries.unset(r);

				    cf::get_key_cache_hit_rate.unset(r);

				    cf::get_true_snapshots_size.unset(r);

				    cf::get_all_true_snapshots_size.unset(r);

				    cf::get_row_cache_hit_out_of_range.unset(r);

				    cf::get_all_row_cache_hit_out_of_range.unset(r);

				    cf::get_row_cache_hit.unset(r);

				    cf::get_all_row_cache_hit.unset(r);

				    cf::get_row_cache_miss.unset(r);

				    cf::get_all_row_cache_miss.unset(r);

				    cf::get_cas_prepare.unset(r);

				    cf::get_cas_propose.unset(r);

				    cf::get_cas_commit.unset(r);

				    cf::get_sstables_per_read_histogram.unset(r);

				    cf::get_tombstone_scanned_histogram.unset(r);

				    cf::get_live_scanned_histogram.unset(r);

				    cf::get_col_update_time_delta_histogram.unset(r);

				    cf::get_auto_compaction.unset(r);

				    cf::enable_auto_compaction.unset(r);

				    cf::disable_auto_compaction.unset(r);

				    cf::get_built_indexes.unset(r);

				    cf::get_compression_metadata_off_heap_memory_used.unset(r);

				    cf::get_compression_parameters.unset(r);

				    cf::get_compression_ratio.unset(r);

				    cf::get_read_latency_estimated_histogram.unset(r);

				    cf::get_write_latency_estimated_histogram.unset(r);

				    cf::set_compaction_strategy_class.unset(r);

				    cf::get_compaction_strategy_class.unset(r);

				    cf::set_compression_parameters.unset(r);

				    cf::set_crc_check_chance.unset(r);

				    cf::get_sstable_count_per_level.unset(r);

				    cf::get_sstables_for_key.unset(r);

				    cf::toppartitions.unset(r);

				    cf::force_major_compaction.unset(r);

				}

				}

									
										11

api/column_family.hh
									
												
				@@ -14,11 +14,16 @@

				#include <seastar/core/future-util.hh>

				#include <any>

				namespace db {

				class system_keyspace;

				}

				namespace api {

				void set_column_family(http_context& ctx, routes& r);

				void set_column_family(http_context& ctx, httpd::routes& r, sharded<db::system_keyspace>& sys_ks);

				void unset_column_family(http_context& ctx, httpd::routes& r);

				const utils::UUID& get_uuid(const sstring& name, const replica::database& db);

				const table_id& get_uuid(const sstring& name, const replica::database& db);

				future<> foreach_column_family(http_context& ctx, const sstring& name, std::function<void(replica::column_family&)> f);

				@@ -63,7 +68,7 @@ struct map_reduce_column_families_locally {

				    std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)> reducer;

				    future<std::unique_ptr<std::any>> operator()(replica::database& db) const {

				        auto res = seastar::make_lw_shared<std::unique_ptr<std::any>>(std::make_unique<std::any>(init));

				        return do_for_each(db.get_column_families(), [res, this](const std::pair<utils::UUID, seastar::lw_shared_ptr<replica::table>>& i) {

				        return do_for_each(db.get_column_families(), [res, this](const std::pair<table_id, seastar::lw_shared_ptr<replica::table>>& i) {

				            *res = reducer(std::move(*res), mapper(*i.second.get()));

				        }).then([res] {

				            return std::move(*res);

									
										1

api/commitlog.cc
									
												
				@@ -13,6 +13,7 @@

				#include <vector>

				namespace api {

				using namespace seastar::httpd;

				template<typename T>

				static auto acquire_cl_metric(http_context& ctx, std::function<T (db::commitlog*)> func) {

									
										2

api/commitlog.hh
									
												
				@@ -12,6 +12,6 @@

				namespace api {

				void set_commitlog(http_context& ctx, routes& r);

				void set_commitlog(http_context& ctx, httpd::routes& r);

				}

									
										46

api/compaction_manager.cc
									
												
				@@ -22,6 +22,7 @@ namespace api {

				namespace cm = httpd::compaction_manager_json;

				using namespace json;

				using namespace seastar::httpd;

				static future<json::json_return_type> get_cm_stats(http_context& ctx,

				        int64_t compaction_manager::stats::*f) {

				@@ -41,9 +42,8 @@ static std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_ha

				    return std::move(a);

				}

				void set_compaction_manager(http_context& ctx, routes& r) {

				    cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cm::get_compactions.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return ctx.db.map_reduce0([](replica::database& db) {

				            std::vector<cm::summary> summaries;

				            const compaction_manager& cm = db.get_compaction_manager();

				@@ -65,12 +65,12 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				        });

				    });

				    cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {

				        return ctx.db.map_reduce0([&ctx](replica::database& db) {

				            return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {

				                return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<replica::table>>& i) {

				    cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return ctx.db.map_reduce0([](replica::database& db) {

				            return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {

				                return do_for_each(db.get_column_families(), [&tasks](const std::pair<table_id, seastar::lw_shared_ptr<replica::table>>& i) -> future<> {

				                    replica::table& cf = *i.second.get();

				                    tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf.as_table_state());

				                    tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.estimate_pending_compactions();

				                    return make_ready_future<>();

				                }).then([&tasks] {

				                    return std::move(tasks);

				@@ -91,14 +91,14 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				        });

				    });

				    cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {

				    cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        // FIXME

				        warn(unimplemented::cause::API);

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cm::stop_compaction.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cm::stop_compaction.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        auto type = req->get_query_param("type");

				        return ctx.db.invoke_on_all([type] (replica::database& db) {

				            auto& cm = db.get_compaction_manager();

				@@ -108,7 +108,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				        });

				    });

				    cm::stop_keyspace_compaction.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {

				    cm::stop_keyspace_compaction.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto ks_name = validate_keyspace(ctx, req->param);

				        auto table_names = parse_tables(ks_name, ctx, req->query_parameters, "tables");

				        if (table_names.empty()) {

				@@ -119,41 +119,43 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				            auto& cm = db.get_compaction_manager();

				            return parallel_for_each(table_names, [&db, &cm, &ks_name, type] (sstring& table_name) {

				                auto& t = db.find_column_family(ks_name, table_name);

				                return cm.stop_compaction(type, &t.as_table_state());

				                return t.parallel_foreach_table_state([&] (compaction::table_state& ts) {

				                    return cm.stop_compaction(type, &ts);

				                });

				            });

				        });

				        co_return json_void();

				    });

				    cm::get_pending_tasks.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cm::get_pending_tasks.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return cf.get_compaction_strategy().estimated_pending_compactions(cf.as_table_state());

				            return cf.estimate_pending_compactions();

				        }, std::plus<int64_t>());

				    });

				    cm::get_completed_tasks.set(r, [&ctx] (std::unique_ptr<request> req) {

				    cm::get_completed_tasks.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cm_stats(ctx, &compaction_manager::stats::completed_tasks);

				    });

				    cm::get_total_compactions_completed.set(r, [] (std::unique_ptr<request> req) {

				    cm::get_total_compactions_completed.set(r, [] (std::unique_ptr<http::request> req) {

				        // FIXME

				        // We are currently dont have an API for compaction

				        // so returning a 0 as the number of total compaction is ok

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cm::get_bytes_compacted.set(r, [] (std::unique_ptr<request> req) {

				    cm::get_bytes_compacted.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        // FIXME

				        warn(unimplemented::cause::API);

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cm::get_compaction_history.set(r, [] (std::unique_ptr<request> req) {

				        std::function<future<>(output_stream<char>&&)> f = [](output_stream<char>&& s) {

				            return do_with(output_stream<char>(std::move(s)), true, [] (output_stream<char>& s, bool& first){

				                return s.write("[").then([&s, &first] {

				                    return db::system_keyspace::get_compaction_history([&s, &first](const db::system_keyspace::compaction_history_entry& entry) mutable {

				    cm::get_compaction_history.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        std::function<future<>(output_stream<char>&&)> f = [&ctx](output_stream<char>&& s) {

				            return do_with(output_stream<char>(std::move(s)), true, [&ctx] (output_stream<char>& s, bool& first){

				                return s.write("[").then([&ctx, &s, &first] {

				                    return ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable {

				                        cm::history h;

				                        h.id = entry.id.to_sstring();

				                        h.ks = std::move(entry.ks);

				@@ -183,7 +185,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(std::move(f));

				    });

				    cm::get_compaction_info.set(r, [] (std::unique_ptr<request> req) {

				    cm::get_compaction_info.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        // FIXME

				        warn(unimplemented::cause::API);
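
Note on the `get_cm_stats(ctx, &compaction_manager::stats::completed_tasks)` call in this file: the helper takes a pointer to data member, so one function can serve any counter. A minimal sketch of that idea, assuming a hypothetical `stats` struct and a plain vector standing in for the per-shard instances (neither is the real ScyllaDB type):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for per-shard compaction statistics.
struct stats {
    int64_t pending_tasks;
    int64_t completed_tasks;
};

// Sum the field selected by the pointer-to-member `f` over all shards.
// The caller picks the counter; the loop stays the same for every counter.
int64_t sum_stat(const std::vector<stats>& shards, int64_t stats::*f) {
    int64_t total = 0;
    for (const auto& s : shards) {
        total += s.*f;
    }
    return total;
}
```

Calling `sum_stat(shards, &stats::completed_tasks)` and `sum_stat(shards, &stats::pending_tasks)` reuses the same reduction for both fields.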

									
										2

api/compaction_manager.hh
									
												
				@@ -12,6 +12,6 @@

				namespace api {

				void set_compaction_manager(http_context& ctx, routes& r);

				void set_compaction_manager(http_context& ctx, httpd::routes& r);

				}

									
										1

api/config.cc
									
												
				@@ -13,6 +13,7 @@

				#include <boost/algorithm/string/replace.hpp>

				namespace api {

				using namespace seastar::httpd;

				template<class T>

				json::json_return_type get_json_return_type(const T& val) {

									
										2

api/config.hh
									
												
				@@ -13,5 +13,5 @@

				namespace api {

				void set_config(std::shared_ptr<api_registry_builder20> rb, http_context& ctx, routes& r, const db::config& cfg);

				void set_config(std::shared_ptr<httpd::api_registry_builder20> rb, http_context& ctx, httpd::routes& r, const db::config& cfg);

				}

									
										41

api/endpoint_snitch.cc
									
												
				@@ -8,13 +8,16 @@

				#include "locator/token_metadata.hh"

				#include "locator/snitch_base.hh"

				#include "locator/production_snitch_base.hh"

				#include "endpoint_snitch.hh"

				#include "api/api-doc/endpoint_snitch_info.json.hh"

				#include "api/api-doc/storage_service.json.hh"

				#include "utils/fb_utilities.hh"

				namespace api {

				using namespace seastar::httpd;

				void set_endpoint_snitch(http_context& ctx, routes& r) {

				void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_ptr>& snitch) {

				    static auto host_or_broadcast = [](const_req req) {

				        auto host = req.get_query_param("host");

				        return host.empty() ? gms::inet_address(utils::fb_utilities::get_broadcast_address()) : gms::inet_address(host);

				@@ -22,17 +25,45 @@ void set_endpoint_snitch(http_context& ctx, routes& r) {

				    httpd::endpoint_snitch_info_json::get_datacenter.set(r, [&ctx](const_req req) {

				        auto& topology = ctx.shared_token_metadata.local().get()->get_topology();

				        return topology.get_datacenter(host_or_broadcast(req));

				        auto ep = host_or_broadcast(req);

				        if (!topology.has_endpoint(ep)) {

				            // Cannot return error here, nodetool status can race, request

				            // info about just-left node and not handle it nicely

				            return locator::endpoint_dc_rack::default_location.dc;

				        }

				        return topology.get_datacenter(ep);

				    });

				    httpd::endpoint_snitch_info_json::get_rack.set(r, [&ctx](const_req req) {

				        auto& topology = ctx.shared_token_metadata.local().get()->get_topology();

				        return topology.get_rack(host_or_broadcast(req));

				        auto ep = host_or_broadcast(req);

				        if (!topology.has_endpoint(ep)) {

				            // Cannot return error here, nodetool status can race, request

				            // info about just-left node and not handle it nicely

				            return locator::endpoint_dc_rack::default_location.rack;

				        }

				        return topology.get_rack(ep);

				    });

				    httpd::endpoint_snitch_info_json::get_snitch_name.set(r, [] (const_req req) {

				        return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_name();

				    httpd::endpoint_snitch_info_json::get_snitch_name.set(r, [&snitch] (const_req req) {

				        return snitch.local()->get_name();

				    });

				    httpd::storage_service_json::update_snitch.set(r, [&snitch](std::unique_ptr<request> req) {

				        locator::snitch_config cfg;

				        cfg.name = req->get_query_param("ep_snitch_class_name");

				        return locator::i_endpoint_snitch::reset_snitch(snitch, cfg).then([] {

				            return make_ready_future<json::json_return_type>(json::json_void());

				        });

				    });

				}

				void unset_endpoint_snitch(http_context& ctx, routes& r) {

				    httpd::endpoint_snitch_info_json::get_datacenter.unset(r);

				    httpd::endpoint_snitch_info_json::get_rack.unset(r);

				    httpd::endpoint_snitch_info_json::get_snitch_name.unset(r);

				    httpd::storage_service_json::update_snitch.unset(r);

				}

				}
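
Note on the `has_endpoint` check added to `get_datacenter` and `get_rack` above: an unknown endpoint now yields a default location instead of an error, because a status query can race with a node leaving. A minimal sketch of that fallback, assuming a hypothetical `topology_sketch` class and placeholder default strings (the real `locator` types and default values are not reproduced here):

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins; not the real locator types.
struct dc_rack {
    std::string dc;
    std::string rack;
};

class topology_sketch {
    std::unordered_map<std::string, dc_rack> _nodes;
public:
    void add(const std::string& ep, dc_rack loc) { _nodes[ep] = std::move(loc); }
    bool has_endpoint(const std::string& ep) const { return _nodes.count(ep) != 0; }
    const dc_rack& location(const std::string& ep) const { return _nodes.at(ep); }
};

// Placeholder default chosen for this sketch only.
inline const dc_rack& default_location() {
    static const dc_rack loc{"UNKNOWN_DC", "UNKNOWN_RACK"};
    return loc;
}

// Return the datacenter of `ep`, or the default when `ep` is not known,
// so a caller racing with a departing node gets an answer, not an exception.
std::string datacenter_of(const topology_sketch& t, const std::string& ep) {
    if (!t.has_endpoint(ep)) {
        return default_location().dc;
    }
    return t.location(ep).dc;
}
```

The lookup is guarded explicitly rather than wrapped in a try/catch, which mirrors the shape of the change in the diff.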

									
										7

api/endpoint_snitch.hh
									
												
				@@ -10,8 +10,13 @@

				#include "api.hh"

				namespace locator {

				class snitch_ptr;

				}

				namespace api {

				void set_endpoint_snitch(http_context& ctx, routes& r);

				void set_endpoint_snitch(http_context& ctx, httpd::routes& r, sharded<locator::snitch_ptr>&);

				void unset_endpoint_snitch(http_context& ctx, httpd::routes& r);

				}

									
										1

api/error_injection.cc
									
												
				@@ -15,6 +15,7 @@

				#include <seastar/core/future-util.hh>

				namespace api {

				using namespace seastar::httpd;

				namespace hf = httpd::error_injection_json;

									
										2

api/error_injection.hh
									
												
				@@ -12,6 +12,6 @@

				namespace api {

				void set_error_injection(http_context& ctx, routes& r);

				void set_error_injection(http_context& ctx, httpd::routes& r);

				}

									
										27

api/failure_detector.cc
									
												
				@@ -8,10 +8,11 @@

				#include "failure_detector.hh"

				#include "api/api-doc/failure_detector.json.hh"

				#include "gms/failure_detector.hh"

				#include "gms/application_state.hh"

				#include "gms/gossiper.hh"

				namespace api {

				using namespace seastar::httpd;

				namespace fd = httpd::failure_detector_json;

				@@ -20,18 +21,18 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {

				        std::vector<fd::endpoint_state> res;

				        for (auto i : g.get_endpoint_states()) {

				            fd::endpoint_state val;

				            val.addrs = boost::lexical_cast<std::string>(i.first);

				            val.addrs = fmt::to_string(i.first);

				            val.is_alive = i.second.is_alive();

				            val.generation = i.second.get_heart_beat_state().get_generation();

				            val.version = i.second.get_heart_beat_state().get_heart_beat_version();

				            val.generation = i.second.get_heart_beat_state().get_generation().value();

				            val.version = i.second.get_heart_beat_state().get_heart_beat_version().value();

				            val.update_time = i.second.get_update_timestamp().time_since_epoch().count();

				            for (auto a : i.second.get_application_state_map()) {

				                fd::version_value version_val;

				                // We return the enum index and not it's name to stay compatible to origin

				                // method that the state index are static but the name can be changed.

				                version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(a.first);

				                version_val.value = a.second.value;

				                version_val.version = a.second.version;

				                version_val.value = a.second.value();

				                version_val.version = a.second.version().value();

				                val.application_state.push(version_val);

				            }

				            res.push_back(val);

				@@ -62,7 +63,9 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {

				    });

				    fd::set_phi_convict_threshold.set(r, [](std::unique_ptr<request> req) {

				        double phi = atof(req->get_query_param("phi").c_str());

				        // TBD

				        unimplemented();

				        std::ignore = atof(req->get_query_param("phi").c_str());

				        return make_ready_future<json::json_return_type>("");

				    });

				@@ -77,15 +80,9 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {

				    });

				    fd::get_endpoint_phi_values.set(r, [](std::unique_ptr<request> req) {

				        std::map<gms::inet_address, gms::arrival_window> map;

				        // We no longer have a phi failure detector,

				        // just returning the empty value is good enough.

				        std::vector<fd::endpoint_phi_value> res;

				        auto now = gms::arrival_window::clk::now();

				        for (auto& p : map) {

				            fd::endpoint_phi_value val;

				            val.endpoint = p.first.to_sstring();

				            val.phi = p.second.phi(now);

				            res.emplace_back(std::move(val));

				        }

				        return make_ready_future<json::json_return_type>(res);

				    });

				}
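
Note on the `static_cast<std::underlying_type<gms::application_state>::type>(a.first)` line above: the API reports the numeric index of the enum, not its name, so renaming an enumerator does not change the output. A minimal sketch, assuming a hypothetical `app_state` enum whose enumerators and values are invented for illustration:

```cpp
#include <type_traits>

// Hypothetical enum; the enumerators and their values are illustrative only.
enum class app_state : int { status = 0, load = 1, schema = 2 };

// Convert an enumerator to its underlying integer index.
constexpr std::underlying_type<app_state>::type to_index(app_state s) {
    return static_cast<std::underlying_type<app_state>::type>(s);
}
```

The index stays stable only as long as enumerators are not reordered or renumbered, which is the compatibility assumption the comment in the diff relies on.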

									
										2

api/failure_detector.hh
									
												
				@@ -18,6 +18,6 @@ class gossiper;

				namespace api {

				void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g);

				void set_failure_detector(http_context& ctx, httpd::routes& r, gms::gossiper& g);

				}

									
										25

api/gossiper.cc
									
												
				@@ -11,6 +11,7 @@

				#include "gms/gossiper.hh"

				namespace api {

				using namespace seastar::httpd;

				using namespace json;

				void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {

				@@ -19,9 +20,11 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {

				        return container_to_vec(res);

				    });

				    httpd::gossiper_json::get_live_endpoint.set(r, [&g] (const_req req) {

				        auto res = g.get_live_members();

				        return container_to_vec(res);

				    httpd::gossiper_json::get_live_endpoint.set(r, [&g] (std::unique_ptr<request> req) {

				        return g.get_live_members_synchronized().then([] (auto res) {

				            return make_ready_future<json::json_return_type>(container_to_vec(res));

				        });

				    });

				    httpd::gossiper_json::get_endpoint_downtime.set(r, [&g] (const_req req) {

				@@ -29,21 +32,21 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {

				        return g.get_endpoint_downtime(ep);

				    });

				    httpd::gossiper_json::get_current_generation_number.set(r, [&g] (std::unique_ptr<request> req) {

				    httpd::gossiper_json::get_current_generation_number.set(r, [&g] (std::unique_ptr<http::request> req) {

				        gms::inet_address ep(req->param["addr"]);

				        return g.get_current_generation_number(ep).then([] (int res) {

				            return make_ready_future<json::json_return_type>(res);

				        return g.get_current_generation_number(ep).then([] (gms::generation_type res) {

				            return make_ready_future<json::json_return_type>(res.value());

				        });

				    });

				    httpd::gossiper_json::get_current_heart_beat_version.set(r, [&g] (std::unique_ptr<request> req) {

				    httpd::gossiper_json::get_current_heart_beat_version.set(r, [&g] (std::unique_ptr<http::request> req) {

				        gms::inet_address ep(req->param["addr"]);

				        return g.get_current_heart_beat_version(ep).then([] (int res) {

				            return make_ready_future<json::json_return_type>(res);

				        return g.get_current_heart_beat_version(ep).then([] (gms::version_type res) {

				            return make_ready_future<json::json_return_type>(res.value());

				        });

				    });

				    httpd::gossiper_json::assassinate_endpoint.set(r, [&g](std::unique_ptr<request> req) {

				    httpd::gossiper_json::assassinate_endpoint.set(r, [&g](std::unique_ptr<http::request> req) {

				        if (req->get_query_param("unsafe") != "True") {

				            return g.assassinate_endpoint(req->param["addr"]).then([] {

				                return make_ready_future<json::json_return_type>(json_void());

				@@ -54,7 +57,7 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {

				        });

				    });

				    httpd::gossiper_json::force_remove_endpoint.set(r, [&g](std::unique_ptr<request> req) {

				    httpd::gossiper_json::force_remove_endpoint.set(r, [&g](std::unique_ptr<http::request> req) {

				        gms::inet_address ep(req->param["addr"]);

				        return g.force_remove_endpoint(ep).then([] {

				            return make_ready_future<json::json_return_type>(json_void());
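
Note on the switch from `int` to `gms::generation_type` and `gms::version_type` with `.value()` in this file: the handlers now unwrap a distinct type explicitly before handing a number to JSON. A minimal sketch of such a tagged wrapper, assuming a hypothetical `tagged_int` template (the real `gms` types may be defined differently):

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical tagged wrapper: same representation, distinct types per Tag.
template <typename Tag, typename T>
class tagged_int {
    T _v;
public:
    constexpr explicit tagged_int(T v) : _v(v) {}
    constexpr T value() const { return _v; }
};

struct generation_tag {};
struct version_tag {};
using generation_t = tagged_int<generation_tag, int32_t>;
using version_t = tagged_int<version_tag, int32_t>;

// A version cannot be passed where a generation is expected, and a raw
// integer does not convert implicitly either.
static_assert(!std::is_convertible<version_t, generation_t>::value,
              "distinct tags must not convert");
static_assert(!std::is_convertible<int32_t, generation_t>::value,
              "construction from a raw integer must be explicit");

// Serialization boundary: unwrap explicitly with value().
int32_t to_json_number(generation_t g) { return g.value(); }
```

The explicit `value()` call marks the one place where the type distinction is dropped, which is why it shows up at each JSON return site in the diff.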

									
										2

api/gossiper.hh
									
												
				@@ -18,6 +18,6 @@ class gossiper;

				namespace api {

				void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g);

				void set_gossiper(http_context& ctx, httpd::routes& r, gms::gossiper& g);

				}

									
										17

api/hinted_handoff.cc
									
												
				@@ -19,10 +19,11 @@

				namespace api {

				using namespace json;

				using namespace seastar::httpd;

				namespace hh = httpd::hinted_handoff_json;

				void set_hinted_handoff(http_context& ctx, routes& r, gms::gossiper& g) {

				    hh::create_hints_sync_point.set(r, [&ctx, &g] (std::unique_ptr<request> req) -> future<json::json_return_type> {

				    hh::create_hints_sync_point.set(r, [&ctx, &g] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto parse_hosts_list = [&g] (sstring arg) {

				            std::vector<sstring> hosts_str = split(arg, ",");

				            std::vector<gms::inet_address> hosts;

				@@ -52,7 +53,7 @@ void set_hinted_handoff(http_context& ctx, routes& r, gms::gossiper& g) {

				        });

				    });

				    hh::get_hints_sync_point.set(r, [&ctx] (std::unique_ptr<request> req) -> future<json::json_return_type> {

				    hh::get_hints_sync_point.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        db::hints::sync_point sync_point;

				        const sstring encoded = req->get_query_param("id");

				        try {

				@@ -93,42 +94,42 @@ void set_hinted_handoff(http_context& ctx, routes& r, gms::gossiper& g) {

				        });

				    });

				    hh::list_endpoints_pending_hints.set(r, [] (std::unique_ptr<request> req) {

				    hh::list_endpoints_pending_hints.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        std::vector<sstring> res;

				        return make_ready_future<json::json_return_type>(res);

				    });

				    hh::truncate_all_hints.set(r, [] (std::unique_ptr<request> req) {

				    hh::truncate_all_hints.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        sstring host = req->get_query_param("host");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    hh::schedule_hint_delivery.set(r, [] (std::unique_ptr<request> req) {

				    hh::schedule_hint_delivery.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        sstring host = req->get_query_param("host");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    hh::pause_hints_delivery.set(r, [] (std::unique_ptr<request> req) {

				    hh::pause_hints_delivery.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        sstring pause = req->get_query_param("pause");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    hh::get_create_hint_count.set(r, [] (std::unique_ptr<request> req) {

				    hh::get_create_hint_count.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        sstring host = req->get_query_param("host");

				        return make_ready_future<json::json_return_type>(0);

				    });

				    hh::get_not_stored_hints_count.set(r, [] (std::unique_ptr<request> req) {

				    hh::get_not_stored_hints_count.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        sstring host = req->get_query_param("host");

									
										4

api/hinted_handoff.hh
									
												
				@@ -18,7 +18,7 @@ class gossiper;

				namespace api {

				void set_hinted_handoff(http_context& ctx, routes& r, gms::gossiper& g);

				void unset_hinted_handoff(http_context& ctx, routes& r);

				void set_hinted_handoff(http_context& ctx, httpd::routes& r, gms::gossiper& g);

				void unset_hinted_handoff(http_context& ctx, httpd::routes& r);

				}

									
										1

api/lsa.cc
									
												
				@@ -16,6 +16,7 @@

				#include "replica/database.hh"

				namespace api {

				using namespace seastar::httpd;

				static logging::logger alogger("lsa-api");

									
										2

api/lsa.hh
									
												
				@@ -12,6 +12,6 @@

				namespace api {

				void set_lsa(http_context& ctx, routes& r);

				void set_lsa(http_context& ctx, httpd::routes& r);

				}

									
										3

api/messaging_service.cc
									
												
				@@ -13,6 +13,7 @@

				#include <iostream>

				#include <sstream>

				using namespace seastar::httpd;

				using namespace httpd::messaging_service_json;

				using namespace netw;

				@@ -28,7 +29,7 @@ std::vector<message_counter> map_to_message_counters(

				    std::vector<message_counter> res;

				    for (auto i : map) {

				        res.push_back(message_counter());

				        res.back().key = boost::lexical_cast<sstring>(i.first);

				        res.back().key = fmt::to_string(i.first);

				        res.back().value = i.second;

				    }

				    return res;

									
										4

api/messaging_service.hh
									
												
				@@ -14,7 +14,7 @@ namespace netw { class messaging_service; }

				namespace api {

				void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging_service>& ms);

				void unset_messaging_service(http_context& ctx, routes& r);

				void set_messaging_service(http_context& ctx, httpd::routes& r, sharded<netw::messaging_service>& ms);

				void unset_messaging_service(http_context& ctx, httpd::routes& r);

				}

									
										223

api/storage_proxy.cc
									
												
				@@ -20,8 +20,12 @@ namespace api {

				namespace sp = httpd::storage_proxy_json;

				using proxy = service::storage_proxy;

				using namespace seastar::httpd;

				using namespace json;

				utils::time_estimated_histogram timed_rate_moving_average_summary_merge(utils::time_estimated_histogram a, const utils::timed_rate_moving_average_summary_and_histogram& b) {

				    return a.merge(b.histogram());

				}

				/**

				 * This function implement a two dimentional map reduce where

				@@ -55,10 +59,10 @@ future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,

				 * @param initial_value - the initial value to use for both aggregations* @return

				 * @return A future that resolves to the result of the aggregation.

				 */

				template<typename V, typename Reducer, typename F>

				template<typename V, typename Reducer, typename F, typename C>

				future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,

				        V F::*f, Reducer reducer, V initial_value) {

				    return two_dimensional_map_reduce(d, [f] (F& stats) {

				        C F::*f, Reducer reducer, V initial_value) {

				    return two_dimensional_map_reduce(d, [f] (F& stats) -> V {

				        return stats.*f;

				    }, reducer, initial_value);

				}

				@@ -112,10 +116,10 @@ utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimat

				    return res;

				}

				static future<json::json_return_type>  sum_estimated_histogram(http_context& ctx, utils::time_estimated_histogram service::storage_proxy_stats::stats::*f) {

				    return two_dimensional_map_reduce(ctx.sp, f, utils::time_estimated_histogram_merge,

				            utils::time_estimated_histogram()).then([](const utils::time_estimated_histogram& val) {

				static future<json::json_return_type>  sum_estimated_histogram(http_context& ctx, utils::timed_rate_moving_average_summary_and_histogram service::storage_proxy_stats::stats::*f) {

				    return two_dimensional_map_reduce(ctx.sp, [f] (service::storage_proxy_stats::stats& stats) {

				        return (stats.*f).histogram();

				    }, utils::time_estimated_histogram_merge, utils::time_estimated_histogram()).then([](const utils::time_estimated_histogram& val) {

				        return make_ready_future<json::json_return_type>(time_to_json_histogram(val));

				    });

				}

				@@ -130,7 +134,7 @@ static future<json::json_return_type>  sum_estimated_histogram(http_context& ctx

				    });

				}

				static future<json::json_return_type>  total_latency(http_context& ctx, utils::timed_rate_moving_average_and_histogram service::storage_proxy_stats::stats::*f) {

				static future<json::json_return_type>  total_latency(http_context& ctx, utils::timed_rate_moving_average_summary_and_histogram service::storage_proxy_stats::stats::*f) {

				    return two_dimensional_map_reduce(ctx.sp, [f] (service::storage_proxy_stats::stats& stats) {

				            return (stats.*f).hist.mean * (stats.*f).hist.count;

				        }, std::plus<double>(), 0.0).then([](double val) {

				@@ -150,7 +154,7 @@ static future<json::json_return_type>  total_latency(http_context& ctx, utils::t

				template<typename F>

				future<json::json_return_type>

				sum_histogram_stats_storage_proxy(distributed<proxy>& d,

				        utils::timed_rate_moving_average_and_histogram F::*f) {

				        utils::timed_rate_moving_average_summary_and_histogram F::*f) {

				    return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {

				        return (stats.*f).hist;

				    }, std::plus<utils::ihistogram>(), utils::ihistogram()).

				@@ -170,7 +174,7 @@ sum_histogram_stats_storage_proxy(distributed<proxy>& d,

				template<typename F>

				future<json::json_return_type>

				sum_timer_stats_storage_proxy(distributed<proxy>& d,

				        utils::timed_rate_moving_average_and_histogram F::*f) {

				        utils::timed_rate_moving_average_summary_and_histogram F::*f) {

				    return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {

				        return (stats.*f).rate();

				@@ -181,75 +185,75 @@ sum_timer_stats_storage_proxy(distributed<proxy>& d,

				}

				void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_service>& ss) {

				    sp::get_total_hints.set(r, [](std::unique_ptr<request> req)  {

				    sp::get_total_hints.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req)  {

				        const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();

				    sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<http::request> req)  {

				        const auto& filter = ctx.sp.local().get_hints_host_filter();

				        return make_ready_future<json::json_return_type>(!filter.is_disabled_for_all());

				    });

				    sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<http::request> req)  {

				        auto enable = req->get_query_param("enable");

				        auto filter = (enable == "true" || enable == "1")

				                ? db::hints::host_filter(db::hints::host_filter::enabled_for_all_tag {})

				                : db::hints::host_filter(db::hints::host_filter::disabled_for_all_tag {});

				        return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {

				        return ctx.sp.invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {

				            return sp.change_hints_host_filter(filter);

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    sp::get_hinted_handoff_enabled_by_dc.set(r, [](std::unique_ptr<request> req)  {

				    sp::get_hinted_handoff_enabled_by_dc.set(r, [&ctx](std::unique_ptr<http::request> req)  {

				        std::vector<sstring> res;

				        const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();

				        const auto& filter = ctx.sp.local().get_hints_host_filter();

				        const auto& dcs = filter.get_dcs();

				        res.reserve(res.size());

				        std::copy(dcs.begin(), dcs.end(), std::back_inserter(res));

				        return make_ready_future<json::json_return_type>(res);

				    });

				    sp::set_hinted_handoff_enabled_by_dc_list.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_hinted_handoff_enabled_by_dc_list.set(r, [&ctx](std::unique_ptr<http::request> req)  {

				        auto dcs = req->get_query_param("dcs");

				        auto filter = db::hints::host_filter::parse_from_dc_list(std::move(dcs));

				        return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {

				        return ctx.sp.invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {

				            return sp.change_hints_host_filter(filter);

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    sp::get_max_hint_window.set(r, [](std::unique_ptr<request> req)  {

				    sp::get_max_hint_window.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				    sp::set_max_hint_window.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_max_hint_window.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("ms");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::get_max_hints_in_progress.set(r, [](std::unique_ptr<request> req)  {

				    sp::get_max_hints_in_progress.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(1);

				    });

				    sp::set_max_hints_in_progress.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_max_hints_in_progress.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("qs");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::get_hints_in_progress.set(r, [](std::unique_ptr<request> req)  {

				    sp::get_hints_in_progress.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				@@ -259,7 +263,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se

				        return ctx.db.local().get_config().request_timeout_in_ms()/1000.0;

				    });

				    sp::set_rpc_timeout.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				@@ -270,7 +274,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se

				        return ctx.db.local().get_config().read_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_read_rpc_timeout.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_read_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				@@ -281,7 +285,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se

				        return ctx.db.local().get_config().write_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_write_rpc_timeout.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				@@ -292,7 +296,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se

				        return ctx.db.local().get_config().counter_write_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_counter_write_rpc_timeout.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_counter_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				@@ -303,7 +307,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se

				        return ctx.db.local().get_config().cas_contention_timeout_in_ms()/1000.0;

				    });

				    sp::set_cas_contention_timeout.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_cas_contention_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				@@ -314,7 +318,7 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se

				        return ctx.db.local().get_config().range_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_range_rpc_timeout.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_range_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				@@ -325,32 +329,32 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se

				        return ctx.db.local().get_config().truncate_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_truncate_rpc_timeout.set(r, [](std::unique_ptr<request> req)  {

				    sp::set_truncate_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::reload_trigger_classes.set(r, [](std::unique_ptr<request> req)  {

				    sp::reload_trigger_classes.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::get_read_repair_attempted.set(r, [&ctx](std::unique_ptr<request> req)  {

				    sp::get_read_repair_attempted.set(r, [&ctx](std::unique_ptr<http::request> req)  {

				        return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_attempts);

				    });

				    sp::get_read_repair_repaired_blocking.set(r, [&ctx](std::unique_ptr<request> req)  {

				    sp::get_read_repair_repaired_blocking.set(r, [&ctx](std::unique_ptr<http::request> req)  {

				        return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_repaired_blocking);

				    });

				    sp::get_read_repair_repaired_background.set(r, [&ctx](std::unique_ptr<request> req)  {

				    sp::get_read_repair_repaired_background.set(r, [&ctx](std::unique_ptr<http::request> req)  {

				        return sum_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read_repair_repaired_background);

				    });

				    sp::get_schema_versions.set(r, [&ss](std::unique_ptr<request> req)  {

				    sp::get_schema_versions.set(r, [&ss](std::unique_ptr<http::request> req)  {

				        return ss.local().describe_schema_versions().then([] (auto result) {

				            std::vector<sp::mapper_list> res;

				            for (auto e : result) {

				@@ -363,122 +367,122 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se

				        });

				    });

				    sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_timeouts);

				    });

				    sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_unavailables);

				    });

				    sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_timeouts);

				    });

				    sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_unavailables);

				    });

				    sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_write_unfinished_commit);

				    });

				    sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_estimated_histogram(ctx, &proxy::stats::cas_write_contention);

				    });

				    sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);

				    });

				    sp::get_cas_write_metrics_failed_read_round_optimization.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_write_metrics_failed_read_round_optimization.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_failed_read_round_optimization);

				    });

				    sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);

				    });

				    sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_estimated_histogram(ctx, &proxy::stats::cas_read_contention);

				    });

				    sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::read_timeouts);

				    });

				    sp::get_read_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_read_metrics_unavailables.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::read_unavailables);

				    });

				    sp::get_range_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_range_metrics_timeouts.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::range_slice_timeouts);

				    });

				    sp::get_range_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_range_metrics_unavailables.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::range_slice_unavailables);

				    });

				    sp::get_write_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_write_metrics_timeouts.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::write_timeouts);

				    });

				    sp::get_write_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_write_metrics_unavailables.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_long(ctx.sp, &service::storage_proxy_stats::stats::write_unavailables);

				    });

				    sp::get_read_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_read_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::read_timeouts);

				    });

				    sp::get_read_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_read_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::read_unavailables);

				    });

				    sp::get_range_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_range_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::range_slice_timeouts);

				    });

				    sp::get_range_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_range_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::range_slice_unavailables);

				    });

				    sp::get_write_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_write_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::write_timeouts);

				    });

				    sp::get_write_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_write_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timed_rate_as_obj(ctx.sp, &service::storage_proxy_stats::stats::write_unavailables);

				    });

				    sp::get_range_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_range_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);

				    });

				    sp::get_write_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_write_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::write);

				    });

				    sp::get_read_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_read_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_histogram_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read);

				    });

				    sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);

				    });

				    sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::write);

				    });

				    sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);

				    });

				    sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timer_stats(ctx.sp, &proxy::stats::cas_read);

				    });

				    sp::get_view_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_view_write_metrics_latency_histogram.set(r, [](std::unique_ptr<http::request> req) {

				        //TBD

				        // FIXME

				        // No View metrics are available, so just return empty moving average

				@@ -486,32 +490,101 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_se

				        return make_ready_future<json::json_return_type>(get_empty_moving_average());

				    });

				    sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::read);

				    });

				    sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::estimated_read);

				    sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::read);

				    });

				    sp::get_read_latency.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_read_latency.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return total_latency(ctx, &service::storage_proxy_stats::stats::read);

				    });

				    sp::get_write_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				        return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::estimated_write);

				    sp::get_write_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_estimated_histogram(ctx, &service::storage_proxy_stats::stats::write);

				    });

				    sp::get_write_latency.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_write_latency.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return total_latency(ctx, &service::storage_proxy_stats::stats::write);

				    });

				    sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return sum_timer_stats_storage_proxy(ctx.sp, &service::storage_proxy_stats::stats::range);

				    });

				    sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {

				    sp::get_range_latency.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return total_latency(ctx, &service::storage_proxy_stats::stats::range);

				    });

				}

				void unset_storage_proxy(http_context& ctx, routes& r) {

				    sp::get_total_hints.unset(r);

				    sp::get_hinted_handoff_enabled.unset(r);

				    sp::set_hinted_handoff_enabled.unset(r);

				    sp::get_hinted_handoff_enabled_by_dc.unset(r);

				    sp::set_hinted_handoff_enabled_by_dc_list.unset(r);

				    sp::get_max_hint_window.unset(r);

				    sp::set_max_hint_window.unset(r);

				    sp::get_max_hints_in_progress.unset(r);

				    sp::set_max_hints_in_progress.unset(r);

				    sp::get_hints_in_progress.unset(r);

				    sp::get_rpc_timeout.unset(r);

				    sp::set_rpc_timeout.unset(r);

				    sp::get_read_rpc_timeout.unset(r);

				    sp::set_read_rpc_timeout.unset(r);

				    sp::get_write_rpc_timeout.unset(r);

				    sp::set_write_rpc_timeout.unset(r);

				    sp::get_counter_write_rpc_timeout.unset(r);

				    sp::set_counter_write_rpc_timeout.unset(r);

				    sp::get_cas_contention_timeout.unset(r);

				    sp::set_cas_contention_timeout.unset(r);

				    sp::get_range_rpc_timeout.unset(r);

				    sp::set_range_rpc_timeout.unset(r);

				    sp::get_truncate_rpc_timeout.unset(r);

				    sp::set_truncate_rpc_timeout.unset(r);

				    sp::reload_trigger_classes.unset(r);

				    sp::get_read_repair_attempted.unset(r);

				    sp::get_read_repair_repaired_blocking.unset(r);

				    sp::get_read_repair_repaired_background.unset(r);

				    sp::get_schema_versions.unset(r);

				    sp::get_cas_read_timeouts.unset(r);

				    sp::get_cas_read_unavailables.unset(r);

				    sp::get_cas_write_timeouts.unset(r);

				    sp::get_cas_write_unavailables.unset(r);

				    sp::get_cas_write_metrics_unfinished_commit.unset(r);

				    sp::get_cas_write_metrics_contention.unset(r);

				    sp::get_cas_write_metrics_condition_not_met.unset(r);

				    sp::get_cas_write_metrics_failed_read_round_optimization.unset(r);

				    sp::get_cas_read_metrics_unfinished_commit.unset(r);

				    sp::get_cas_read_metrics_contention.unset(r);

				    sp::get_read_metrics_timeouts.unset(r);

				    sp::get_read_metrics_unavailables.unset(r);

				    sp::get_range_metrics_timeouts.unset(r);

				    sp::get_range_metrics_unavailables.unset(r);

				    sp::get_write_metrics_timeouts.unset(r);

				    sp::get_write_metrics_unavailables.unset(r);

				    sp::get_read_metrics_timeouts_rates.unset(r);

				    sp::get_read_metrics_unavailables_rates.unset(r);

				    sp::get_range_metrics_timeouts_rates.unset(r);

				    sp::get_range_metrics_unavailables_rates.unset(r);

				    sp::get_write_metrics_timeouts_rates.unset(r);

				    sp::get_write_metrics_unavailables_rates.unset(r);

				    sp::get_range_metrics_latency_histogram_depricated.unset(r);

				    sp::get_write_metrics_latency_histogram_depricated.unset(r);

				    sp::get_read_metrics_latency_histogram_depricated.unset(r);

				    sp::get_range_metrics_latency_histogram.unset(r);

				    sp::get_write_metrics_latency_histogram.unset(r);

				    sp::get_cas_write_metrics_latency_histogram.unset(r);

				    sp::get_cas_read_metrics_latency_histogram.unset(r);

				    sp::get_view_write_metrics_latency_histogram.unset(r);

				    sp::get_read_metrics_latency_histogram.unset(r);

				    sp::get_read_estimated_histogram.unset(r);

				    sp::get_read_latency.unset(r);

				    sp::get_write_estimated_histogram.unset(r);

				    sp::get_write_latency.unset(r);

				    sp::get_range_estimated_histogram.unset(r);

				    sp::get_range_latency.unset(r);

				}

				}
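The hunk above adds `unset_storage_proxy`, which calls `unset(r)` once for every handler that `set_storage_proxy` registers with `set(r, ...)`. A minimal sketch of that pairing follows. The `routes` and `path_description` types and both URL strings are hypothetical stand-ins written for illustration; they are not Seastar's `httpd::routes` or the generated `sp::` objects.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a route table; this is not Seastar's httpd::routes.
struct routes {
    std::map<std::string, std::function<std::string()>> handlers;
};

// Hypothetical stand-in for a generated path description such as sp::get_total_hints.
struct path_description {
    std::string path;

    // Register a handler for this path.
    void set(routes& r, std::function<std::string()> handler) const {
        r.handlers[path] = std::move(handler);
    }
    // Remove the handler for this path, if one is registered.
    void unset(routes& r) const {
        r.handlers.erase(path);
    }
};

// Illustrative paths only (assumption); the real URLs come from the API definitions.
const path_description get_total_hints{"/storage_proxy/total_hints"};
const path_description get_rpc_timeout{"/storage_proxy/rpc_timeout"};

// Registers every handler of the module.
void set_storage_proxy(routes& r) {
    get_total_hints.set(r, [] { return std::string("0"); });
    get_rpc_timeout.set(r, [] { return std::string("10.0"); });
}

// Mirrors set_storage_proxy: one unset() per set() above.
void unset_storage_proxy(routes& r) {
    get_total_hints.unset(r);
    get_rpc_timeout.unset(r);
}
```

Keeping the two functions side by side, one line per endpoint, makes it easy to check by eye that no registered route is left behind.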

									
										3

api/storage_proxy.hh
									
												
				@@ -15,6 +15,7 @@ namespace service { class storage_service; }

				namespace api {

				void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_service>& ss);

				void set_storage_proxy(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss);

				void unset_storage_proxy(http_context& ctx, httpd::routes& r);

				}

500

api/storage_service.cc

File diff suppressed because it is too large

									
										56

api/storage_service.hh
									
												
				@@ -8,6 +8,8 @@

				#pragma once

				#include <iostream>

				#include <seastar/core/sharded.hh>

				#include "api.hh"

				#include "db/data_listeners.hh"

				@@ -34,28 +36,52 @@ class gossiper;

				namespace api {

				// verify that the keyspace is found, otherwise a bad_param_exception exception is thrown

				// containing the description of the respective keyspace error.

				sstring validate_keyspace(http_context& ctx, sstring ks_name);

				// verify that the keyspace parameter is found, otherwise a bad_param_exception exception is thrown

				// containing the description of the respective keyspace error.

				sstring validate_keyspace(http_context& ctx, const parameters& param);

				sstring validate_keyspace(http_context& ctx, const httpd::parameters& param);

				// splits a request parameter assumed to hold a comma-separated list of table names

				// verify that the tables are found, otherwise a bad_param_exception exception is thrown

				// containing the description of the respective no_such_column_family error.

				// Returns an empty vector if no parameter was found.

				// If the parameter is found and empty, returns a list of all table names in the keyspace.

				std::vector<sstring> parse_tables(const sstring& ks_name, http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);

				void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_service>& ss, gms::gossiper& g, sharded<cdc::generation_service>& cdc_gs, sharded<db::system_keyspace>& sys_ls);

				void set_sstables_loader(http_context& ctx, routes& r, sharded<sstables_loader>& sst_loader);

				void unset_sstables_loader(http_context& ctx, routes& r);

				void set_view_builder(http_context& ctx, routes& r, sharded<db::view::view_builder>& vb);

				void unset_view_builder(http_context& ctx, routes& r);

				void set_repair(http_context& ctx, routes& r, sharded<repair_service>& repair);

				void unset_repair(http_context& ctx, routes& r);

				void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl);

				void unset_transport_controller(http_context& ctx, routes& r);

				void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl);

				void unset_rpc_controller(http_context& ctx, routes& r);

				void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl);

				void unset_snapshot(http_context& ctx, routes& r);

				struct table_info {

				    sstring name;

				    table_id id;

				};

				// splits a request parameter assumed to hold a comma-separated list of table names

				// verify that the tables are found, otherwise a bad_param_exception exception is thrown

				// containing the description of the respective no_such_column_family error.

				// Returns a vector of all table infos given by the parameter, or

				// if the parameter is not found or is empty, returns a list of all table infos in the keyspace.

				std::vector<table_info> parse_table_infos(const sstring& ks_name, http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);

				void set_storage_service(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss, gms::gossiper& g, sharded<cdc::generation_service>& cdc_gs, sharded<db::system_keyspace>& sys_ls);

				void set_sstables_loader(http_context& ctx, httpd::routes& r, sharded<sstables_loader>& sst_loader);

				void unset_sstables_loader(http_context& ctx, httpd::routes& r);

				void set_view_builder(http_context& ctx, httpd::routes& r, sharded<db::view::view_builder>& vb);

				void unset_view_builder(http_context& ctx, httpd::routes& r);

				void set_repair(http_context& ctx, httpd::routes& r, sharded<repair_service>& repair);

				void unset_repair(http_context& ctx, httpd::routes& r);

				void set_transport_controller(http_context& ctx, httpd::routes& r, cql_transport::controller& ctl);

				void unset_transport_controller(http_context& ctx, httpd::routes& r);

				void set_rpc_controller(http_context& ctx, httpd::routes& r, thrift_controller& ctl);

				void unset_rpc_controller(http_context& ctx, httpd::routes& r);

				void set_snapshot(http_context& ctx, httpd::routes& r, sharded<db::snapshot_ctl>& snap_ctl);

				void unset_snapshot(http_context& ctx, httpd::routes& r);

				seastar::future<json::json_return_type> run_toppartitions_query(db::toppartitions_query& q, http_context &ctx, bool legacy_request = false);

				}

				} // namespace api

				namespace std {

				std::ostream& operator<<(std::ostream& os, const api::table_info& ti);

				} // namespace std
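The comment on `parse_table_infos` in the header above describes three cases: a comma-separated list of table names is split and validated, an unknown name raises an error, and a missing or empty parameter selects every table in the keyspace. The sketch below illustrates that contract with standard-library types only. The function name `parse_table_infos_sketch`, the `std::map` of known tables, the `int` id, and the use of `std::invalid_argument` in place of `bad_param_exception` are all assumptions made for this example.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified table_info: the id is an int here (assumption), not a table_id.
struct table_info {
    std::string name;
    int id;
};

// Hypothetical sketch: `tables_in_keyspace` maps table name to id and stands in
// for the lookup the real code would do against the database.
std::vector<table_info> parse_table_infos_sketch(
        const std::map<std::string, int>& tables_in_keyspace,
        const std::unordered_map<std::string, std::string>& query_params,
        const std::string& param_name) {
    std::vector<table_info> res;
    auto it = query_params.find(param_name);
    // Parameter missing or empty: return every table in the keyspace.
    if (it == query_params.end() || it->second.empty()) {
        for (const auto& [name, id] : tables_in_keyspace) {
            res.push_back({name, id});
        }
        return res;
    }
    // Otherwise split on commas and validate each name.
    std::istringstream in(it->second);
    std::string name;
    while (std::getline(in, name, ',')) {
        auto t = tables_in_keyspace.find(name);
        if (t == tables_in_keyspace.end()) {
            // Stand-in for bad_param_exception in the real API layer.
            throw std::invalid_argument("no such table: " + name);
        }
        res.push_back({name, t->second});
    }
    return res;
}
```

The result preserves the order in which names appear in the parameter, while the "all tables" case follows the iteration order of the map used in this sketch.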

									
										7

api/stream_manager.cc
									
												
				@@ -14,6 +14,7 @@

				#include "gms/gossiper.hh"

				namespace api {

				using namespace seastar::httpd;

				namespace hs = httpd::stream_manager_json;

				@@ -21,7 +22,7 @@ static void set_summaries(const std::vector<streaming::stream_summary>& from,

				        json::json_list<hs::stream_summary>& to) {

				    if (!from.empty()) {

				        hs::stream_summary res;

				        res.cf_id = boost::lexical_cast<std::string>(from.front().cf_id);

				        res.cf_id = fmt::to_string(from.front().cf_id);

				        // For each stream_session, we pretend we are sending/receiving one

				        // file, to make it compatible with nodetool.

				        res.files = 1;

				@@ -38,7 +39,7 @@ static hs::progress_info get_progress_info(const streaming::progress_info& info)

				    res.current_bytes = info.current_bytes;

				    res.direction = info.dir;

				    res.file_name = info.file_name;

				    res.peer = boost::lexical_cast<std::string>(info.peer);

				    res.peer = fmt::to_string(info.peer);

				    res.session_index = 0;

				    res.total_bytes = info.total_bytes;

				    return res;

				@@ -61,7 +62,7 @@ static hs::stream_state get_state(

				    state.plan_id = result_future.plan_id.to_sstring();

				    for (auto info : result_future.get_coordinator().get()->get_all_session_info()) {

				        hs::stream_info si;

				        si.peer = boost::lexical_cast<std::string>(info.peer);

				        si.peer = fmt::to_string(info.peer);

				        si.session_index = 0;

				        si.state = info.state;

				        si.connecting = si.peer;

									
										4

api/stream_manager.hh
									
												
				@@ -12,7 +12,7 @@

				namespace api {

				void set_stream_manager(http_context& ctx, routes& r, sharded<streaming::stream_manager>& sm);

				void unset_stream_manager(http_context& ctx, routes& r);

				void set_stream_manager(http_context& ctx, httpd::routes& r, sharded<streaming::stream_manager>& sm);

				void unset_stream_manager(http_context& ctx, httpd::routes& r);

				}

									
										11

api/system.cc
									
												
				@@ -17,6 +17,7 @@

				extern logging::logger apilog;

				namespace api {

				using namespace seastar::httpd;

				namespace hs = httpd::system_json;

				@@ -61,6 +62,16 @@ void set_system(http_context& ctx, routes& r) {

				        return json::json_void();

				    });

				    hs::write_log_message.set(r, [](const_req req) {

				        try {

				            logging::log_level level = boost::lexical_cast<logging::log_level>(std::string(req.get_query_param("level")));

				            apilog.log(level, "/system/log: {}", std::string(req.get_query_param("message")));

				        } catch (boost::bad_lexical_cast& e) {

				            throw bad_param_exception("Unknown logging level " + req.get_query_param("level"));

				        }

				        return json::json_void();

				    });

				    hs::drop_sstable_caches.set(r, [&ctx](std::unique_ptr<request> req) {

				        apilog.info("Dropping sstable caches");

				        return ctx.db.invoke_on_all([] (replica::database& db) {
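The `write_log_message` handler in the hunk above converts the `level` query parameter to a log level and turns a failed conversion into a bad-parameter error naming the offending value. A standard-library-only sketch of that validation step is shown below. The `log_level` enum, the accepted level names, and `parse_log_level` are assumptions for illustration; the real handler relies on `boost::lexical_cast` and `bad_param_exception`.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Assumed set of levels for this sketch; the real type is logging::log_level.
enum class log_level { error, warn, info, debug, trace };

// Hypothetical helper: maps a level name to a log_level, or throws with a
// message in the same style as the handler's "Unknown logging level ..." error.
log_level parse_log_level(const std::string& name) {
    static const std::unordered_map<std::string, log_level> levels = {
        {"error", log_level::error},
        {"warn", log_level::warn},
        {"info", log_level::info},
        {"debug", log_level::debug},
        {"trace", log_level::trace},
    };
    auto it = levels.find(name);
    if (it == levels.end()) {
        // Stand-in for bad_param_exception in the real API layer.
        throw std::invalid_argument("Unknown logging level " + name);
    }
    return it->second;
}
```

Reporting the rejected value in the error text lets a client see at once which parameter was wrong.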

									
										2

api/system.hh
									
												
				@@ -12,6 +12,6 @@

				namespace api {

				void set_system(http_context& ctx, routes& r);

				void set_system(http_context& ctx, httpd::routes& r);

				}

									
										236

api/task_manager.cc
									
										Normal file
									
												
				@@ -0,0 +1,236 @@

				/*

				 * Copyright (C) 2022-present ScyllaDB

				 */

				/*

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#include <seastar/core/coroutine.hh>

				#include "task_manager.hh"

				#include "api/api-doc/task_manager.json.hh"

				#include "db/system_keyspace.hh"

				#include "column_family.hh"

				#include "unimplemented.hh"

				#include "storage_service.hh"

				#include <utility>

				#include <boost/range/adaptors.hpp>

				namespace api {

				namespace tm = httpd::task_manager_json;

				using namespace json;

				using namespace seastar::httpd;

				inline bool filter_tasks(tasks::task_manager::task_ptr task, std::unordered_map<sstring, sstring>& query_params) {

				    return (!query_params.contains("keyspace") || query_params["keyspace"] == task->get_status().keyspace) &&

				        (!query_params.contains("table") || query_params["table"] == task->get_status().table);

				}

				struct full_task_status {

				    tasks::task_manager::task::status task_status;

				    std::string type;

				    tasks::task_manager::task::progress progress;

				    std::string module;

				    tasks::task_id parent_id;

				    tasks::is_abortable abortable;

				    std::vector<std::string> children_ids;

				};

				struct task_stats {

				    task_stats(tasks::task_manager::task_ptr task)

				        : task_id(task->id().to_sstring())

				        , state(task->get_status().state)

				        , type(task->type())

				        , keyspace(task->get_status().keyspace)

				        , table(task->get_status().table)

				        , entity(task->get_status().entity)

				        , sequence_number(task->get_status().sequence_number)

				    { }

				    sstring task_id;

				    tasks::task_manager::task_state state;

				    std::string type;

				    std::string keyspace;

				    std::string table;

				    std::string entity;

				    uint64_t sequence_number;

				};

				tm::task_status make_status(full_task_status status) {

				    auto start_time = db_clock::to_time_t(status.task_status.start_time);

				    auto end_time = db_clock::to_time_t(status.task_status.end_time);

				    ::tm st, et;

				    ::gmtime_r(&end_time, &et);

				    ::gmtime_r(&start_time, &st);

				    tm::task_status res{};

				    res.id = status.task_status.id.to_sstring();

				    res.type = status.type;

				    res.state = status.task_status.state;

				    res.is_abortable = bool(status.abortable);

				    res.start_time = st;

				    res.end_time = et;

				    res.error = status.task_status.error;

				    res.parent_id = status.parent_id.to_sstring();

				    res.sequence_number = status.task_status.sequence_number;

				    res.shard = status.task_status.shard;

				    res.keyspace = status.task_status.keyspace;

				    res.table = status.task_status.table;

				    res.entity = status.task_status.entity;

				    res.progress_units = status.task_status.progress_units;

				    res.progress_total = status.progress.total;

				    res.progress_completed = status.progress.completed;

				    res.children_ids = std::move(status.children_ids);

				    return res;

				}

				future<full_task_status> retrieve_status(const tasks::task_manager::foreign_task_ptr& task) {

				    if (task.get() == nullptr) {

				        co_return coroutine::return_exception(httpd::bad_param_exception("Task not found"));

				    }

				    auto progress = co_await task->get_progress();

				    full_task_status s;

				    s.task_status = task->get_status();

				    s.type = task->type();

				    s.parent_id = task->get_parent_id();

				    s.abortable = task->is_abortable();

				    s.module = task->get_module_name();

				    s.progress.completed = progress.completed;

				    s.progress.total = progress.total;

				    std::vector<std::string> ct{task->get_children().size()};

				    boost::transform(task->get_children(), ct.begin(), [] (const auto& child) {

				        return child->id().to_sstring();

				    });

				    s.children_ids = std::move(ct);

				    co_return s;

				}

				void set_task_manager(http_context& ctx, routes& r, db::config& cfg) {

				    tm::get_modules.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        std::vector<std::string> v = boost::copy_range<std::vector<std::string>>(ctx.tm.local().get_modules() | boost::adaptors::map_keys);

				        co_return v;

				    });

				    tm::get_tasks.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        using chunked_stats = utils::chunked_vector<task_stats>;

				        auto internal = tasks::is_internal{req_param<bool>(*req, "internal", false)};

				        std::vector<chunked_stats> res = co_await ctx.tm.map([&req, internal] (tasks::task_manager& tm) {

				            chunked_stats local_res;

				            auto module = tm.find_module(req->param["module"]);

				            const auto& filtered_tasks = module->get_tasks() | boost::adaptors::filtered([&params = req->query_parameters, internal] (const auto& task) {

				                return (internal || !task.second->is_internal()) && filter_tasks(task.second, params);

				            });

				            for (auto& [task_id, task] : filtered_tasks) {

				                local_res.push_back(task_stats{task});

				            }

				            return local_res;

				        });

				        std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {

				            auto s = std::move(os);

				            auto res = std::move(r);

				            co_await s.write("[");

				            std::string delim = "";

				            for (auto& v: res) {

				                for (auto& stats: v) {

				                    co_await s.write(std::exchange(delim, ", "));

				                    tm::task_stats ts;

				                    ts = stats;

				                    co_await formatter::write(s, ts);

				                }

				            }

				            co_await s.write("]");

				            co_await s.close();

				        };

				        co_return std::move(f);

				    });

				    tm::get_task_status.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};

				        auto task = co_await tasks::task_manager::invoke_on_task(ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {

				            auto state = task->get_status().state;

				            if (state == tasks::task_manager::task_state::done || state == tasks::task_manager::task_state::failed) {

				                task->unregister_task();

				            }

				            co_return std::move(task);

				        }));

				        auto s = co_await retrieve_status(task);

				        co_return make_status(s);

				    });

				    tm::abort_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};

				        co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {

				            if (!task->is_abortable()) {

				                co_await coroutine::return_exception(std::runtime_error("Requested task cannot be aborted"));

				            }

				            co_await task->abort();

				        });

				        co_return json_void();

				    });

				    tm::wait_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};

				        auto task = co_await tasks::task_manager::invoke_on_task(ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) {

				            return task->done().then_wrapped([task] (auto f) {

				                task->unregister_task();

				                f.get();

				                return make_foreign(task);

				            });

				        }));

				        auto s = co_await retrieve_status(task);

				        co_return make_status(s);

				    });

				    tm::get_task_status_recursively.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto& _ctx = ctx;

				        auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};

				        std::queue<tasks::task_manager::foreign_task_ptr> q;

				        utils::chunked_vector<full_task_status> res;

				        // Get requested task.

				        auto task = co_await tasks::task_manager::invoke_on_task(_ctx.tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {

				            auto state = task->get_status().state;

				            if (state == tasks::task_manager::task_state::done || state == tasks::task_manager::task_state::failed) {

				                task->unregister_task();

				            }

				            co_return task;

				        }));

				        // Push children's statuses in BFS order.

				        q.push(co_await task.copy());   // Task cannot be moved since we need it to be alive during whole loop execution.

				        while (!q.empty()) {

				            auto& current = q.front();

				            res.push_back(co_await retrieve_status(current));

				            for (size_t i = 0; i < current->get_children().size(); ++i) {

				                q.push(co_await current->get_children()[i].copy());

				            }

				            q.pop();

				        }

				        std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {

				            auto s = std::move(os);

				            auto res = std::move(r);

				            co_await s.write("[");

				            std::string delim = "";

				            for (auto& status: res) {

				                co_await s.write(std::exchange(delim, ", "));

				                co_await formatter::write(s, make_status(status));

				            }

				            co_await s.write("]");

				            co_await s.close();

				        };

				        co_return f;

				    });

				    tm::get_and_update_ttl.set(r, [&cfg] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        uint32_t ttl = cfg.task_ttl_seconds();

				        co_await cfg.task_ttl_seconds.set_value_on_all_shards(req->query_parameters["ttl"], utils::config_file::config_source::API);

				        co_return json::json_return_type(ttl);

				    });

				}

				}
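The `get_task_status_recursively` handler in `api/task_manager.cc` above walks a task and its descendants in breadth-first order, then streams the collected statuses as one JSON array, using `std::exchange` to emit the separator before every element except the first. The following is a minimal standalone sketch of those two steps, not ScyllaDB code: `node`, `bfs_ids`, and `render_array` are hypothetical names, plain indices stand in for `foreign_task_ptr`, and a `std::ostringstream` stands in for the Seastar `output_stream`.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task: an id plus the indices of its children.
struct node {
    std::string id;
    std::vector<std::size_t> children;
};

// Visits the tree breadth-first from `root` and returns the ids in visit
// order, the same order in which the handler collects task statuses.
std::vector<std::string> bfs_ids(const std::vector<node>& nodes, std::size_t root) {
    assert(root < nodes.size());
    std::vector<std::string> order;
    std::queue<std::size_t> q;
    q.push(root);
    while (!q.empty()) {
        const node& current = nodes[q.front()];
        order.push_back(current.id);
        for (std::size_t child : current.children) {
            q.push(child);
        }
        q.pop();  // pop only after the children have been enqueued
    }
    return order;
}

// Renders the ids as a JSON-style array of strings (no escaping, since this
// is only an illustration of the delimiter idiom).
std::string render_array(const std::vector<std::string>& ids) {
    std::ostringstream out;
    out << "[";
    std::string delim;  // empty before the first element
    for (const auto& id : ids) {
        // std::exchange returns the previous delimiter and installs ", ",
        // so only the first element is written without a separator.
        out << std::exchange(delim, ", ") << '"' << id << '"';
    }
    out << "]";
    return out.str();
}
```

For a tree in which `root` has children `a` and `b`, and `a` has a child `c`, `bfs_ids` visits `root`, `a`, `b`, `c`: both children of the root come before the grandchild.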

									
										18

api/task_manager.hh
									
										Normal file
									
												
				@@ -0,0 +1,18 @@

				/*

				 * Copyright (C) 2022-present ScyllaDB

				 */

				/*

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#pragma once

				#include "api.hh"

				#include "db/config.hh"

				namespace api {

				void set_task_manager(http_context& ctx, httpd::routes& r, db::config& cfg);

				}

									
										102

api/task_manager_test.cc
									
										Normal file
									
												
				@@ -0,0 +1,102 @@

				/*

				 * Copyright (C) 2022-present ScyllaDB

				 */

				/*

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#ifndef SCYLLA_BUILD_MODE_RELEASE

				#include <seastar/core/coroutine.hh>

				#include "task_manager_test.hh"

				#include "api/api-doc/task_manager_test.json.hh"

				#include "tasks/test_module.hh"

				namespace api {

				namespace tmt = httpd::task_manager_test_json;

				using namespace json;

				using namespace seastar::httpd;

				void set_task_manager_test(http_context& ctx, routes& r) {

				    tmt::register_test_module.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        co_await ctx.tm.invoke_on_all([] (tasks::task_manager& tm) {

				            auto m = make_shared<tasks::test_module>(tm);

				            tm.register_module("test", m);

				        });

				        co_return json_void();

				    });

				    tmt::unregister_test_module.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        co_await ctx.tm.invoke_on_all([] (tasks::task_manager& tm) -> future<> {

				            auto module_name = "test";

				            auto module = tm.find_module(module_name);

				            co_await module->stop();

				        });

				        co_return json_void();

				    });

				    tmt::register_test_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        sharded<tasks::task_manager>& tms = ctx.tm;

				        auto it = req->query_parameters.find("task_id");

				        auto id = it != req->query_parameters.end() ? tasks::task_id{utils::UUID{it->second}} : tasks::task_id::create_null_id();

				        it = req->query_parameters.find("shard");

				        unsigned shard = it != req->query_parameters.end() ? boost::lexical_cast<unsigned>(it->second) : 0;

				        it = req->query_parameters.find("keyspace");

				        std::string keyspace = it != req->query_parameters.end() ? it->second : "";

				        it = req->query_parameters.find("table");

				        std::string table = it != req->query_parameters.end() ? it->second : "";

				        it = req->query_parameters.find("entity");

				        std::string entity = it != req->query_parameters.end() ? it->second : "";

				        it = req->query_parameters.find("parent_id");

				        tasks::task_info data;

				        if (it != req->query_parameters.end()) {

				            data.id = tasks::task_id{utils::UUID{it->second}};

				            auto parent_ptr = co_await tasks::task_manager::lookup_task_on_all_shards(ctx.tm, data.id);

				            data.shard = parent_ptr->get_status().shard;

				        }

				        auto module = tms.local().find_module("test");

				        id = co_await module->make_task<tasks::test_task_impl>(shard, id, keyspace, table, entity, data);

				        co_await tms.invoke_on(shard, [id] (tasks::task_manager& tm) {

				            auto it = tm.get_all_tasks().find(id);

				            if (it != tm.get_all_tasks().end()) {

				                it->second->start();

				            }

				        });

				        co_return id.to_sstring();

				    });

				    tmt::unregister_test_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->query_parameters["task_id"]}};

				        co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {

				            tasks::test_task test_task{task};

				            co_await test_task.unregister_task();

				        });

				        co_return json_void();

				    });

				    tmt::finish_test_task.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};

				        auto it = req->query_parameters.find("error");

				        bool fail = it != req->query_parameters.end();

				        std::string error = fail ? it->second : "";

				        co_await tasks::task_manager::invoke_on_task(ctx.tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_ptr task) {

				            tasks::test_task test_task{task};

				            if (fail) {

				                test_task.finish_failed(std::make_exception_ptr(std::runtime_error(error)));

				            } else {

				                test_task.finish();

				            }

				            return make_ready_future<>();

				        });

				        co_return json_void();

				    });

				}

				}

				#endif

									
										21

api/task_manager_test.hh
									
										Normal file
									
												
				@@ -0,0 +1,21 @@

				/*

				 * Copyright (C) 2022-present ScyllaDB

				 */

				/*

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#ifndef SCYLLA_BUILD_MODE_RELEASE

				#pragma once

				#include "api.hh"

				namespace api {

				void set_task_manager_test(http_context& ctx, httpd::routes& r);

				}

				#endif

									
										35

auth/CMakeLists.txt
									
										Normal file
									
												
				@@ -0,0 +1,35 @@

				include(add_whole_archive)

				add_library(scylla_auth STATIC)

				target_sources(scylla_auth

				  PRIVATE

				    allow_all_authenticator.cc

				    allow_all_authorizer.cc

				    authenticated_user.cc

				    authenticator.cc

				    common.cc

				    default_authorizer.cc

				    password_authenticator.cc

				    passwords.cc

				    permission.cc

				    permissions_cache.cc

				    resource.cc

				    role_or_anonymous.cc

				    roles-metadata.cc

				    sasl_challenge.cc

				    service.cc

				    standard_role_manager.cc

				    transitional.cc)

				target_include_directories(scylla_auth

				  PUBLIC

				    ${CMAKE_SOURCE_DIR})

				target_link_libraries(scylla_auth

				  PUBLIC

				    Seastar::seastar

				    xxHash::xxhash

				  PRIVATE

				    cql3

				    idl

				    wasmtime_bindings)

				add_whole_archive(auth scylla_auth)

									
										12

auth/authenticated_user.cc
									
												
				@@ -10,24 +10,12 @@

				#include "auth/authenticated_user.hh"

				#include <iostream>

				namespace auth {

				authenticated_user::authenticated_user(std::string_view name)

				        : name(sstring(name)) {

				}

				std::ostream& operator<<(std::ostream& os, const authenticated_user& u) {

				    if (!u.name) {

				        os << "anonymous";

				    } else {

				        os << *u.name;

				    }

				    return os;

				}

				static const authenticated_user the_anonymous_user{};

				const authenticated_user& anonymous_user() noexcept {

									
										21

auth/authenticated_user.hh
									
												
				@@ -12,7 +12,6 @@

				#include <string_view>

				#include <functional>

				#include <iosfwd>

				#include <optional>

				#include <seastar/core/sstring.hh>

				@@ -38,11 +37,6 @@ public:

				    explicit authenticated_user(std::string_view name);

				};

				///

				/// The user name, or "anonymous".

				///

				std::ostream& operator<<(std::ostream&, const authenticated_user&);

				inline bool operator==(const authenticated_user& u1, const authenticated_user& u2) noexcept {

				    return u1.name == u2.name;

				}

				@@ -59,6 +53,21 @@ inline bool is_anonymous(const authenticated_user& u) noexcept {

				}

				///

				/// The user name, or "anonymous".

				///

				template <>

				struct fmt::formatter<auth::authenticated_user> : fmt::formatter<std::string_view> {

				    template <typename FormatContext>

				    auto format(const auth::authenticated_user& u, FormatContext& ctx) const {

				        if (u.name) {

				            return fmt::format_to(ctx.out(), "{}", *u.name);

				        } else {

				            return fmt::format_to(ctx.out(), "{}", "anonymous");

				        }

				    }

				};

				namespace std {

				template <>

									
										24

auth/authentication_options.cc
									
											
				@@ -1,24 +0,0 @@

				/*

				 * Copyright (C) 2018-present ScyllaDB

				 */

				/*

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#include "auth/authentication_options.hh"

				#include <iostream>

				namespace auth {

				std::ostream& operator<<(std::ostream& os, authentication_option a) {

				    switch (a) {

				        case authentication_option::password: os << "PASSWORD"; break;

				        case authentication_option::options: os << "OPTIONS"; break;

				    }

				    return os;

				}

				}

									
										17

auth/authentication_options.hh
									
												
				@@ -26,8 +26,6 @@ enum class authentication_option {

				    options

				};

				std::ostream& operator<<(std::ostream&, authentication_option);

				using authentication_option_set = std::unordered_set<authentication_option>;

				using custom_options = std::unordered_map<sstring, sstring>;

				@@ -49,3 +47,18 @@ public:

				};

				}

				template <>

				struct fmt::formatter<auth::authentication_option> : fmt::formatter<std::string_view> {

				    template <typename FormatContext>

				    auto format(const auth::authentication_option a, FormatContext& ctx) const {

				        using enum auth::authentication_option;

				        switch (a) {

				        case password:

				            return formatter<std::string_view>::format("PASSWORD", ctx);

				        case options:

				            return formatter<std::string_view>::format("OPTIONS", ctx);

				        }

				        std::abort();

				    }

				};

									
										2

auth/common.cc
									
												
				@@ -14,7 +14,7 @@

				#include "cql3/query_processor.hh"

				#include "cql3/statements/create_table_statement.hh"

				#include "replica/database.hh"

				-#include "schema_builder.hh"

				+#include "schema/schema_builder.hh"

				#include "service/migration_manager.hh"

				#include "timeout_config.hh"

									
										2

auth/common.hh
									
												
				@@ -30,8 +30,6 @@ namespace replica {

				class database;

				}

				class timeout_config;

				namespace service {

				class migration_manager;

				}

									
										2

auth/default_authorizer.cc
									
												
				@@ -74,7 +74,7 @@ future<bool> default_authorizer::any_granted() const {

				            query,

				            db::consistency_level::LOCAL_ONE,

				            {},

				-            cql3::query_processor::cache_internal::yes).then([this](::shared_ptr<cql3::untyped_result_set> results) {

				+            cql3::query_processor::cache_internal::yes).then([](::shared_ptr<cql3::untyped_result_set> results) {

				        return !results->empty();

				    });

				}

									
										2

auth/passwords.cc
									
												
				@@ -18,7 +18,7 @@ extern "C" {

				namespace auth::passwords {

				-static thread_local crypt_data tlcrypt = { 0, };

				+static thread_local crypt_data tlcrypt = {};

				namespace detail {

									
										6

auth/permission.cc
									
												
				@@ -21,7 +21,8 @@ const auth::permission_set auth::permissions::ALL = auth::permission_set::of<

				        auth::permission::SELECT,

				        auth::permission::MODIFY,

				        auth::permission::AUTHORIZE,

				-        auth::permission::DESCRIBE>();

				+        auth::permission::DESCRIBE,

				+        auth::permission::EXECUTE>();

				const auth::permission_set auth::permissions::NONE;

				@@ -34,7 +35,8 @@ static const std::unordered_map<sstring, auth::permission> permission_names({

				        {"SELECT", auth::permission::SELECT},

				        {"MODIFY", auth::permission::MODIFY},

				        {"AUTHORIZE", auth::permission::AUTHORIZE},

				-        {"DESCRIBE", auth::permission::DESCRIBE}});

				+        {"DESCRIBE", auth::permission::DESCRIBE},

				+        {"EXECUTE", auth::permission::EXECUTE}});

				const sstring& auth::permissions::to_string(permission p) {

				    for (auto& v : permission_names) {

									
										5

auth/permission.hh
									
												
				@@ -38,6 +38,8 @@ enum class permission {

				    AUTHORIZE, // required for GRANT and REVOKE.

				    DESCRIBE, // required on the root-level role resource to list all roles.

				    // function/aggregate/procedure calls

				    EXECUTE,

				};

				typedef enum_set<

				@@ -51,7 +53,8 @@ typedef enum_set<

				                permission::SELECT,

				                permission::MODIFY,

				                permission::AUTHORIZE,

				-                permission::DESCRIBE>> permission_set;

				+                permission::DESCRIBE,

				+                permission::EXECUTE>> permission_set;

				bool operator<(const permission_set&, const permission_set&);

									
										159

auth/resource.cc
									
												
				@@ -16,30 +16,26 @@

				#include <boost/algorithm/string/join.hpp>

				#include <boost/algorithm/string/split.hpp>

				#include <boost/algorithm/string/classification.hpp>

				#include "service/storage_proxy.hh"

				#include "data_dictionary/user_types_metadata.hh"

				#include "cql3/util.hh"

				#include "db/marshal/type_parser.hh"

				namespace auth {

				std::ostream& operator<<(std::ostream& os, resource_kind kind) {

				    switch (kind) {

				        case resource_kind::data: os << "data"; break;

				        case resource_kind::role: os << "role"; break;

				        case resource_kind::service_level: os << "service_level"; break;

				    }

				    return os;

				}

				static const std::unordered_map<resource_kind, std::string_view> roots{

				        {resource_kind::data, "data"},

				        {resource_kind::role, "roles"},

				-        {resource_kind::service_level, "service_levels"}};

				+        {resource_kind::service_level, "service_levels"},

				+        {resource_kind::functions, "functions"}};

				static const std::unordered_map<resource_kind, std::size_t> max_parts{

				        {resource_kind::data, 2},

				        {resource_kind::role, 1},

				-        {resource_kind::service_level, 0}};

				+        {resource_kind::service_level, 0},

				+        {resource_kind::functions, 2}};

				static permission_set applicable_permissions(const data_resource_view& dv) {

				    if (dv.table()) {

				@@ -82,6 +78,15 @@ static permission_set applicable_permissions(const service_level_resource_view &

				            permission::AUTHORIZE>();

				}

				static permission_set applicable_permissions(const functions_resource_view& fv) {

				    return permission_set::of<

				            permission::CREATE,

				            permission::ALTER,

				            permission::DROP,

				            permission::AUTHORIZE,

				            permission::EXECUTE>();

				}

				resource::resource(resource_kind kind) : _kind(kind) {

				    _parts.emplace_back(roots.at(kind));

				}

				@@ -106,6 +111,31 @@ resource::resource(role_resource_t, std::string_view role) : resource(resource_k

				resource::resource(service_level_resource_t): resource(resource_kind::service_level) {

				}

				resource::resource(functions_resource_t) : resource(resource_kind::functions) {

				}

				resource::resource(functions_resource_t, std::string_view keyspace) : resource(resource_kind::functions) {

				    _parts.emplace_back(keyspace);

				}

				resource::resource(functions_resource_t, std::string_view keyspace, std::string_view function_signature) : resource(resource_kind::functions) {

				    _parts.emplace_back(keyspace);

				    _parts.emplace_back(function_signature);

				}

				resource::resource(functions_resource_t, std::string_view keyspace, std::string_view function_name, std::vector<::shared_ptr<cql3::cql3_type::raw>> function_args) : resource(resource_kind::functions) {

				    _parts.emplace_back(keyspace);

				    _parts.emplace_back(function_name);

				    if (function_args.empty()) {

				        _parts.emplace_back("");

				        return;

				    }

				    for (auto& arg_type : function_args) {

				        // We can't validate the UDTs here, so we just use the raw cql type names.

				        _parts.emplace_back(arg_type->to_string());

				    }

				}

				sstring resource::name() const {

				    return boost::algorithm::join(_parts, "/");

				}

				@@ -127,6 +157,7 @@ permission_set resource::applicable_permissions() const {

				        case resource_kind::data: ps = ::auth::applicable_permissions(data_resource_view(*this)); break;

				        case resource_kind::role: ps = ::auth::applicable_permissions(role_resource_view(*this)); break;

				        case resource_kind::service_level: ps = ::auth::applicable_permissions(service_level_resource_view(*this)); break;

				        case resource_kind::functions: ps = ::auth::applicable_permissions(functions_resource_view(*this)); break;

				    }

				    return ps;

				@@ -149,6 +180,7 @@ std::ostream& operator<<(std::ostream& os, const resource& r) {

				        case resource_kind::data: return os << data_resource_view(r);

				        case resource_kind::role: return os << role_resource_view(r);

				        case resource_kind::service_level: return os << service_level_resource_view(r);

				        case resource_kind::functions: return os << functions_resource_view(r);

				    }

				    return os;

				@@ -165,6 +197,109 @@ std::ostream &operator<<(std::ostream &os, const service_level_resource_view &v)

				    return os;

				}

				sstring encode_signature(std::string_view name, std::vector<data_type> args) {

				    return format("{}[{}]", name,

				            fmt::join(args | boost::adaptors::transformed([] (const data_type t) {

				                return t->name();

				            }), "^"));

				}

				std::pair<sstring, std::vector<data_type>> decode_signature(std::string_view encoded_signature) {

				    auto name_delim = encoded_signature.find_last_of('[');

				    std::string_view function_name = encoded_signature.substr(0, name_delim);

				    encoded_signature.remove_prefix(name_delim + 1);

				    encoded_signature.remove_suffix(1);

				    if (encoded_signature.empty()) {

				        return {sstring(function_name), {}};

				    }

				    std::vector<std::string_view> raw_types;

				    boost::split(raw_types, encoded_signature, boost::is_any_of("^"));

				    std::vector<data_type> decoded_types = boost::copy_range<std::vector<data_type>>(

				        raw_types | boost::adaptors::transformed([] (std::string_view raw_type) {

				            return db::marshal::type_parser::parse(raw_type);

				        })

				    );

				    return {sstring(function_name), decoded_types};

				}

				// Purely for Cassandra compatibility, types in the function signature are

				// decoded from their verbose form (org.apache.cassandra.db.marshal.Int32Type)

				// to the short form (int)

				static sstring decoded_signature_string(std::string_view encoded_signature) {

				    auto [function_name, arg_types] = decode_signature(encoded_signature);

				    return format("{}({})", cql3::util::maybe_quote(sstring(function_name)),

				            boost::algorithm::join(arg_types | boost::adaptors::transformed([] (data_type t) {

				                return t->cql3_type_name();

				            }), ", "));

				}

				std::ostream &operator<<(std::ostream &os, const functions_resource_view &v) {

				    const auto keyspace = v.keyspace();

				    const auto function_signature = v.function_signature();

				    const auto name = v.function_name();

				    const auto args = v.function_args();

				    if (!keyspace) {

				        os << "<all functions>";

				    } else if (name) {

				        os << "<function " << *keyspace << '.' << cql3::util::maybe_quote(sstring(*name)) << '(';

				        for (auto arg : *args) {

				            os << arg << ',';

				        }

				        os << ")>";

				    } else if (!function_signature) {

				        os << "<all functions in " << *keyspace << '>';

				    } else {

				        os << "<function " << *keyspace << '.' << decoded_signature_string(*function_signature) << '>';

				    }

				    return os;

				}

				functions_resource_view::functions_resource_view(const resource& r) : _resource(r) {

				    if (r._kind != resource_kind::functions) {

				        throw resource_kind_mismatch(resource_kind::functions, r._kind);

				    }

				}

				std::optional<std::string_view> functions_resource_view::keyspace() const {

				    if (_resource._parts.size() == 1) {

				        return {};

				    }

				    return _resource._parts[1];

				}

				std::optional<std::string_view> functions_resource_view::function_signature() const {

				    if (_resource._parts.size() <= 2 || _resource._parts.size() > 3) {

				        return {};

				    }

				    return _resource._parts[2];

				}

				std::optional<std::string_view> functions_resource_view::function_name() const {

				    if (_resource._parts.size() <= 3) {

				        return {};

				    }

				    return _resource._parts[2];

				}

				std::optional<std::vector<std::string_view>> functions_resource_view::function_args() const {

				    if (_resource._parts.size() <= 3) {

				        return {};

				    }

				    std::vector<std::string_view> parts;

				    if (_resource._parts[3] == "") {

				        return {};

				    }

				    for (size_t i = 3; i < _resource._parts.size(); i++) {

				        parts.push_back(_resource._parts[i]);

				    }

				    return parts;

				}

				data_resource_view::data_resource_view(const resource& r) : _resource(r) {

				    if (r._kind != resource_kind::data) {

				        throw resource_kind_mismatch(resource_kind::data, r._kind);

									
										84

auth/resource.hh
									
				@@ -18,6 +18,7 @@

				#include <vector>

				#include <unordered_set>

				#include <boost/range/adaptor/transformed.hpp>

				#include <seastar/core/print.hh>

				#include <seastar/core/sstring.hh>

				@@ -25,6 +26,7 @@

				#include "seastarx.hh"

				#include "utils/hash.hh"

				#include "utils/small_vector.hh"

				#include "cql3/cql3_type.hh"

				namespace auth {

				@@ -36,11 +38,9 @@ public:

				};

				enum class resource_kind {

				    data, role, service_level

				    data, role, service_level, functions

				};

				std::ostream& operator<<(std::ostream&, resource_kind);

				///

				/// Type tag for constructing data resources.

				///

				@@ -56,10 +56,15 @@ struct role_resource_t final {};

				///

				struct service_level_resource_t final {};

				///

				/// Type tag for constructing function resources.

				///

				struct functions_resource_t final {};

				///

				/// Resources are entities that users can be granted permissions on.

				///

				/// There are data (keyspaces and tables) and role resources. There may be other kinds of resources in the future.

				/// There are data (keyspaces and tables), role and function resources. There may be other kinds of resources in the future.

				///

				/// When they are stored as system metadata, resources have the form `root/part_0/part_1/.../part_n`. Each kind of

				/// resource has a specific root prefix, followed by a maximum of `n` parts (where `n` is distinct for each kind of

				@@ -83,6 +88,11 @@ public:

				    resource(data_resource_t, std::string_view keyspace, std::string_view table);

				    resource(role_resource_t, std::string_view role);

				    resource(service_level_resource_t);

				    explicit resource(functions_resource_t);

				    resource(functions_resource_t, std::string_view keyspace);

				    resource(functions_resource_t, std::string_view keyspace, std::string_view function_signature);

				    resource(functions_resource_t, std::string_view keyspace, std::string_view function_name,

				            std::vector<::shared_ptr<cql3::cql3_type::raw>> function_args);

				    resource_kind kind() const noexcept {

				        return _kind;

				@@ -104,6 +114,7 @@ private:

				    friend class data_resource_view;

				    friend class role_resource_view;

				    friend class service_level_resource_view;

				    friend class functions_resource_view;

				    friend bool operator<(const resource&, const resource&);

				    friend bool operator==(const resource&, const resource&);

				@@ -182,6 +193,25 @@ public:

				std::ostream& operator<<(std::ostream&, const service_level_resource_view&);

				///

				/// A "function" view of \ref resource.

				///

				class functions_resource_view final {

				    const resource& _resource;

				public:

				    ///

				    /// \throws \ref resource_kind_mismatch if the argument is not a "function" resource.

				    ///

				    explicit functions_resource_view(const resource&);

				    std::optional<std::string_view> keyspace() const;

				    std::optional<std::string_view> function_signature() const;

				    std::optional<std::string_view> function_name() const;

				    std::optional<std::vector<std::string_view>> function_args() const;

				};

				std::ostream& operator<<(std::ostream&, const functions_resource_view&);

				///

				/// Parse a resource from its name.

				///

				@@ -210,8 +240,49 @@ inline resource make_service_level_resource() {

				    return resource(service_level_resource_t{});

				}

				const resource& root_function_resource();

				inline resource make_functions_resource() {

				    return resource(functions_resource_t{});

				}

				inline resource make_functions_resource(std::string_view keyspace) {

				    return resource(functions_resource_t{}, keyspace);

				}

				inline resource make_functions_resource(std::string_view keyspace, std::string_view function_signature) {

				    return resource(functions_resource_t{}, keyspace, function_signature);

				}

				inline resource make_functions_resource(std::string_view keyspace, std::string_view function_name, std::vector<::shared_ptr<cql3::cql3_type::raw>> function_signature) {

				    return resource(functions_resource_t{}, keyspace, function_name, function_signature);

				}

				sstring encode_signature(std::string_view name, std::vector<data_type> args);

				std::pair<sstring, std::vector<data_type>> decode_signature(std::string_view encoded_signature);

				}

				template <>

				struct fmt::formatter<auth::resource_kind> : fmt::formatter<std::string_view> {

				    template <typename FormatContext>

				    auto format(const auth::resource_kind kind, FormatContext& ctx) const {

				        using enum auth::resource_kind;

				        switch (kind) {

				        case data:

				            return formatter<std::string_view>::format("data", ctx);

				        case role:

				            return formatter<std::string_view>::format("role", ctx);

				        case service_level:

				            return formatter<std::string_view>::format("service_level", ctx);

				        case functions:

				            return formatter<std::string_view>::format("functions", ctx);

				        }

				        std::abort();

				    }

				};

				namespace std {

				template <>

				@@ -228,6 +299,10 @@ struct hash<auth::resource> {

				            return utils::tuple_hash()(std::make_tuple(auth::resource_kind::service_level));

				    }

				    static size_t hash_function(const auth::functions_resource_view& fv) {

				        return utils::tuple_hash()(std::make_tuple(auth::resource_kind::functions, fv.keyspace(), fv.function_signature()));

				    }

				    size_t operator()(const auth::resource& r) const {

				        std::size_t value;

				@@ -235,6 +310,7 @@ struct hash<auth::resource> {

				        case auth::resource_kind::data: value = hash_data(auth::data_resource_view(r)); break;

				        case auth::resource_kind::role: value = hash_role(auth::role_resource_view(r)); break;

				        case auth::resource_kind::service_level: value = hash_service_level(auth::service_level_resource_view(r)); break;

				        case auth::resource_kind::functions: value = hash_function(auth::functions_resource_view(r)); break;

				        }

				        return value;

									
										20

auth/service.cc
									
				@@ -20,17 +20,19 @@

				#include "auth/allow_all_authorizer.hh"

				#include "auth/common.hh"

				#include "auth/role_or_anonymous.hh"

				#include "cql3/functions/functions.hh"

				#include "cql3/query_processor.hh"

				#include "cql3/untyped_result_set.hh"

				#include "db/config.hh"

				#include "db/consistency_level_type.hh"

				#include "db/functions/function_name.hh"

				#include "exceptions/exceptions.hh"

				#include "log.hh"

				#include "service/migration_manager.hh"

				#include "utils/class_registrator.hh"

				#include "locator/abstract_replication_strategy.hh"

				#include "data_dictionary/keyspace_metadata.hh"

				#include "mutation.hh"

				#include "mutation/mutation.hh"

				namespace auth {

				@@ -346,6 +348,22 @@ future<bool> service::exists(const resource& r) const {

				        }

				        case resource_kind::service_level:

				            return make_ready_future<bool>(true);

				        case resource_kind::functions: {

				            const auto& db = _qp.db();

				            functions_resource_view v(r);

				            const auto keyspace = v.keyspace();

				            if (!keyspace) {

				                return make_ready_future<bool>(true);

				            }

				            const auto function_signature = v.function_signature();

				            if (!function_signature) {

				                return make_ready_future<bool>(db.has_keyspace(sstring(*keyspace)));

				            }

				            auto [name, function_args] = auth::decode_signature(*function_signature);

				            return make_ready_future<bool>(cql3::functions::functions::find(db::functions::function_name{sstring(*keyspace), name}, function_args));

				        }

				    }

				    return make_ready_future<bool>(false);

									
										2

auth/standard_role_manager.cc
									
				@@ -470,7 +470,7 @@ standard_role_manager::grant(std::string_view grantee_name, std::string_view rol

				future<>

				standard_role_manager::revoke(std::string_view revokee_name, std::string_view role_name) {

				    return this->exists(role_name).then([this, revokee_name, role_name](bool role_exists) {

				    return this->exists(role_name).then([role_name](bool role_exists) {

				        if (!role_exists) {

				            throw nonexistant_role(sstring(role_name));

				        }

									
										59

build_mode.hh
									
										Normal file
									
				@@ -0,0 +1,59 @@

				/*

				 * Copyright (C) 2022-present ScyllaDB

				 */

				/*

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#pragma once

				#ifndef SCYLLA_BUILD_MODE

				#error SCYLLA_BUILD_MODE must be defined

				#endif

				#ifndef STRINGIFY

				// We need two levels of indirection

				// to make a string out of the macro name.

				// The outer level expands the macro

				// and the inner level makes a string out of the expanded macro.

				#define STRINGIFY_VALUE(x) #x

				#define STRINGIFY_MACRO(x) STRINGIFY_VALUE(x)

				#endif

				#define SCYLLA_BUILD_MODE_STR STRINGIFY_MACRO(SCYLLA_BUILD_MODE)

				// We use plain macro definitions

				// so the preprocessor can expand them

				// inline in the #if directives below

				#define SCYLLA_BUILD_MODE_CODE_debug 0

				#define SCYLLA_BUILD_MODE_CODE_release 1

				#define SCYLLA_BUILD_MODE_CODE_dev 2

				#define SCYLLA_BUILD_MODE_CODE_sanitize 3

				#define SCYLLA_BUILD_MODE_CODE_coverage 4

				#define _SCYLLA_BUILD_MODE_CODE(sbm) SCYLLA_BUILD_MODE_CODE_ ## sbm

				#define SCYLLA_BUILD_MODE_CODE(sbm) _SCYLLA_BUILD_MODE_CODE(sbm)

				#if SCYLLA_BUILD_MODE_CODE(SCYLLA_BUILD_MODE) == SCYLLA_BUILD_MODE_CODE_debug

				#define SCYLLA_BUILD_MODE_DEBUG

				#elif SCYLLA_BUILD_MODE_CODE(SCYLLA_BUILD_MODE) == SCYLLA_BUILD_MODE_CODE_release

				#define SCYLLA_BUILD_MODE_RELEASE

				#elif SCYLLA_BUILD_MODE_CODE(SCYLLA_BUILD_MODE) == SCYLLA_BUILD_MODE_CODE_dev

				#define SCYLLA_BUILD_MODE_DEV

				#elif SCYLLA_BUILD_MODE_CODE(SCYLLA_BUILD_MODE) == SCYLLA_BUILD_MODE_CODE_sanitize

				#define SCYLLA_BUILD_MODE_SANITIZE

				#elif SCYLLA_BUILD_MODE_CODE(SCYLLA_BUILD_MODE) == SCYLLA_BUILD_MODE_CODE_coverage

				#define SCYLLA_BUILD_MODE_COVERAGE

				#else

				#error unrecognized SCYLLA_BUILD_MODE

				#endif

				#if (defined(SCYLLA_BUILD_MODE_RELEASE) || defined(SCYLLA_BUILD_MODE_DEV)) && defined(SEASTAR_DEBUG)

				#error SEASTAR_DEBUG is not expected to be defined when SCYLLA_BUILD_MODE is "release" or "dev"

				#endif

				#if (defined(SCYLLA_BUILD_MODE_DEBUG) || defined(SCYLLA_BUILD_MODE_SANITIZE)) && !defined(SEASTAR_DEBUG)

				#error SEASTAR_DEBUG is expected to be defined when SCYLLA_BUILD_MODE is "debug" or "sanitize"

				#endif
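
Note on the two-level macro pattern used in build_mode.hh: the sketch below is a minimal, standalone illustration of why both the stringification and the token pasting go through a helper macro. It is not part of the change. Every DEMO_* name is hypothetical, and DEMO_BUILD_MODE stands in for a value that a build system would normally pass with -D.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a -DDEMO_BUILD_MODE=dev compiler flag.
#define DEMO_BUILD_MODE dev

// One level: '#' stringifies the argument exactly as written, without expanding it.
#define DEMO_STRINGIFY_VALUE(x) #x
// Two levels: the argument is macro-expanded first, then stringified by the inner macro.
#define DEMO_STRINGIFY_MACRO(x) DEMO_STRINGIFY_VALUE(x)

// Plain integer codes so the preprocessor can compare them in #if.
#define DEMO_MODE_CODE_debug 0
#define DEMO_MODE_CODE_release 1
#define DEMO_MODE_CODE_dev 2

// '##' also suppresses expansion of its operands, so pasting needs the same indirection.
#define DEMO_MODE_CODE_PASTE(m) DEMO_MODE_CODE_ ## m
#define DEMO_MODE_CODE(m) DEMO_MODE_CODE_PASTE(m)

#if DEMO_MODE_CODE(DEMO_BUILD_MODE) == DEMO_MODE_CODE_dev
#define DEMO_MODE_IS_DEV 1
#else
#define DEMO_MODE_IS_DEV 0
#endif

// Small accessors so the results of the expansions can be inspected at run time.
inline std::string unexpanded_name() { return DEMO_STRINGIFY_VALUE(DEMO_BUILD_MODE); }
inline std::string expanded_name() { return DEMO_STRINGIFY_MACRO(DEMO_BUILD_MODE); }
inline int mode_code() { return DEMO_MODE_CODE(DEMO_BUILD_MODE); }
inline bool is_dev() { return DEMO_MODE_IS_DEV == 1; }
```

With a single level, unexpanded_name() yields the macro's own name rather than its value, which is the failure mode the extra indirection avoids.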

									
										22

bytes.cc
									
				@@ -50,15 +50,7 @@ bytes from_hex(sstring_view s) {

				}

				sstring to_hex(bytes_view b) {

				    static char digits[] = "0123456789abcdef";

				    sstring out = uninitialized_string(b.size() * 2);

				    unsigned end = b.size();

				    for (unsigned i = 0; i != end; ++i) {

				        uint8_t x = b[i];

				        out[2*i] = digits[x >> 4];

				        out[2*i+1] = digits[x & 0xf];

				    }

				    return out;

				    return fmt::to_string(fmt_hex(b));

				}

				sstring to_hex(const bytes& b) {

				@@ -70,12 +62,14 @@ sstring to_hex(const bytes_opt& b) {

				}

				std::ostream& operator<<(std::ostream& os, const bytes& b) {

				    return os << to_hex(b);

				    fmt::print(os, "{}", b);

				    return os;

				}

				std::ostream& operator<<(std::ostream& os, const bytes_opt& b) {

				    if (b) {

				        return os << *b;

				        fmt::print(os, "{}", *b);

				        return os;

				    }

				    return os << "null";

				}

				@@ -83,11 +77,13 @@ std::ostream& operator<<(std::ostream& os, const bytes_opt& b) {

				namespace std {

				std::ostream& operator<<(std::ostream& os, const bytes_view& b) {

				    return os << to_hex(b);

				    fmt::print(os, "{}", fmt_hex(b));

				    return os;

				}

				}

				std::ostream& operator<<(std::ostream& os, const fmt_hex& b) {

				    return os << to_hex(b.v);

				    fmt::print(os, "{}", b);

				    return os;

				}

									
										90

bytes.hh
									
				@@ -9,8 +9,9 @@

				#pragma once

				#include "seastarx.hh"

				#include <fmt/format.h>

				#include <seastar/core/sstring.hh>

				#include "hashing.hh"

				#include "utils/hashing.hh"

				#include <optional>

				#include <iosfwd>

				#include <functional>

				@@ -37,8 +38,8 @@ inline bytes_view to_bytes_view(sstring_view view) {

				}

				struct fmt_hex {

				    bytes_view& v;

				    fmt_hex(bytes_view& v) noexcept : v(v) {}

				    const bytes_view& v;

				    fmt_hex(const bytes_view& v) noexcept : v(v) {}

				};

				std::ostream& operator<<(std::ostream& os, const fmt_hex& hex);

				@@ -51,6 +52,89 @@ sstring to_hex(const bytes_opt& b);

				std::ostream& operator<<(std::ostream& os, const bytes& b);

				std::ostream& operator<<(std::ostream& os, const bytes_opt& b);

				template <>

				struct fmt::formatter<fmt_hex> {

				    size_t _group_size_in_bytes = 0;

				    char _delimiter = ' ';

				public:

				    // format_spec := [group_size[delimiter]]

				    // group_size := a char from '0' to '8' (bytes per group; '0' disables grouping)

				    // delimiter := a char other than '{'  or '}'

				    //

				    // by default, the given bytes are printed without a delimiter, just

				    // like a string. so a string view of {0x20, 0x01, 0x0d, 0xb8} is

				    // printed like:

				    // "20010db8".

				    //

				    // but the format specifier can be used to customize how the bytes

				    // are printed. for instance, to print a bytes_view like an IPv6 address,

				    // the format specifier would be "{:2:}", where

				    // - "2": bytes are printed in groups of 2 bytes

				    // - ":": each group is delimited by ":"

				    // and the formatted output will look like:

				    // "2001:0db8:0000"

				    //

				    // or we can mimic the default format used by hexdump using

				    // "{:2 }", where

				    // - "2": bytes are printed in groups of 2 bytes

				    // - " ": each group is delimited by " "

				    // and the formatted output will look like:

				    // "2001 0db8 0000"

				    //

				    // or we can print each byte on its own, separated by a dash, using

				    // "{:1-}"

				    // and the formatted output will look like:

				    // "20-01-0d-b8-00-00"

				    constexpr auto parse(fmt::format_parse_context& ctx) {

				        // get the delimiter if any

				        auto it = ctx.begin();

				        auto end = ctx.end();

				        if (it != end) {

				            int group_size = *it++ - '0';

				            if (group_size < 0 ||

				                static_cast<size_t>(group_size) > sizeof(uint64_t)) {

				                throw format_error("invalid group_size");

				            }

				            _group_size_in_bytes = group_size;

				            if (it != end) {

				                // optional delimiter

				                _delimiter = *it++;

				            }

				        }

				        if (it != end && *it != '}') {

				            throw format_error("invalid format");

				        }

				        return it;

				    }

				    template <typename FormatContext>

				    auto format(const ::fmt_hex& s, FormatContext& ctx) const {

				        auto out = ctx.out();

				        const auto& v = s.v;

				        if (_group_size_in_bytes > 0) {

				            for (size_t i = 0, size = v.size(); i < size; i++) {

				                if (i != 0 && i % _group_size_in_bytes == 0) {

				                    fmt::format_to(out, "{}{:02x}", _delimiter, std::byte(v[i]));

				                } else {

				                    fmt::format_to(out, "{:02x}", std::byte(v[i]));

				                }

				            }

				        } else {

				            for (auto b : v) {

				                fmt::format_to(out, "{:02x}", std::byte(b));

				            }

				        }

				        return out;

				    }

				};

				template <>

				struct fmt::formatter<bytes> : fmt::formatter<fmt_hex> {

				    template <typename FormatContext>

				    auto format(const ::bytes& s, FormatContext& ctx) const {

				        return fmt::formatter<::fmt_hex>::format(::fmt_hex(bytes_view(s)), ctx);

				    }

				};

				namespace std {

				// Must be in std:: namespace, or ADL fails
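
Note on the grouping rule in fmt::formatter<fmt_hex>: the sketch below restates the rule from the formatter's format() method with the standard library only, so it can be tried without fmt or Scylla headers. It is an illustration under assumptions, not the code from this change. hex_grouped is a hypothetical helper, and its group_size and delimiter parameters play the role of the formatter's _group_size_in_bytes and _delimiter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper mirroring the formatter's grouping rule:
// group_size == 0 prints the bytes back to back; otherwise the delimiter is
// written before every byte whose index is a non-zero multiple of group_size.
inline std::string hex_grouped(const std::vector<std::uint8_t>& v,
                               std::size_t group_size = 0,
                               char delimiter = ' ') {
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(v.size() * 3);
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (group_size > 0 && i != 0 && i % group_size == 0) {
            out.push_back(delimiter);
        }
        out.push_back(digits[v[i] >> 4]);   // high nibble
        out.push_back(digits[v[i] & 0xf]);  // low nibble
    }
    return out;
}
```

For the six bytes {0x20, 0x01, 0x0d, 0xb8, 0x00, 0x00}, a group size of 2 with ':' corresponds to the "{:2:}" specifier described in the comments, and a group size of 1 with '-' corresponds to "{:1-}".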

									
										6

bytes_ostream.hh
									
				@@ -12,7 +12,7 @@

				#include "bytes.hh"

				#include "utils/managed_bytes.hh"

				#include "hashing.hh"

				#include "utils/hashing.hh"

				#include <seastar/core/simple-stream.hh>

				#include <seastar/core/loop.hh>

				#include <bit>

				@@ -457,7 +457,9 @@ public:

				            _begin.ptr->size = _size;

				            _current = nullptr;

				            _size = 0;

				            return managed_bytes(std::exchange(_begin.ptr, {}));

				            auto begin_ptr = _begin.ptr;

				            _begin.ptr = nullptr;

				            return managed_bytes(begin_ptr);

				        } else {

				            return managed_bytes();

				        }

									
										489

cache_flat_mutation_reader.hh
									
				@@ -10,10 +10,10 @@

				#include <vector>

				#include "row_cache.hh"

				#include "mutation_fragment.hh"

				#include "mutation/mutation_fragment.hh"

				#include "query-request.hh"

				#include "partition_snapshot_row_cursor.hh"

				#include "range_tombstone_assembler.hh"

				#include "mutation/range_tombstone_assembler.hh"

				#include "read_context.hh"

				#include "readers/delegating_v2.hh"

				#include "clustering_key_filter.hh"

				@@ -41,7 +41,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {

				        move_to_underlying,

				        // Invariants:

				        // - Upper bound of the read is min(_next_row.position(), _upper_bound)

				        // - Upper bound of the read is *_underlying_upper_bound

				        // - _next_row_in_range = _next.position() < _upper_bound

				        // - _last_row points at a direct predecessor of the next row which is going to be read.

				        //   Used for populating continuity.

				@@ -51,46 +51,6 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {

				        end_of_stream

				    };

				    enum class source {

				        cache = 0,

				        underlying = 1,

				    };

				    // Merges range tombstone change streams coming from underlying and the cache.

				    // Ensures no range tombstone change fragment is emitted when there is no

				    // actual change in the effective tombstone.

				    class range_tombstone_change_merger {

				        const schema& _schema;

				        position_in_partition _pos;

				        tombstone _current_tombstone;

				        std::array<tombstone, 2> _tombstones;

				    private:

				        std::optional<range_tombstone_change> do_flush(position_in_partition pos, bool end_of_range) {

				            std::optional<range_tombstone_change> ret;

				            position_in_partition::tri_compare cmp(_schema);

				            const auto res = cmp(_pos, pos);

				            const auto should_flush = end_of_range ? res <= 0 : res < 0;

				            if (should_flush) {

				                auto merged_tomb = std::max(_tombstones.front(), _tombstones.back());

				                if (merged_tomb != _current_tombstone) {

				                    _current_tombstone = merged_tomb;

				                    ret.emplace(_pos, _current_tombstone);

				                }

				                _pos = std::move(pos);

				            }

				            return ret;

				        }

				    public:

				        range_tombstone_change_merger(const schema& s) : _schema(s), _pos(position_in_partition::before_all_clustered_rows()), _tombstones{}

				        { }

				        std::optional<range_tombstone_change> apply(source src, range_tombstone_change&& rtc) {

				            auto ret = do_flush(rtc.position(), false);

				            _tombstones[static_cast<size_t>(src)] = rtc.tombstone();

				            return ret;

				        }

				        std::optional<range_tombstone_change> flush(position_in_partition_view pos, bool end_of_range) {

				            return do_flush(position_in_partition(pos), end_of_range);

				        }

				    };

				    partition_snapshot_ptr _snp;

				    query::clustering_key_filter_ranges _ck_ranges; // Query schema domain, reversed reads use native order

				@@ -103,8 +63,11 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {

				    // Holds the lower bound of a position range which hasn't been processed yet.

				    // Only rows with positions < _lower_bound have been emitted, and only

				    // range_tombstones with positions <= _lower_bound.

				    // range_tombstone_changes with positions <= _lower_bound.

				    //

				    // Invariant: !_lower_bound.is_clustering_row()

				    position_in_partition _lower_bound; // Query schema domain

				    // Invariant: !_upper_bound.is_clustering_row()

				    position_in_partition_view _upper_bound; // Query schema domain

				    std::optional<position_in_partition> _underlying_upper_bound; // Query schema domain

				@@ -121,22 +84,19 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {

				    read_context& _read_context;

				    partition_snapshot_row_cursor _next_row;

				    range_tombstone_change_generator _rt_gen; // cache -> reader

				    range_tombstone_assembler _rt_assembler; // underlying -> cache

				    range_tombstone_change_merger _rt_merger; // {cache, underlying} -> reader

				    // When the read moves to the underlying, the read range will be

				    // (_lower_bound, x], where x is either _next_row.position() or _upper_bound.

				    // In the former case (x is _next_row.position()), underlying can emit

				    // a range tombstone change for after_key(x), which is outside the range.

				    // We can't push this fragment into the buffer straight away, the cache may

				    // have fragments with smaller position. So we save it here and flush it when

				    // a fragment with a larger position is seen.

				    std::optional<mutation_fragment_v2> _queued_underlying_fragment;

				    // Holds the currently active range tombstone of the output mutation fragment stream.

				    // While producing the stream, at any given time, _current_tombstone applies to the

				    // key range which extends at least to _lower_bound. When consuming a subsequent interval,

				    // which will advance _lower_bound further, be it from underlying or from cache,

				    // a decision is made whether the range tombstone in the next interval is the same as

				    // the current one or not. If it is different, then range_tombstone_change is emitted

				    // with the old _lower_bound value (start of the next interval).

				    tombstone _current_tombstone;

				    state _state = state::before_static_row;

				    bool _next_row_in_range = false;

				    bool _has_rt = false;

				    // True iff current population interval, since the previous clustering row, starts before all clustered rows.

				    // We cannot just look at _lower_bound, because emission of range tombstones changes _lower_bound and

				@@ -145,11 +105,6 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {

				    // Valid when _state == reading_from_underlying.

				    bool _population_range_starts_before_all_rows;

				    // Whether _lower_bound was changed within current fill_buffer().

				    // If it did not then we cannot break out of it (e.g. on preemption) because

				    // forward progress is not guaranteed in case iterators are getting constantly invalidated.

				    bool _lower_bound_changed = false;

				    // Points to the underlying reader conforming to _schema,

				    // either to *_underlying_holder or _read_context.underlying().underlying().

				    flat_mutation_reader_v2* _underlying = nullptr;

				@@ -163,14 +118,11 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {

				    void move_to_next_range();

				    void move_to_range(query::clustering_row_ranges::const_iterator);

				    void move_to_next_entry();

				    void maybe_drop_last_entry() noexcept;

				    void flush_tombstones(position_in_partition_view, bool end_of_range = false);

				    void maybe_drop_last_entry(tombstone) noexcept;

				    void add_to_buffer(const partition_snapshot_row_cursor&);

				    void add_clustering_row_to_buffer(mutation_fragment_v2&&);

				    void add_to_buffer(range_tombstone_change&&, source);

				    void do_add_to_buffer(range_tombstone_change&&);

				    void add_range_tombstone_to_buffer(range_tombstone&&);

				    void add_to_buffer(mutation_fragment_v2&&);

				    void add_to_buffer(range_tombstone_change&&);

				    void offer_from_underlying(mutation_fragment_v2&&);

				    future<> read_from_underlying();

				    void start_reading_from_underlying();

				    bool after_current_range(position_in_partition_view position);

				@@ -189,7 +141,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader_v2::impl {

				    bool ensure_population_lower_bound();

				    void maybe_add_to_cache(const mutation_fragment_v2& mf);

				    void maybe_add_to_cache(const clustering_row& cr);

				    void maybe_add_to_cache(const range_tombstone_change& rtc);

				    bool maybe_add_to_cache(const range_tombstone_change& rtc);

				    void maybe_add_to_cache(const static_row& sr);

				    void maybe_set_static_row_continuous();

				    void finish_reader() {

				@@ -244,8 +196,6 @@ public:

				        , _read_context_holder()

				        , _read_context(ctx)    // ctx is owned by the caller, who's responsible for closing it.

				        , _next_row(*_schema, *_snp, false, _read_context.is_reversed())

				        , _rt_gen(*_schema)

				        , _rt_merger(*_schema)

				    {

				        clogger.trace("csm {}: table={}.{}, reversed={}, snap={}", fmt::ptr(this), _schema->ks_name(), _schema->cf_name(), _read_context.is_reversed(),

				                      fmt::ptr(&*_snp));

				@@ -373,13 +323,31 @@ future<> cache_flat_mutation_reader::do_fill_buffer() {

				        }

				        _state = state::reading_from_underlying;

				        _population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema) && !_read_context.is_reversed();

				        _underlying_upper_bound = _next_row_in_range ? position_in_partition::before_key(_next_row.position())

				                                                     : position_in_partition(_upper_bound);

				        if (!_read_context.partition_exists()) {

				            clogger.trace("csm {}: partition does not exist", fmt::ptr(this));

				            if (_current_tombstone) {

				                clogger.trace("csm {}: move_to_underlying: emit rtc({}, null)", fmt::ptr(this), _lower_bound);

				                push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, range_tombstone_change(_lower_bound, {})));

				                _current_tombstone = {};

				            }

				            return read_from_underlying();

				        }

				        _underlying_upper_bound = _next_row_in_range ? position_in_partition(_next_row.position())

				                                      : position_in_partition(_upper_bound);

				        return _underlying->fast_forward_to(position_range{_lower_bound, *_underlying_upper_bound}).then([this] {

				            return read_from_underlying();

				            if (!_current_tombstone) {

				                return read_from_underlying();

				            }

				            return _underlying->peek().then([this] (mutation_fragment_v2* mf) {

				                position_in_partition::equal_compare eq(*_schema);

				                if (!mf || !mf->is_range_tombstone_change()

				                        || !eq(mf->as_range_tombstone_change().position(), _lower_bound)) {

				                    clogger.trace("csm {}: move_to_underlying: emit rtc({}, null)", fmt::ptr(this), _lower_bound);

				                    push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, range_tombstone_change(_lower_bound, {})));

				                    _current_tombstone = {};

				                }

				                return read_from_underlying();

				            });

				        });

				    }

				    if (_state == state::reading_from_underlying) {

				@@ -388,8 +356,8 @@ future<> cache_flat_mutation_reader::do_fill_buffer() {

				    // assert(_state == state::reading_from_cache)

				    return _lsa_manager.run_in_read_section([this] {

				        auto next_valid = _next_row.iterators_valid();

				        clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}", fmt::ptr(this), _lower_bound,

				            _upper_bound, _next_row.position(), next_valid);

				        clogger.trace("csm {}: reading_from_cache, range=[{}, {}), next={}, valid={}, rt={}", fmt::ptr(this), _lower_bound,

				            _upper_bound, _next_row.position(), next_valid, _current_tombstone);

				        // We assume that if there was eviction, and thus the range may

				        // no longer be continuous, the cursor was invalidated.

				        if (!next_valid) {

				@@ -403,13 +371,9 @@ future<> cache_flat_mutation_reader::do_fill_buffer() {

				        }

				        _next_row.maybe_refresh();

				        clogger.trace("csm {}: next={}", fmt::ptr(this), _next_row);

				        _lower_bound_changed = false;

				        while (_state == state::reading_from_cache) {

				            copy_from_cache_to_buffer();

				            // We need to check _lower_bound_changed even if is_buffer_full() because

				            // we may have emitted only a range tombstone which overlapped with _lower_bound

				            // and thus didn't cause _lower_bound to change.

				            if ((need_preempt() || is_buffer_full()) && _lower_bound_changed) {

				            if (need_preempt() || is_buffer_full()) {

				                break;

				            }

				        }

				@@ -423,37 +387,38 @@ future<> cache_flat_mutation_reader::read_from_underlying() {

				        [this] { return _state != state::reading_from_underlying || is_buffer_full(); },

				        [this] (mutation_fragment_v2 mf) {

				            _read_context.cache().on_row_miss();

				            maybe_add_to_cache(mf);

				            add_to_buffer(std::move(mf));

				            offer_from_underlying(std::move(mf));

				        },

				        [this] {

				            _lower_bound = std::move(*_underlying_upper_bound);

				            _underlying_upper_bound.reset();

				            _state = state::reading_from_cache;

				            _lsa_manager.run_in_update_section([this] {

				                auto same_pos = _next_row.maybe_refresh();

				                clogger.trace("csm {}: underlying done, in_range={}, same={}, next={}", fmt::ptr(this), _next_row_in_range, same_pos, _next_row);

				                if (!same_pos) {

				                    _read_context.cache().on_mispopulate(); // FIXME: Insert dummy entry at _upper_bound.

				                    _read_context.cache().on_mispopulate(); // FIXME: Insert dummy entry at _lower_bound.

				                    _next_row_in_range = !after_current_range(_next_row.position());

				                    if (!_next_row.continuous()) {

				                        _last_row = nullptr; // We did not populate the full range up to _lower_bound, break continuity

				                        start_reading_from_underlying();

				                    }

				                    return;

				                }

				                if (_next_row_in_range) {

				                    maybe_update_continuity();

				                    if (!_next_row.dummy()) {

				                        _lower_bound = position_in_partition::before_key(_next_row.key());

				                    } else {

				                        _lower_bound = _next_row.position();

				                    }

				                } else {

				                    if (no_clustering_row_between(*_schema, _upper_bound, _next_row.position())) {

				                        this->maybe_update_continuity();

				                    } else if (can_populate()) {

				                    if (can_populate()) {

				                        const schema& table_s = table_schema();

				                        rows_entry::tri_compare cmp(table_s);

				                        auto& rows = _snp->version()->partition().mutable_clustered_rows();

				                        if (query::is_single_row(*_schema, *_ck_ranges_curr)) {

				                            // If there are range tombstones which apply to the row then

				                            // we cannot insert an empty entry here because if those range

				                            // tombstones got evicted by now, we will insert an entry

				                            // with missing range tombstone information.

				                            // FIXME: try to set the range tombstone when possible.

				                            if (!_has_rt) {

				                            with_allocator(_snp->region().allocator(), [&] {

				                                auto e = alloc_strategy_unique_ptr<rows_entry>(

				                                    current_allocator().construct<rows_entry>(_ck_ranges_curr->start()->value()));

				@@ -466,9 +431,10 @@ future<> cache_flat_mutation_reader::read_from_underlying() {

				                                    // Also works in reverse read mode.

				                                    // It preserves the continuity of the range the entry falls into.

				                                    it->set_continuous(next->continuous());

				                                    clogger.trace("csm {}: inserted empty row at {}, cont={}", fmt::ptr(this), it->position(), it->continuous());

				                                    clogger.trace("csm {}: inserted empty row at {}, cont={}, rt={}", fmt::ptr(this), it->position(), it->continuous(), it->range_tombstone());

				                                }

				                            });

				                            }

				                        } else if (ensure_population_lower_bound()) {

				                            with_allocator(_snp->region().allocator(), [&] {

				                                auto e = alloc_strategy_unique_ptr<rows_entry>(

				@@ -476,17 +442,19 @@ future<> cache_flat_mutation_reader::read_from_underlying() {

				                                // Use _next_row iterator only as a hint, because there could be insertions after _upper_bound.

				                                auto insert_result = rows.insert_before_hint(_next_row.get_iterator_in_latest_version(), std::move(e), cmp);

				                                if (insert_result.second) {

				                                    clogger.trace("csm {}: inserted dummy at {}", fmt::ptr(this), _upper_bound);

				                                    clogger.trace("csm {}: L{}: inserted dummy at {}", fmt::ptr(this), __LINE__, _upper_bound);

				                                    _snp->tracker()->insert(*insert_result.first);

				                                }

				                                if (_read_context.is_reversed()) [[unlikely]] {

				                                    clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), _last_row.position());

				                                    clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), _last_row.position(), insert_result.first->position(), _current_tombstone);

				                                    _last_row->set_continuous(true);

				                                    _last_row->set_range_tombstone(_current_tombstone);

				                                } else {

				                                    clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), insert_result.first->position());

				                                    clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(), _last_row.position(), _current_tombstone);

				                                    insert_result.first->set_continuous(true);

				                                    insert_result.first->set_range_tombstone(_current_tombstone);

				                                }

				                                maybe_drop_last_entry();

				                                maybe_drop_last_entry(_current_tombstone);

				                            });

				                        }

				                    } else {

				@@ -515,55 +483,103 @@ bool cache_flat_mutation_reader::ensure_population_lower_bound() {

				    // Continuity flag we will later set for the upper bound extends to the previous row in the same version,

				    // so we need to ensure we have an entry in the latest version.

				    if (!_last_row.is_in_latest_version()) {

				        with_allocator(_snp->region().allocator(), [&] {

				            auto& rows = _snp->version()->partition().mutable_clustered_rows();

				            rows_entry::tri_compare cmp(table_schema());

				            // FIXME: Avoid the copy by inserting an incomplete clustering row

				            auto e = alloc_strategy_unique_ptr<rows_entry>(

				                current_allocator().construct<rows_entry>(table_schema(), *_last_row));

				            e->set_continuous(false);

				            auto insert_result = rows.insert_before_hint(rows.end(), std::move(e), cmp);

				            if (insert_result.second) {

				                auto it = insert_result.first;

				                clogger.trace("csm {}: inserted lower bound dummy at {}", fmt::ptr(this), it->position());

				                _snp->tracker()->insert(*it);

				            }

				            _last_row.set_latest(insert_result.first);

				        rows_entry::tri_compare cmp(*_schema);

				        partition_snapshot_row_cursor cur(*_schema, *_snp, false, _read_context.is_reversed());

				        if (!cur.advance_to(_last_row.position())) {

				            return false;

				        }

				        if (cmp(cur.position(), _last_row.position()) != 0) {

				            return false;

				        }

				        auto res = with_allocator(_snp->region().allocator(), [&] {

				            return cur.ensure_entry_in_latest();

				        });

				        _last_row.set_latest(res.it);

				        if (res.inserted) {

				            clogger.trace("csm {}: inserted lower bound dummy at {}", fmt::ptr(this), _last_row.position());

				        }

				    }

				    return true;

				}

				inline

				void cache_flat_mutation_reader::maybe_update_continuity() {

				    if (can_populate() && ensure_population_lower_bound()) {

				    position_in_partition::equal_compare eq(*_schema);

				    if (can_populate()

				            && ensure_population_lower_bound()

				            && !eq(_last_row.position(), _next_row.position())) {

				        with_allocator(_snp->region().allocator(), [&] {

				            rows_entry& e = _next_row.ensure_entry_in_latest().row;

				            auto& rows = _snp->version()->partition().mutable_clustered_rows();

				            const schema& table_s = table_schema();

				            rows_entry::tri_compare table_cmp(table_s);

				            if (_read_context.is_reversed()) [[unlikely]] {

				                clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), _last_row.position());

				                _last_row->set_continuous(true);

				                if (_current_tombstone != _last_row->range_tombstone() && !_last_row->dummy()) {

				                    with_allocator(_snp->region().allocator(), [&] {

				                        auto e2 = alloc_strategy_unique_ptr<rows_entry>(

				                                current_allocator().construct<rows_entry>(table_s,

				                                                                          position_in_partition_view::before_key(_last_row->position()),

				                                                                          is_dummy::yes,

				                                                                          is_continuous::yes));

				                        auto insert_result = rows.insert(std::move(e2), table_cmp);

				                        if (insert_result.second) {

				                            clogger.trace("csm {}: L{}: inserted dummy at {}", fmt::ptr(this), __LINE__, insert_result.first->position());

				                            _snp->tracker()->insert(*insert_result.first);

				                        }

				                        clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(),

				                                      _last_row.position(), _current_tombstone);

				                        insert_result.first->set_continuous(true);

				                        insert_result.first->set_range_tombstone(_current_tombstone);

				                        clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), _last_row.position());

				                        _last_row->set_continuous(true);

				                    });

				                } else {

				                    clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), _last_row.position(), _current_tombstone);

				                    _last_row->set_continuous(true);

				                    _last_row->set_range_tombstone(_current_tombstone);

				                }

				            } else {

				                clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());

				                e.set_continuous(true);

				                if (_current_tombstone != e.range_tombstone() && !e.dummy()) {

				                    with_allocator(_snp->region().allocator(), [&] {

				                        auto e2 = alloc_strategy_unique_ptr<rows_entry>(

				                                current_allocator().construct<rows_entry>(table_s,

				                                                                          position_in_partition_view::before_key(e.position()),

				                                                                          is_dummy::yes,

				                                                                          is_continuous::yes));

				                        // Use _next_row iterator only as a hint because there could be insertions before

				                        // _next_row.get_iterator_in_latest_version(), either from concurrent reads,

				                        // from _next_row.ensure_entry_in_latest().

				                        auto insert_result = rows.insert_before_hint(_next_row.get_iterator_in_latest_version(), std::move(e2), table_cmp);

				                        if (insert_result.second) {

				                            clogger.trace("csm {}: L{}: inserted dummy at {}", fmt::ptr(this), __LINE__, insert_result.first->position());

				                            _snp->tracker()->insert(*insert_result.first);

				                        }

				                        clogger.trace("csm {}: set_continuous({}), prev={}, rt={}", fmt::ptr(this), insert_result.first->position(),

				                                      _last_row.position(), _current_tombstone);

				                        insert_result.first->set_continuous(true);

				                        insert_result.first->set_range_tombstone(_current_tombstone);

				                        clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());

				                        e.set_continuous(true);

				                    });

				                } else {

				                    clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), e.position(), _current_tombstone);

				                    e.set_range_tombstone(_current_tombstone);

				                    e.set_continuous(true);

				                }

				            }

				            maybe_drop_last_entry();

				            maybe_drop_last_entry(_current_tombstone);

				        });

				    } else {

				        _read_context.cache().on_mispopulate();

				    }

				}

				inline

				void cache_flat_mutation_reader::maybe_add_to_cache(const mutation_fragment_v2& mf) {

				    if (mf.is_range_tombstone_change()) {

				        maybe_add_to_cache(mf.as_range_tombstone_change());

				    } else {

				        assert(mf.is_clustering_row());

				        const clustering_row& cr = mf.as_clustering_row();

				        maybe_add_to_cache(cr);

				    }

				}

				inline

				void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {

				    if (!can_populate()) {

				@@ -572,16 +588,9 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {

				        _read_context.cache().on_mispopulate();

				        return;

				    }

				    auto rt_opt = _rt_assembler.flush(*_schema, position_in_partition::after_key(cr.key()));

				    clogger.trace("csm {}: populate({})", fmt::ptr(this), clustering_row::printer(*_schema, cr));

				    _lsa_manager.run_in_update_section_with_allocator([this, &cr, &rt_opt] {

				        mutation_partition& mp = _snp->version()->partition();

				        if (rt_opt) {

				            clogger.trace("csm {}: populate flushed rt({})", fmt::ptr(this), *rt_opt);

				            mp.mutable_row_tombstones().apply_monotonically(table_schema(), to_table_domain(range_tombstone(*rt_opt)));

				        }

				    clogger.trace("csm {}: populate({}), rt={}", fmt::ptr(this), clustering_row::printer(*_schema, cr), _current_tombstone);

				    _lsa_manager.run_in_update_section_with_allocator([this, &cr] {

				        mutation_partition_v2& mp = _snp->version()->partition();

				        rows_entry::tri_compare cmp(table_schema());

				        if (_read_context.digest_requested()) {

				@@ -590,6 +599,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {

				        auto new_entry = alloc_strategy_unique_ptr<rows_entry>(

				            current_allocator().construct<rows_entry>(table_schema(), cr.key(), cr.as_deletable_row()));

				        new_entry->set_continuous(false);

				        new_entry->set_range_tombstone(_current_tombstone);

				        auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()

				                                              : mp.clustered_rows().lower_bound(cr.key(), cmp);

				        auto insert_result = mp.mutable_clustered_rows().insert_before_hint(it, std::move(new_entry), cmp);

				@@ -603,9 +613,14 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {

				            if (_read_context.is_reversed()) [[unlikely]] {

				                clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), _last_row.position());

				                _last_row->set_continuous(true);

				                // _current_tombstone must also apply to _last_row itself (if it's non-dummy)

				                // because otherwise there would be a rtc after it, either creating a different entry,

				                // or clearing _last_row if population did not happen.

				                _last_row->set_range_tombstone(_current_tombstone);

				            } else {

				                clogger.trace("csm {}: set_continuous({})", fmt::ptr(this), e.position());

				                e.set_continuous(true);

				                e.set_range_tombstone(_current_tombstone);

				            }

				        } else {

				            _read_context.cache().on_mispopulate();

				@@ -617,6 +632,72 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {

				    });

				}

				inline

				bool cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone_change& rtc) {

				    rows_entry::tri_compare q_cmp(*_schema);

				    clogger.trace("csm {}: maybe_add_to_cache({})", fmt::ptr(this), rtc);

				    // Don't emit the closing range tombstone change, we may continue from cache with the same tombstone.

				    // The following relies on !_underlying_upper_bound->is_clustering_row()

				    if (q_cmp(rtc.position(), *_underlying_upper_bound) == 0) {

				        _lower_bound = rtc.position();

				        return false;

				    }

				    auto prev = std::exchange(_current_tombstone, rtc.tombstone());

				    if (_current_tombstone == prev) {

				        return false;

				    }

				    if (!can_populate()) {

				        // _current_tombstone is now invalid and remains so for this reader. No need to change it.

				        _last_row = nullptr;

				        _population_range_starts_before_all_rows = false;

				        _read_context.cache().on_mispopulate();

				        return true;

				    }

				    _lsa_manager.run_in_update_section_with_allocator([&] {

				        mutation_partition_v2& mp = _snp->version()->partition();

				        rows_entry::tri_compare cmp(table_schema());

				        auto new_entry = alloc_strategy_unique_ptr<rows_entry>(

				                current_allocator().construct<rows_entry>(table_schema(), to_table_domain(rtc.position()), is_dummy::yes, is_continuous::no));

				        auto it = _next_row.iterators_valid() ? _next_row.get_iterator_in_latest_version()

				                                              : mp.clustered_rows().lower_bound(to_table_domain(rtc.position()), cmp);

				        auto insert_result = mp.mutable_clustered_rows().insert_before_hint(it, std::move(new_entry), cmp);

				        it = insert_result.first;

				        if (insert_result.second) {

				            _snp->tracker()->insert(*it);

				        }

				        rows_entry& e = *it;

				        if (ensure_population_lower_bound()) {

				            // underlying may emit range_tombstone_change fragments with the same position.

				            // In such case, the range to which the tombstone from the first fragment applies is empty and should be ignored.

				            if (q_cmp(_last_row.position(), it->position()) < 0) {

				                if (_read_context.is_reversed()) [[unlikely]] {

				                    clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), _last_row.position(), prev);

				                    _last_row->set_continuous(true);

				                    _last_row->set_range_tombstone(prev);

				                } else {

				                    clogger.trace("csm {}: set_continuous({}), rt={}", fmt::ptr(this), e.position(), prev);

				                    e.set_continuous(true);

				                    e.set_range_tombstone(prev);

				                }

				            }

				        } else {

				            _read_context.cache().on_mispopulate();

				        }

				        with_allocator(standard_allocator(), [&] {

				            _last_row = partition_snapshot_row_weakref(*_snp, it, true);

				        });

				        _population_range_starts_before_all_rows = false;

				    });

				    return true;

				}

				inline

				bool cache_flat_mutation_reader::after_current_range(position_in_partition_view p) {

				    position_in_partition::tri_compare cmp(*_schema);

				@@ -632,19 +713,35 @@ void cache_flat_mutation_reader::start_reading_from_underlying() {

				inline

				void cache_flat_mutation_reader::copy_from_cache_to_buffer() {

				    clogger.trace("csm {}: copy_from_cache, next={}, next_row_in_range={}", fmt::ptr(this), _next_row.position(), _next_row_in_range);

				    clogger.trace("csm {}: copy_from_cache, next_row_in_range={}, next={}", fmt::ptr(this), _next_row_in_range, _next_row);

				    _next_row.touch();

				    position_in_partition_view next_lower_bound = _next_row.dummy() ? _next_row.position() : position_in_partition_view::after_key(_next_row.key());

				    auto upper_bound = _next_row_in_range ? next_lower_bound : _upper_bound;

				    if (_snp->range_tombstones(_lower_bound, upper_bound, [&] (range_tombstone rts) {

				        add_range_tombstone_to_buffer(std::move(rts));

				        return stop_iteration(_lower_bound_changed && is_buffer_full());

				    }, _read_context.is_reversed()) == stop_iteration::no) {

				        return;

				    if (_next_row.range_tombstone() != _current_tombstone) {

				        position_in_partition::equal_compare eq(*_schema);

				        auto upper_bound = _next_row_in_range ? position_in_partition_view::before_key(_next_row.position()) : _upper_bound;

				        if (!eq(_lower_bound, upper_bound)) {

				            position_in_partition new_lower_bound(upper_bound);

				            auto tomb = _next_row.range_tombstone();

				            clogger.trace("csm {}: rtc({}, {}) ...{}", fmt::ptr(this), _lower_bound, tomb, new_lower_bound);

				            push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, range_tombstone_change(_lower_bound, tomb)));

				            _current_tombstone = tomb;

				            _lower_bound = std::move(new_lower_bound);

				            _read_context.cache()._tracker.on_range_tombstone_read();

				        }

				    }

				    // We add the row to the buffer even when it's full.

				    // This simplifies the code. For more info see #3139.

				    if (_next_row_in_range) {

				        if (_next_row.range_tombstone_for_row() != _current_tombstone) [[unlikely]] {

				            auto tomb = _next_row.range_tombstone_for_row();

				            auto new_lower_bound = position_in_partition::before_key(_next_row.position());

				            clogger.trace("csm {}: rtc({}, {})", fmt::ptr(this), new_lower_bound, tomb);

				            push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, range_tombstone_change(new_lower_bound, tomb)));

				            _lower_bound = std::move(new_lower_bound);

				            _current_tombstone = tomb;

				            _read_context.cache()._tracker.on_range_tombstone_read();

				        }

				        add_to_buffer(_next_row);

				        move_to_next_entry();

				    } else {

				@@ -660,10 +757,11 @@ void cache_flat_mutation_reader::move_to_end() {

				inline

				void cache_flat_mutation_reader::move_to_next_range() {

				    if (_queued_underlying_fragment) {

				        add_to_buffer(*std::exchange(_queued_underlying_fragment, {}));

				    if (_current_tombstone) {

				        clogger.trace("csm {}: move_to_next_range: emit rtc({}, null)", fmt::ptr(this), _upper_bound);

				        push_mutation_fragment(mutation_fragment_v2(*_schema, _permit, range_tombstone_change(_upper_bound, {})));

				        _current_tombstone = {};

				    }

				    flush_tombstones(position_in_partition::for_range_end(*_ck_ranges_curr), true);

				    auto next_it = std::next(_ck_ranges_curr);

				    if (next_it == _ck_ranges_end) {

				        move_to_end();

				@@ -680,8 +778,6 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con

				    _last_row = nullptr;

				    _lower_bound = std::move(lb);

				    _upper_bound = std::move(ub);

				    _rt_gen.trim(_lower_bound);

				    _lower_bound_changed = true;

				    _ck_ranges_curr = next_it;

				    auto adjacent = _next_row.advance_to(_lower_bound);

				    _next_row_in_range = !after_current_range(_next_row.position());

				@@ -722,7 +818,7 @@ void cache_flat_mutation_reader::move_to_range(query::clustering_row_ranges::con

				// _next_row must have a greater position than _last_row.

				// Invalidates references but keeps the _next_row valid.

				inline

				void cache_flat_mutation_reader::maybe_drop_last_entry() noexcept {

				void cache_flat_mutation_reader::maybe_drop_last_entry(tombstone rt) noexcept {

				    // Drop dummy entry if it falls inside a continuous range.

				    // This prevents unnecessary dummy entries from accumulating in cache and slowing down scans.

				    //

				@@ -733,11 +829,16 @@ void cache_flat_mutation_reader::maybe_drop_last_entry() noexcept {

				            && !_read_context.is_reversed() // FIXME

				            && _last_row->dummy()

				            && _last_row->continuous()

				            && _last_row->range_tombstone() == rt

				            && _snp->at_latest_version()

				            && _snp->at_oldest_version()) {

				        clogger.trace("csm {}: dropping unnecessary dummy at {}", fmt::ptr(this), _last_row->position());

				        with_allocator(_snp->region().allocator(), [&] {

				            _last_row->on_evicted(_read_context.cache()._tracker);

				            cache_tracker& tracker = _read_context.cache()._tracker;

				            tracker.get_lru().remove(*_last_row);

				            _last_row->on_evicted(tracker);

				        });

				        _last_row = nullptr;

				@@ -767,57 +868,38 @@ void cache_flat_mutation_reader::move_to_next_entry() {

				        if (!_next_row.continuous()) {

				            start_reading_from_underlying();

				        } else {

				            maybe_drop_last_entry();

				            maybe_drop_last_entry(_next_row.range_tombstone());

				        }

				    }

				}

				void cache_flat_mutation_reader::flush_tombstones(position_in_partition_view pos, bool end_of_range) {

				    // Ensure position is appropriate for range tombstone bound

				    pos = position_in_partition_view::after_key(pos);

				    clogger.trace("csm {}: flush_tombstones({}) end_of_range: {}", fmt::ptr(this), pos, end_of_range);

				    _rt_gen.flush(pos, [this] (range_tombstone_change&& rtc) {

				        add_to_buffer(std::move(rtc), source::cache);

				    }, end_of_range);

				    if (auto rtc_opt = _rt_merger.flush(pos, end_of_range)) {

				        do_add_to_buffer(std::move(*rtc_opt));

				    }

				}

				inline

				void cache_flat_mutation_reader::add_to_buffer(mutation_fragment_v2&& mf) {

				    clogger.trace("csm {}: add_to_buffer({})", fmt::ptr(this), mutation_fragment_v2::printer(*_schema, mf));

				    position_in_partition::less_compare less(*_schema);

				    if (_underlying_upper_bound && less(*_underlying_upper_bound, mf.position())) {

				        _queued_underlying_fragment = std::move(mf);

				        return;

				    }

				    flush_tombstones(mf.position());

				void cache_flat_mutation_reader::offer_from_underlying(mutation_fragment_v2&& mf) {

				    clogger.trace("csm {}: offer_from_underlying({})", fmt::ptr(this), mutation_fragment_v2::printer(*_schema, mf));

				    if (mf.is_clustering_row()) {

				        maybe_add_to_cache(mf.as_clustering_row());

				        add_clustering_row_to_buffer(std::move(mf));

				    } else {

				        assert(mf.is_range_tombstone_change());

				        add_to_buffer(std::move(mf).as_range_tombstone_change(), source::underlying);

				        auto& chg = mf.as_range_tombstone_change();

				        if (maybe_add_to_cache(chg)) {

				            add_to_buffer(std::move(mf).as_range_tombstone_change());

				        }

				    }

				}

				inline

				void cache_flat_mutation_reader::add_to_buffer(const partition_snapshot_row_cursor& row) {

				    position_in_partition::less_compare less(*_schema);

				    if (_queued_underlying_fragment && less(_queued_underlying_fragment->position(), row.position())) {

				        add_to_buffer(*std::exchange(_queued_underlying_fragment, {}));

				    }

				    if (!row.dummy()) {

				        _read_context.cache().on_row_hit();

				        if (_read_context.digest_requested()) {

				            row.latest_row().cells().prepare_hash(table_schema(), column_kind::regular_column);

				        }

				        flush_tombstones(position_in_partition_view::for_key(row.key()));

				        add_clustering_row_to_buffer(mutation_fragment_v2(*_schema, _permit, row.row()));

				    } else {

				        if (less(_lower_bound, row.position())) {

				            _lower_bound = row.position();

				            _lower_bound_changed = true;

				        }

				        _read_context.cache()._tracker.on_dummy_row_hit();

				    }

				@@ -830,67 +912,24 @@ inline

				void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment_v2&& mf) {

				    clogger.trace("csm {}: add_clustering_row_to_buffer({})", fmt::ptr(this), mutation_fragment_v2::printer(*_schema, mf));

				    auto& row = mf.as_clustering_row();

				    auto new_lower_bound = position_in_partition::after_key(row.key());

				    auto new_lower_bound = position_in_partition::after_key(*_schema, row.key());

				    push_mutation_fragment(std::move(mf));

				    _lower_bound = std::move(new_lower_bound);

				    _lower_bound_changed = true;

				    if (row.tomb()) {

				        _read_context.cache()._tracker.on_row_tombstone_read();

				    }

				}

				inline

				void cache_flat_mutation_reader::add_to_buffer(range_tombstone_change&& rtc, source src) {

				void cache_flat_mutation_reader::add_to_buffer(range_tombstone_change&& rtc) {

				    clogger.trace("csm {}: add_to_buffer({})", fmt::ptr(this), rtc);
				    if (auto rtc_opt = _rt_merger.apply(src, std::move(rtc))) {
				        do_add_to_buffer(std::move(*rtc_opt));
				    }
				}

				inline
				void cache_flat_mutation_reader::do_add_to_buffer(range_tombstone_change&& rtc) {
				    clogger.trace("csm {}: push({})", fmt::ptr(this), rtc);
				    _has_rt = true;
				    position_in_partition::less_compare less(*_schema);
				    auto lower_bound_changed = less(_lower_bound, rtc.position());
				    _lower_bound = position_in_partition(rtc.position());
				    _lower_bound_changed = lower_bound_changed;
				    push_mutation_fragment(*_schema, _permit, std::move(rtc));
				    _read_context.cache()._tracker.on_range_tombstone_read();
				}

				inline
				void cache_flat_mutation_reader::add_range_tombstone_to_buffer(range_tombstone&& rt) {
				    position_in_partition::less_compare less(*_schema);
				    if (_queued_underlying_fragment && less(_queued_underlying_fragment->position(), rt.position())) {
				        add_to_buffer(*std::exchange(_queued_underlying_fragment, {}));
				    }
				    clogger.trace("csm {}: add_to_buffer({})", fmt::ptr(this), rt);
				    if (!less(_lower_bound, rt.position())) {
				        rt.set_start(_lower_bound);
				    }
				    flush_tombstones(rt.position());
				    _rt_gen.consume(std::move(rt));
				}

				inline
				void cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone_change& rtc) {
				    clogger.trace("csm {}: maybe_add_to_cache({})", fmt::ptr(this), rtc);
				    auto rt_opt = _rt_assembler.consume(*_schema, range_tombstone_change(rtc));
				    if (!rt_opt) {
				        return;
				    }
				    const auto& rt = *rt_opt;
				    if (can_populate()) {
				        clogger.trace("csm {}: maybe_add_to_cache({})", fmt::ptr(this), rt);
				        _lsa_manager.run_in_update_section_with_allocator([&] {
				            _snp->version()->partition().mutable_row_tombstones().apply_monotonically(
				                    table_schema(), to_table_domain(rt));
				        });
				    } else {
				        _read_context.cache().on_mispopulate();
				    }
				}

				inline
				void cache_flat_mutation_reader::maybe_add_to_cache(const static_row& sr) {
				    if (can_populate()) {

Compare commits

3962 Commits scylla-5.1 ... br-next

24 .github/CODEOWNERS vendored
17 .github/workflows/docs-amplify-enhanced.yaml vendored Normal file
13 .github/workflows/docs-pages.yaml vendored
8 .github/workflows/docs-pr.yaml vendored
1 .gitignore vendored
9 .gitmodules vendored
877 CMakeLists.txt
2 CONTRIBUTING.md
2 HACKING.md
12 README.md
39 SCYLLA-VERSION-GEN
1 abseil
30 alternator/CMakeLists.txt Normal file
100 alternator/auth.cc
6 alternator/auth.hh
41 alternator/conditions.cc
2 alternator/conditions.hh
8 alternator/controller.cc
12 alternator/controller.hh
4 alternator/error.hh
138 alternator/executor.cc
11 alternator/executor.hh
3 alternator/expressions.cc
2 alternator/expressions_types.hh
32 alternator/serialization.cc
9 alternator/serialization.hh
31 alternator/server.cc
15 alternator/server.hh
77 alternator/streams.cc
115 alternator/ttl.cc
7 alternator/ttl.hh
15 amplify.yml Normal file
70 api/CMakeLists.txt Normal file
4 api/api-doc/storage_service.json
39 api/api-doc/system.json
329 api/api-doc/task_manager.json Normal file
153 api/api-doc/task_manager_test.json Normal file
51 api/api.cc
22 api/api.hh
19 api/api_init.hh
3 api/authorization_cache.cc
4 api/authorization_cache.hh
85 api/cache_service.cc
2 api/cache_service.hh
2 api/collectd.cc
2 api/collectd.hh
368 api/column_family.cc
11 api/column_family.hh
1 api/commitlog.cc
2 api/commitlog.hh
46 api/compaction_manager.cc
2 api/compaction_manager.hh
1 api/config.cc
2 api/config.hh
41 api/endpoint_snitch.cc
7 api/endpoint_snitch.hh
1 api/error_injection.cc
2 api/error_injection.hh
27 api/failure_detector.cc
2 api/failure_detector.hh
25 api/gossiper.cc
2 api/gossiper.hh
17 api/hinted_handoff.cc
4 api/hinted_handoff.hh
1 api/lsa.cc
2 api/lsa.hh
3 api/messaging_service.cc
4 api/messaging_service.hh
223 api/storage_proxy.cc
3 api/storage_proxy.hh
500 api/storage_service.cc
56 api/storage_service.hh
7 api/stream_manager.cc
4 api/stream_manager.hh
11 api/system.cc
2 api/system.hh
236 api/task_manager.cc Normal file
18 api/task_manager.hh Normal file

102 api/task_manager_test.cc Normal file