scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-04 22:13:19 +00:00

Author	SHA1	Message	Date
Botond Dénes	2d8d8043be	Merge 'Coroutinize system_keyspace::get_compaction_history' from Pavel Emelyanov Closes #13620 * github.com:scylladb/scylladb: system_keyspace: Fix indentation after previous patch system_keyspace: Coroutinize get_compaction_history()	2023-04-24 09:48:01 +03:00
Benny Halevy	2d20ee7d61	gms: version_generator: define version_type and generation_type strong types Derived from utils::tagged_integer, using different tags, the types are incompatible with each other and require explicit typecasting to- and from- their value type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:47:17 +03:00
Benny Halevy	d1817e9e1b	utils: move generation-number to gms Although get_generation_number implementation is completely generic, it is used exclusively to seed the gossip generation number. Following patches will define a strong gms::generation_id type and this function should return it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Tomasz Grabiec	bd0b299322	Merge 'Manage CDC generations when bootstrapping nodes using Raft Group 0 topology coordinator' from Kamil Braun Introduce a new table `CDC_GENERATIONS_V3` (`system.cdc_generations_v3`). The table schema is a copy-paste of the `CDC_GENERATIONS_V2` schema. The difference is that V2 lives in `system_distributed_keyspace` and writes to it are distributed using regular `storage_proxy` replication mechanisms based on the token ring. The V3 table lives in `system_keyspace` and any mutations written to it will go through group 0. Extend the `TOPOLOGY` schema with new columns: - `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping node's `ring_slice`, it stores UUID of a newly introduced CDC generation which is used as partition key for the `CDC_GENERATIONS_V3` table to access this new generation's data. It's a regular column, meaning that every row (corresponding to a node) will have its own. - `current_cdc_generation_uuid` and `current_cdc_generation_timestamp` together form the ID of the newest CDC generation in the cluster. (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is when the CDC generation starts operating). Those are static columns since there's a single newest CDC generation. When topology coordinator handles a request for node to join, calculate a new CDC generation using the bootstrapping node's tokens, translate it to mutation format, and insert this mutation to the CDC_GENERATIONS_V3 table through group 0 at the same time we assign tokens to the node in Raft topology. The partition key for this data is stored in the bootstrapping node's `ring_slice`. After inserting new CDC generation data , we need to pick a timestamp for this generation and commit it, telling all nodes in the cluster to start using the generation for CDC log writes once their clocks cross that timestamp. We introduce a separate step to the bootstrap saga, before `write_both_read_old`, called `commit_cdc_generation`. In this step, the coordinator takes the `new_cdc_generation_data_uuid` stored in a bootstrapping node's `ring_slice` - which serves as the key to the table where the CDC generation data is stored - and combines it with a timestamp which it generates a bit into the future (as in old gossiper-based code, we use 2 * ring_delay, by default 1 minute). This gives us a CDC generation ID which we commit into the topology state as the `current_cdc_generation_id` while switching the saga to the next step, `write_both_read_old`. Once a new CDC generation is committed to the cluster by the topology coordinator, we also need to publish it to the user-facing description tables so CDC applications know which streams to read from. This uses regular distributed table writes underneath (tables living in the `system_distributed` keyspace) so it requires `token_metadata` to be nonempty. We need a hack for the case of bootstrapping the first node in the cluster - turning the tokens into normal tokens earlier in the procedure in `token_metadata`, but this is fine for the single-node case since no streaming is happening. When a node notices that a new CDC generation was introduced in `storage_service::topology_state_load`, it updates its internal data structures that are used when coordinating writes to CDC log tables. We include the current CDC generation data in topology snapshot transfers. Some fixes and refactors included. Closes #13385 * github.com:scylladb/scylladb: docs: cdc: describe generation changes using group 0 topology coordinator cdc: generation_service: add a FIXME cdc: generation_service: add legacy_ prefix for gossiper-based functions storage_service: include current CDC generation data in topology snapshots db: system_keyspace: introduce `query_mutations` with range/slice storage_service: hold group 0 apply mutex when reading topology snapshot service: raft_group0_client: introduce `hold_read_apply_mutex` storage_service: use CDC generations introduced by Raft topology raft topology: publish new CDC generation to the user description tables raft topology: commit a new CDC generation on node bootstrap raft topology: create new CDC generation data during node bootstrap service: topology_state_machine: make topology::find const db: system_keyspace: small refactor of `load_topology_state` cdc: generation: extract pure parts of `make_new_generation` outside db: system_keyspace: add storage for CDC generations managed by group 0 service: topology_state_machine: better error checking for state name (de)serialization service: raft: plumbing `cdc::generation_service&` cdc: generation: `get_cdc_generation_mutations`: take timestamp as parameter cdc: generation: make `topology_description_generator::get_sharding_info` a parameter sys_dist_ks: make `get_cdc_generation_mutations` public sys_dist_ks: move find_schema outside `get_cdc_generation_mutations` sys_dist_ks: move mutation size threshold calculation outside `get_cdc_generation_mutations` service/raft: group0_state_machine: signal topology state machine in `load_snapshot`	2023-04-21 18:11:27 +02:00
Pavel Emelyanov	2aabaada9e	system_keyspace: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 17:32:57 +03:00
Pavel Emelyanov	6290849f11	system_keyspace: Coroutinize get_compaction_history() In order not to copy the rvalue consumer arg -- instantly convert it into value. No other tricks. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 17:32:02 +03:00
Kamil Braun	f9d8118c8d	db: system_keyspace: use microsecond resolution for group0_history range tombstone in `make_group0_history_state_id_mutation`, when adding a new entry to the group 0 history table, if the parameter `gc_older_than` is engaged, we create a range tombstone in the mutation which deletes entries older than the new one by `gc_older_than`. In particular if `gc_older_than = 0`, we want to delete all older entries. There was a subtle bug there: we were using millisecond resolution when generating the tombstone, while the provided state IDs used microsecond resolution. On a super fast machine it could happen that we managed to perform two schema changes in a single millisecond; this happened sometimes in `group0_test.test_group0_history_clearing_old_entries` on our new CI/promotion machines, causing the test to fail because the tombstone didn't clear the entry correspodning to the previous schema change when performing the next schema change (since they happened in the same millisecond). Use microsecond resolution to fix that. The consecutive state IDs used in group 0 mutations are guaranteed to be strictly monotonic at microsecond resolution (see `generate_group0_state_id` in service/raft/raft_group0_client.cc). Fixes #13594	2023-04-21 10:33:05 +02:00
Kamil Braun	3d96bc5dba	db: system_keyspace: introduce `query_mutations` with range/slice There is a `query_mutations` function which loads the entire contents of a given table into memory. There was no function for e.g. loading just a single partition in the form of mutations. Introduce one.	2023-04-20 16:36:41 +02:00
Kamil Braun	5f2b297f99	raft topology: publish new CDC generation to the user description tables Once a new CDC generation is committed to the cluster by the topology coordinator, we also need to publish it to the user-facing description tables so CDC applications know which streams to read from. This uses regular distributed table writes underneath (tables living in the `system_distributed` keyspace) so it requires `token_metadata` to be nonempty. We need a hack for the case of bootstrapping the first node in the cluster - turning the tokens into normal tokens earlier in the procedure in `token_metadata`, but this is fine for the single-node case since no streaming is happening.	2023-04-20 16:36:41 +02:00
Kamil Braun	58baf998c1	raft topology: commit a new CDC generation on node bootstrap After inserting new CDC generation data (see previous commit), we need to pick a timestamp for this generation and commit it, telling all nodes in the cluster to start using the generation for CDC log writes once their clocks cross that timestamp. We introduce a separate step to the bootstrap saga, before `write_both_read_old`, called `commit_cdc_generation`. In this step, the coordinator takes the `new_cdc_generation_data_uuid` stored in a bootstrapping node's `ring_slice` - which serves as the key to the table where the CDC generation data is stored - and combines it with a timestamp which it generates a bit into the future (as in old gossiper-based code, we use 2 * ring_delay, by default 1 minute). This gives us a CDC generation ID which we commit into the topology state as the `current_cdc_generation_id` while switching the saga to the next step, `write_both_read_old`. `system_keyspace::load_topology_state` is extended to load `current_cdc_generation_id`. For now, nodes don't react to `current_cdc_generation_id`. In later commit we'll extend `storage_service::topology_state_load` to start using the current CDC generation for CDC log table writes. The solution with specifying a timestamp into the future is the same as it is for gossip-based topology changes and it has the same consistency problem - if some node is temporarily partitioned away from the quorum, it might not learn about the new CDC generation before its clock crosses the generation's timestamp, causing it to temporarily send writes to the wrong CDC streams (until it learns about the new timestamp). I left a FIXME which describes an alternative solution which wasn't viable for gossiper-based topology changes, but it is viable when we have a fault-tolerant topology coordinator.	2023-04-20 16:36:41 +02:00
Kamil Braun	5942237a79	raft topology: create new CDC generation data during node bootstrap Calculate a new CDC generation using the bootstrapping node's tokens, translate it to mutation format, and insert this mutation to the CDC_GENERATIONS_V3 table through group 0 at the same time we assign tokens to the node in Raft topology. The partition key for this data is stored in the bootstrapping node's `ring_slice`. The data is inserted, but it's not used for anything yet, we'll do it in later commits. Two FIXMEs are left for follow-ups: - in `get_sharding_info` we shouldn't have to use the token owner's IP, but get the host ID directly from token metadata (#12279), - splitting the CDC generation data write into multiple commands. The comment elaborates.	2023-04-20 16:35:37 +02:00
Kamil Braun	22094f1509	db: system_keyspace: small refactor of `load_topology_state` The variables necessary for constructing a `ring_slice` are now living in a local block of code. This makes it easier to see which data is part of the `ring_slice` and will make it easier to add more data to `ring_slice` in following commits. Also add some more sanity checking.	2023-04-20 15:40:23 +02:00
Kamil Braun	2233d8f54d	db: system_keyspace: add storage for CDC generations managed by group 0 The `CDC_GENERATIONS_V3` table schema is a copy-paste of the `CDC_GENERATIONS_V2` schema. The difference is that V2 lives in `system_distributed_keyspace` and writes to it are distributed using regular `storage_proxy` replication mechanisms based on the token ring. The V3 table lives in `system_keyspace` and any mutations written to it will go through group 0. Also extend the `TOPOLOGY` schema with new columns: - `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping node's `ring_slice`, it stores UUID of a newly introduced CDC generation which is used as partition key for the `CDC_GENERATIONS_V3` table to access this new generation's data. It's a regular column, meaning that every row (corresponding to a node) will have its own. - `current_cdc_generation_uuid` and `current_cdc_generation_timestamp` together form the ID of the newest CDC generation in the cluster. (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is when the CDC generation starts operating). Those are static columns since there's a single newest CDC generation.	2023-04-20 15:38:58 +02:00
Pavel Emelyanov	9628d07adb	Put storage_service.hh on a diet By removing unneeded headers inclusions. At the cost of few more forward declarations and a couple of extra includes in other .cc files. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13552	2023-04-18 14:53:17 +03:00
Kamil Braun	f9051dccaa	raft topology: store `shard_count` and `ignore_msb` in topology Add new columns to the `system.topology` table: `shard_count` and `ignore_msb`. When a node bootstraps or restarts and observes that the values stored in `topology` are different than the local values, it updates them. This is done in the `update_topology_with_local_metadata` function (the 'metadata' here being the two values). Additional flag persisted in `system.scylla_local` is used to safely avoid performing read barriers when the values didn't change on node restart. A comment in `update_topology_with_local_metadata` explains why this flag is needed. An example use case where `shard_count` and `ignore_msb` are needed is creating CDC generations. Fixes: #13508	2023-04-17 10:45:30 +02:00
Pavel Emelyanov	08e9046d07	system_keyspace: Add ownership table The schema is CREATE TABLE system.sstables ( location text, generation bigint, format text, status text, uuid uuid, version text, PRIMARY KEY (location, generation) ) A sample entry looks like: location \| generation \| format \| status \| uuid \| version ---------------------------------------------------------------------+------------+--------+--------+--------------------------------------+--------- /data/object_storage_ks/test_table-d096a1e0ad3811ed85b539b6b0998182 \| 2 \| big \| sealed \| d0a743b0-ad38-11ed-85b5-39b6b0998182 \| me The uuid field points to the "folder" on the storage where the sstable components are. Like this: s3 `- test_bucket `- f7548f00-a64d-11ed-865a-0c1fbc116bb3 `- Data.db - Index.db - Filter.db - ... It's not very nice that the whole /var/lib/... path is in fact used as location, it needs the PR #12707 to fix this place. Also, the "status" part is not yet fully functional, it only supports three options: - creating -- the same as TemporaryTOC file exists on disk - sealed -- default state - deleting -- the analogy for the deletion log on disk The latter needs support from the distributed_loader, which's not yet there. In fact, distributes_loader also needs to be patched to actualy select entries from this table on load. Also it needs the mentioned PR #12707 to support staging and quarantine sstables. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:44:28 +03:00
Pavel Emelyanov	18333b4225	system_keyspace.hh: Remove unneeded headers Now this header can replace lots of used types with plain forward declarations Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-06 12:37:00 +03:00
Pavel Emelyanov	1af373cf0a	system_keyspace: Move topology_mutation_builder to storage_service The latter is the only user of the class. This keeps system keyspace code free from unrelated logic and from raft::server_id type. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-06 12:36:02 +03:00
Pavel Emelyanov	45de375126	system_keyspace: Move group0_upgrade_state conversions to group0 code In order to keep system keyspace free from group0 logic and from the service::group0_upgrade_state type Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-06 12:35:07 +03:00
Kamil Braun	cd282cf0ab	Merge 'Raft, use schema commit log' from Gusev Petr We need this so that we can have multi-partition mutations which are applied atomically. If they live on different shards, we can't guarantee atomic write to the commitlog. Fixes: #12642 Closes #13134 * github.com:scylladb/scylladb: test_raft_upgrade: add a test for schema commit log feature scylla_cluster.py: add start flag to server_add ServerInfo: drop host_id scylla_cluster.py: add config to server_add scylla_cluster.py: add expected_error to server_start scylla_cluster.py: ScyllaServer.start, refactor error reporting scylla_cluster.py: fix ScyllaServer.start, reset cmd if start failed raft: check if schema commitlog is initialized Refuse to boot if neither the schema commitlog feature nor force_schema_commit_log is set. For the upgrade procedure the user should wait until the schema commitlog feature is enabled before enabling consistent_cluster_management. raft: move raft initialization after init_system_keyspace database: rename before_schema_keyspace_init->maybe_init_schema_commitlog raft: use schema commitlog for raft tables init_system_keyspace: refactoring towards explicit load phases	2023-03-27 13:27:30 +02:00
Tomasz Grabiec	c54a3d9c10	Merge 'Clean enabled features manipulations in system keyspace' from Pavel Emelyanov There was an attempt to cut feature-service -> system-keyspace dependency (#13172) which turned out to require more changes. Here's a preparation squeezing from this future work. This set - leaves only batch-enabling API in feature service - keeps the need for async context in feature service - narrows down system keyspace features API to only load and store records - relaxes features updating logic in sys.ks. - cosmetic Closes #13264 * github.com:scylladb/scylladb: feature_service: Indentation fix after previous patch feature_service: Move async context into enable() system_keyspace: Refactor local features load/save helpers feature_service: Mark supported_feature_set() const feature_service: Remove single feature enabling method boot: Enable features in batch gossiper: Enable features in batch	2023-03-24 13:12:49 +01:00
Petr Gusev	769732d095	database: rename before_schema_keyspace_init->maybe_init_schema_commitlog We are going to move the raft tables from the first load phase to the second. This means the second init_system_keyspace call will load raft tables along with the schema, making the name of this function imprecise.	2023-03-24 15:54:52 +04:00
Petr Gusev	273e70e1f9	raft: use schema commitlog for raft tables Fixes: #12642	2023-03-24 15:54:52 +04:00
Petr Gusev	5a5d664a5a	init_system_keyspace: refactoring towards explicit load phases We aim (#12642) to use the schema commit log for raft tables. Now they are loaded at the first call to init_system_keyspace in main.cc, but the schema commitlog is only initialized shortly before the second call. This is important, since the schema commitlog initialization (database::before_schema_keyspace_init) needs to access schema commitlog feature, which is loaded from system.scylla_local and therefore is only available after the first init_system_keyspace call. So the idea is to defer the loading of the raft tables until the second call to init_system_keyspace, just as it works for schema tables. For this we need a tool to mark which tables should be loaded in the first or second phase. To do this, in this patch we introduce system_table_load_phase enum. It's set in the schema_static_props for schema tables. It replaces the system_keyspace::table_selector in the signature of init_system_keyspace. The call site for populate_keyspace in init_system_keyspace was changed, table_selector.contains_keyspace was replaced with db.local().has_keyspace. This check prevents calling populate_keyspace(system_schema) on phase1, but allows for populate_keyspace(system) on phase2 (to init raft tables). On this second call some tables from system keyspace (e.g. system.local) may have already been populated on phase1. This check protects from double-populating them, since every populated cf is marked as ready_for_writes.	2023-03-24 15:54:46 +04:00
Gleb Natapov	5e232ebee5	system_keyspace: add a table to persist topology change state machine's state Add local table to store topology change state machine's state there. Also add a function that loads the state to memory.	2023-03-21 16:06:43 +02:00
Pavel Emelyanov	8600cb2db0	feature_service: Move async context into enable() Callers don't need to know that enabling features has this requirement Indentation is deliberately left broken (until next patch) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 11:59:34 +03:00
Pavel Emelyanov	ae6e29a919	system_keyspace: Refactor local features load/save helpers Introduce load_local_enabled_features() and save_local_enabled_features() that get and put std::set<sstring> with feature names (and perform set to string and back conversions on their own). They look natural next to existing sys.ks. methods to get/set local-supported features and peer features. Using the new API, the more generic functions to preserve individual features and load them on startup can become much shorter and cleaner. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 11:54:02 +03:00
Pavel Emelyanov	b27d2c9399	boot: Enable features in batch On boot main calls enable_features_on_startup() which at the end scans through the list of features and enables them. Same as in previous patch -- it makes sense to use batch enabling here. Note, that despite the loop that collects features is not as trivial as in previous patch (gossiper case), it still operates with local copies of feature sets so delaying the feature's enabling doesn't affect other features' need to be enabled too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-21 11:12:25 +03:00
Kefu Chai	c37f4e5252	treewide: use fmt::join() when appropriate now that fmtlib provides fmt::join(). see https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view there is not need to revent the wheel. so in this change, the homebrew join() is replaced with fmt::join(). as fmt::join() returns an join_view(), this could improve the performance under certain circumstances where the fully materialized string is not needed. please note, the goal of this change is to use fmt::join(), and this change does not intend to improve the performance of existing implementation based on "operator<<" unless the new implementation is much more complicated. we will address the unnecessarily materialized strings in a follow-up commit. some noteworthy things related to this change: * unlike the existing `join()`, `fmt::join()` returns a view. so we have to materialize the view if what we expect is a `sstring` * `fmt::format()` does not accept a view, so we cannot pass the return value of `fmt::join()` to `fmt::format()` * fmtlib does not format a typed pointer, i.e., it does not format, for instance, a `const std::string`. but operator<<() always print a typed pointer. so if we want to format a typed pointer, we either need to cast the pointer to `void` or use `fmt::ptr()`. * fmtlib is not able to pick up the overload of `operator<<(std::ostream& os, const column_definition* cd)`, so we have to use a wrapper class of `maybe_column_definition` for printing a pointer to `column_definition`. since the overload is only used by the two overloads of `statement_restrictions::add_single_column_parition_key_restriction()`, the operator<< for `const column_definition*` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-16 20:34:18 +08:00
Petr Gusev	3ef201d67a	schema.hh: use schema_static_props for wait_for_sync_to_commitlog This patch continues the refactoring, now we move wait_for_sync_to_commitlog property from schema_builder to schema_static_props. The patch replaces schema_builder::set_wait_for_sync_to_commitlog and is_extra_durable with two register_static_configurator, one in system_keyspace and another in system_distributed_keyspace. They correspond to the two parts of the original disjunction in schema_tables::is_extra_durable.	2023-03-14 19:26:05 +04:00
Petr Gusev	349bc1a9b6	schema.hh: introduce schema_static_props, use it for null_sharder Our goal (#12642) is to mark raft tables to use schema commitlog. There are two similar cases in code right now - with_null_sharder and set_wait_for_sync_to_commitlog schema_builder methods. The problem is that if we need to mark some new schema with one of these methods we need to do this twice - first in a method describing the schema (e.g. system_keyspace::raft()) and second in the function create_table_from_mutations, which is not obvious and easy to forget. create_table_from_mutations is called when schema object is reconstructed from mutations, with_null_sharder and set_wait_for_sync_to_commitlog must be called from it since the schema properties they describe are not included in the mutation representation of the schema. This patch proposes to distinguish between the schema properties that get into mutations and those that do not. The former are described with schema_builder, while for the latter we introduce schema_static_props struct and the schema_builder::register_static_configurator method. This way we can formulate a rule once in the code about which schemas should have a null sharder, and it will be enforced in all cases.	2023-03-14 18:29:34 +04:00
Botond Dénes	d1619eb38a	Merge 'Remove qctx from helpers that retrieve truncation record' from Pavel Emelyanov There are two places that do it -- commitlog and batchlog replayers. Both can have local system-keyspace reference and use system-keyspace local query-processor for it. The peering save_truncation_record() is not that simple and is not patched by this PR Closes #13087 * github.com:scylladb/scylladb: system_keyspace: Unstatic get_truncation_record() system_keyspace: Unstatic get_truncated_at() batchlog_manager: Add system_keyspace dependency main: Swap batchlog manager and system keyspace starts system_keyspace: Unstatic get_truncated_position() system_keyspace: Remove unused method commitlog: Create commitlog_replayer with system keyspace test: Make cql_test_env::get_system_keyspace() return sharded commiltlog: Line-up field definitions	2023-03-07 10:19:55 +02:00
Kefu Chai	020483aa59	Update seastar submodule and main this change also includes change to main, to make this commit compile. see below: * seastar 9b6e181e42...9cbc1fe889 (46): > Merge 'Make io-tester jobs share sched classes' from Pavel Emelyanov > io_tester.md: Update the `rps` configuration option description > io_tester: Add option to limit total number of requests sent > Merge 'Keep outgoing queue all cancellable while negotiating (again)' from Pavel Emelyanov > io_tester: Add option to share classes between jobs > rpc: Abort connection if send_entry() fails > Merge 'build: build dpdk with `-fPIC` if BUILD_SHARED_LIBS' from Kefu Chai > build: cooking.sh: use the same BUILD_SHARED_LIBS when building ingredients > build: cooking.sh: use the same generator when building ingredients > core/memory: handle `strerror_r` returning static string > Merge 'build, rpc: lz4 related cleanups' from Kefu Chai > build, rpc: do not support lz4 < 1.7.3 > build: set the correct version when finding lz4 > build: include CheckSymbolExists > rpc: do not include lz4.h in header > build: set CMP0135 for Cooking.cmake > docs: drop building-.md > Merge 'seastar-addr2line: cleanups' from Kefu Chai > seastar-addr2line: refactor tests using unittest > seastar-addr2line: extract do_test() and main() > seastar-addr2line: do not import unused modules > scheduling: add a `rename` callback to scheduling_group_key_config > reactor: syscall thread: wakeup up reactor with finer granularity > build: build dpdk with `-fPIC` if BUILD_SHARED_LIBS > build: extract dpdk_extra_cflags out > core/sstring: remove a temporary variable > Merge 'treewide: include what we use, and add a checkheaders target' from Kefu Chai > perftune.py: auto-select the same number of IRQ cores on each NUMA > prometheus: remove unused headers > core/sstring: define <=> operator for sstring > Merge 'core: s/reserve_additional_memory/reserve_additional_memory_per_shard/' from Kefu Chai > include: do not include <concepts> directly > coding_style: note on self-contained header requirement > circileci: build checkheaders in addition to default target > build: add checkheaders target > net/toeplitz: s/u_int/unsigned/ > net/tcp-stack: add forward declaration for seastar::socket > core, net, util: include used headers main: set reserved memory for wasm on per-shard basis this change is a follow-up of `f05d612da8` and `4a0134a097`. this change depends on the related change in Seastar to reserve additional memory on a per-shard basis. per Wojciech Mitros's comment: > it should have probably been 50MB per shard in other words, as we always execute the same set of udf on all shards. and since one cannot predict the number of shards, but she could have a rough estimation on the size of memory a regular (set of) udf could use. so a per-shard setting makes more sense. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-06 18:41:34 +02:00
Pavel Emelyanov	1be9b0df50	system_keyspace: Unstatic get_truncation_record() Now when both callers of this method are non-static, it can be made non-static too. While at it make two more changes: 1. move the thing to private 2. remove explicit cql3::query_processor::cache_internal::yes argument, the system_keyspace::execute_cql() applies it on itw own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:28:40 +03:00
Pavel Emelyanov	2501ba3887	system_keyspace: Remove unused method The get_truncated_position() overload that filters records by shard is nowadays unused. Drop one Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-03-06 13:28:40 +03:00
Kefu Chai	df63e2ba27	types: move types.{cc,hh} into types they are part of the CQL type system, and are "closer" to types. let's move them into "types" directory. the building systems are updated accordingly. the source files referencing `types.hh` were updated using following command: ``` find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} + ``` the source files under sstables include "types.hh", which is indeed the one located under "sstables", so include "sstables/types.hh" instea, so it's more explicit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12926	2023-02-19 21:05:45 +02:00
Avi Kivity	e2f6e0b848	utils: move hashing related files to utils/ module Closes #12884	2023-02-17 07:19:52 +02:00
Botond Dénes	dc3d47b1e4	Merge 'Get compaction history without using qctx' from Pavel Emelyanov There are two methods to mess with compaction history -- update and get. The former had been patched to use local system-keyspace instance by `907fd2d3` (system_keyspace: De-static compaction history update) now it's time for the latter (spoiler: it's only used by the API handler) Closes #12889 * github.com:scylladb/scylladb: system_keyspace; Make get_compaction_history non static and drop qctx api, compaction_manager: Get compaction history via manager system_keyspace: Move compaction_history_entry to namespace scope	2023-02-16 19:05:48 +02:00
Pavel Emelyanov	e234726123	system_keyspace; Make get_compaction_history non static and drop qctx Now the call is done via the system_keyspace instance, so it can be unmarked static and can use the local query processor instead of global qctx. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-16 11:28:04 +03:00
Pavel Emelyanov	737f4acc10	features: Enable persisted features on all shards Commit `1365e2f13e` (gms: feature_service: re-enable features on node startup) re-enabled features on feature service very early, so that on boot a node sees its "correct" features state before it starts loading system tables and replaying commitlog. However, checking features happens on all shards independently, so re-enabling should also happen on all shards. One faced problem is in extract_scylla_specific_keyspace_info(). This helper is used when loading non-system keyspace to read scylla-specific keyspace options. The helper is called on all shards and on all-but-zero it evaluates the checked SCYLLA_KEYSPACES feature to false leaving the specific data empty. As the result, different shards have different view of keyspaces' configuration. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12881	2023-02-16 00:52:05 +01:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Pavel Emelyanov	d021aaf34d	system_keysace: De-static calls that update view-building tables There's a bunch of them used by mainly view_builder and also by the API and storage_service. All use global qctx to make its job, now when the callers have main-local sharded<system_keysace> references they can be made non-static. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-03 21:56:54 +03:00
Kefu Chai	4a0134a097	db: system_keyspace: take the reserved_memory into account before this change, we returns the total memory managed by Seastar in the "total" field in system.memory. but this value only reflect the total memory managed by Seastar's allocator. if `reserve_additional_memory` is set when starting app_template, Seastar's memory subsystem just reserves a chunk of memory of this specified size for system, and takes the remaining memory. since `f05d612da8`, we set this value to 50MB for wasmtime runtime. hence the test of `TestRuntimeInfoTable.test_default_content` in dtest fails. the test expects the size passed via the option of `--memory` to be identical to the value reported by system.memory's "total" field. after this change, the "total" field takes the reserved memory for wasm udf into account. the "total" field should reflect the total size of memory used by Scylla, no matter how we use a certain portion of the allocated memory. Fixes #12522 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12573	2023-01-24 14:07:44 +02:00
Pavel Emelyanov	be2ad2fe99	system_keyspace: De-static system_keyspace::increment_and_get_generation It's only called on cluster-join from storage_service which has the local system_keyspace reference and it's already started by that time. This allows removing few more occurrences of global qctx. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-20 17:24:22 +03:00
Pavel Emelyanov	4c4f8aa3e1	system_keyspace: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-20 17:24:22 +03:00
Pavel Emelyanov	b0edc07339	system_keyspace: Coroutinize system_keyspace::increment_and_get_generation Just unroll the fn().then({ fn2().then().then(); }); chain. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-01-20 17:24:10 +03:00
Kamil Braun	a483915c62	db: system_keyspace: add a virtual table with raft configuration Add a new virtual table `system.raft_state` that shows the currently operating Raft configuration for each present group. The schema is the same as `system.raft_snapshot_config` (the latter shows the config from the last snapshot). In the future we plan to add more columns to this table, showing more information (like the current leader and term), hence the generic name. Adding the table requires some plumbing of `sharded<raft_group_registry>&` through function parameters to make it accessible from `register_virtual_tables`, but it's mostly straightforward. Also added some APIs to `raft_group_registry` to list all groups and find a given group (returning `nullptr` if one isn't found, not throwing an exception).	2023-01-17 12:28:00 +01:00
Kamil Braun	2bfe85ce9b	db: system_keyspace: improve system.raft_snapshot_config schema Remove the `ip_addr` column which was not used. IP addresses are not part of Raft configuration now and they can change dynamically. Swap the `server_id` and `disposition` columns in the clustering key, so when querying the configuration, we first obtain all servers with the current disposition and then all servers with the previous disposition (note that a server may appear both in current and previous).	2023-01-17 12:28:00 +01:00
Kamil Braun	be390285b6	db: system_keyspace: remove (my_)server_id column from RAFT_SNAPSHOTS and RAFT_SNAPSHOT_CONFIG A single node will run a single Raft server in any given Raft group, so this column is not necessary.	2023-01-12 16:48:50 +01:00

1 2 3 4 5 ...

521 Commits