scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 04:56:58 +00:00

Author	SHA1	Message	Date
Botond Dénes	2d8d8043be	Merge 'Coroutinize system_keyspace::get_compaction_history' from Pavel Emelyanov Closes #13620 * github.com:scylladb/scylladb: system_keyspace: Fix indentation after previous patch system_keyspace: Coroutinize get_compaction_history()	2023-04-24 09:48:01 +03:00
Botond Dénes	85abece927	Merge 'Restrict logging of current_backtrace to log_level' from Benny Halevy `seastar::current_backtrace()` can be quite heavy. When we pass it to a log message at a relatively detailed log_level (debug/trace), we pay the price of `current_backtrace` every time, but we rarely print the message. Closes #13527 * github.com:scylladb/scylladb: locator/topology: call seastar::current_backtrace only when log_level is enabled schema_tables: call seastar::current_backtrace only when log_level is enabled	2023-04-24 08:50:32 +03:00
Botond Dénes	7f04d8231d	Merge 'gms: define and use generation and version types' from Benny Halevy This series cleans up the generation and value types used in gms / gossiper. Currently we use a blend of int, int32_t, and int64_t around messaging. This change defines gms::generation_type and gms::version_type as int32_t and adds checks in non-release modes that the respective int64 values passed over messaging do not overflow 32 bits. Closes #12966 * github.com:scylladb/scylladb: gossiper: version_generator: add {debug_,}validate_gossip_generation gms: gossip_digest: use generation_type and version_type gms: heart_beat_state: use generation_type and version_type gms: versioned_value: use version_type gms: version_generator: define version_type and generation_type strong types utils: move generation-number to gms utils: add tagged_integer gms: versioned_value: make members private scylla-gdb: add get_gms_versioned_value gms: versioned_value: delete unused compare_to function gms: gossip_digest: delete unused compare_to function	2023-04-24 08:44:48 +03:00
Pavel Emelyanov	5e201b9120	database: Remove compaction_manager.hh inclusion into database.hh The only reason why it's there (right next to compaction_fwd.hh) is because the database::table_truncate_state subclass needs the definition of compaction_manager::compaction_reenabler subclass. However, the former sub is not used outside of database.cc and can be defined in .cc. Keeping it outside of the header allows dropping the compaction_manager.hh from database.hh thus greatly reducing its fanout over the code (from ~180 indirect inclusions down to ~20). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13622	2023-04-23 16:27:11 +03:00
Benny Halevy	2d20ee7d61	gms: version_generator: define version_type and generation_type strong types Derived from utils::tagged_integer, using different tags, the types are incompatible with each other and require explicit typecasting to- and from- their value type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:47:17 +03:00
Benny Halevy	d1817e9e1b	utils: move generation-number to gms Although get_generation_number implementation is completely generic, it is used exclusively to seed the gossip generation number. Following patches will define a strong gms::generation_id type and this function should return it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-23 08:37:32 +03:00
Tomasz Grabiec	bd0b299322	Merge 'Manage CDC generations when bootstrapping nodes using Raft Group 0 topology coordinator' from Kamil Braun Introduce a new table `CDC_GENERATIONS_V3` (`system.cdc_generations_v3`). The table schema is a copy-paste of the `CDC_GENERATIONS_V2` schema. The difference is that V2 lives in `system_distributed_keyspace` and writes to it are distributed using regular `storage_proxy` replication mechanisms based on the token ring. The V3 table lives in `system_keyspace` and any mutations written to it will go through group 0. Extend the `TOPOLOGY` schema with new columns: - `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping node's `ring_slice`, it stores UUID of a newly introduced CDC generation which is used as partition key for the `CDC_GENERATIONS_V3` table to access this new generation's data. It's a regular column, meaning that every row (corresponding to a node) will have its own. - `current_cdc_generation_uuid` and `current_cdc_generation_timestamp` together form the ID of the newest CDC generation in the cluster. (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is when the CDC generation starts operating). Those are static columns since there's a single newest CDC generation. When topology coordinator handles a request for node to join, calculate a new CDC generation using the bootstrapping node's tokens, translate it to mutation format, and insert this mutation to the CDC_GENERATIONS_V3 table through group 0 at the same time we assign tokens to the node in Raft topology. The partition key for this data is stored in the bootstrapping node's `ring_slice`. After inserting new CDC generation data , we need to pick a timestamp for this generation and commit it, telling all nodes in the cluster to start using the generation for CDC log writes once their clocks cross that timestamp. We introduce a separate step to the bootstrap saga, before `write_both_read_old`, called `commit_cdc_generation`. 
In this step, the coordinator takes the `new_cdc_generation_data_uuid` stored in a bootstrapping node's `ring_slice` - which serves as the key to the table where the CDC generation data is stored - and combines it with a timestamp which it generates a bit into the future (as in old gossiper-based code, we use 2 * ring_delay, by default 1 minute). This gives us a CDC generation ID which we commit into the topology state as the `current_cdc_generation_id` while switching the saga to the next step, `write_both_read_old`. Once a new CDC generation is committed to the cluster by the topology coordinator, we also need to publish it to the user-facing description tables so CDC applications know which streams to read from. This uses regular distributed table writes underneath (tables living in the `system_distributed` keyspace) so it requires `token_metadata` to be nonempty. We need a hack for the case of bootstrapping the first node in the cluster - turning the tokens into normal tokens earlier in the procedure in `token_metadata`, but this is fine for the single-node case since no streaming is happening. When a node notices that a new CDC generation was introduced in `storage_service::topology_state_load`, it updates its internal data structures that are used when coordinating writes to CDC log tables. We include the current CDC generation data in topology snapshot transfers. Some fixes and refactors included. 
Closes #13385 * github.com:scylladb/scylladb: docs: cdc: describe generation changes using group 0 topology coordinator cdc: generation_service: add a FIXME cdc: generation_service: add legacy_ prefix for gossiper-based functions storage_service: include current CDC generation data in topology snapshots db: system_keyspace: introduce `query_mutations` with range/slice storage_service: hold group 0 apply mutex when reading topology snapshot service: raft_group0_client: introduce `hold_read_apply_mutex` storage_service: use CDC generations introduced by Raft topology raft topology: publish new CDC generation to the user description tables raft topology: commit a new CDC generation on node bootstrap raft topology: create new CDC generation data during node bootstrap service: topology_state_machine: make topology::find const db: system_keyspace: small refactor of `load_topology_state` cdc: generation: extract pure parts of `make_new_generation` outside db: system_keyspace: add storage for CDC generations managed by group 0 service: topology_state_machine: better error checking for state name (de)serialization service: raft: plumbing `cdc::generation_service&` cdc: generation: `get_cdc_generation_mutations`: take timestamp as parameter cdc: generation: make `topology_description_generator::get_sharding_info` a parameter sys_dist_ks: make `get_cdc_generation_mutations` public sys_dist_ks: move find_schema outside `get_cdc_generation_mutations` sys_dist_ks: move mutation size threshold calculation outside `get_cdc_generation_mutations` service/raft: group0_state_machine: signal topology state machine in `load_snapshot`	2023-04-21 18:11:27 +02:00
Pavel Emelyanov	2aabaada9e	system_keyspace: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 17:32:57 +03:00
Pavel Emelyanov	6290849f11	system_keyspace: Coroutinize get_compaction_history() In order not to copy the rvalue consumer arg -- instantly convert it into value. No other tricks. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 17:32:02 +03:00
Botond Dénes	10c1f1dc80	Merge 'db: system_keyspace: use microsecond resolution for group0_history range tombstone' from Kamil Braun In `make_group0_history_state_id_mutation`, when adding a new entry to the group 0 history table, if the parameter `gc_older_than` is engaged, we create a range tombstone in the mutation which deletes entries older than the new one by `gc_older_than`. In particular if `gc_older_than = 0`, we want to delete all older entries. There was a subtle bug there: we were using millisecond resolution when generating the tombstone, while the provided state IDs used microsecond resolution. On a super fast machine it could happen that we managed to perform two schema changes in a single millisecond; this happened sometimes in `group0_test.test_group0_history_clearing_old_entries` on our new CI/promotion machines, causing the test to fail because the tombstone didn't clear the entry corresponding to the previous schema change when performing the next schema change (since they happened in the same millisecond). Use microsecond resolution to fix that. The consecutive state IDs used in group 0 mutations are guaranteed to be strictly monotonic at microsecond resolution (see `generate_group0_state_id` in service/raft/raft_group0_client.cc). Fixes #13594 Closes #13604 * github.com:scylladb/scylladb: db: system_keyspace: use microsecond resolution for group0_history range tombstone utils: UUID_gen: accept decimicroseconds in min_time_UUID	2023-04-21 14:08:56 +03:00
Kamil Braun	55f43e532c	Merge 'get rid of gms/failure_detector' from Benny Halevy Move gms::arrival_window to api/failure_detector, which is its only user, and get rid of the rest, which is not used now that we use direct_failure_detector instead. TODO: integrate direct_failure_detector with failure_detector api. Closes #13576 * github.com:scylladb/scylladb: gms: get rid of unused failure_detector api: failure_detector: remove false dependency on failure_detector::arrival_window test: rest_api: add test_failure_detector	2023-04-21 11:47:44 +02:00
Kamil Braun	f9d8118c8d	db: system_keyspace: use microsecond resolution for group0_history range tombstone In `make_group0_history_state_id_mutation`, when adding a new entry to the group 0 history table, if the parameter `gc_older_than` is engaged, we create a range tombstone in the mutation which deletes entries older than the new one by `gc_older_than`. In particular if `gc_older_than = 0`, we want to delete all older entries. There was a subtle bug there: we were using millisecond resolution when generating the tombstone, while the provided state IDs used microsecond resolution. On a super fast machine it could happen that we managed to perform two schema changes in a single millisecond; this happened sometimes in `group0_test.test_group0_history_clearing_old_entries` on our new CI/promotion machines, causing the test to fail because the tombstone didn't clear the entry corresponding to the previous schema change when performing the next schema change (since they happened in the same millisecond). Use microsecond resolution to fix that. The consecutive state IDs used in group 0 mutations are guaranteed to be strictly monotonic at microsecond resolution (see `generate_group0_state_id` in service/raft/raft_group0_client.cc). Fixes #13594	2023-04-21 10:33:05 +02:00
Kefu Chai	ca6ebbd1f0	cql3, db: sstable: specialize fmt::formatter<function_name> This is part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. The goal here is to enable fmtlib to print `function_name` without the help of `operator<<`. The corresponding `operator<<()` is dropped in this change, as all its callers now use fmtlib for formatting. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13608	2023-04-21 10:07:28 +03:00
Benny Halevy	3f1ac846d8	gms: get rid of unused failure_detector The legacy failure_detector is now unused and can be removed. TODO: integrate direct_failure_detector with failure_detector api. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-21 09:08:27 +03:00
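The resolution mismatch described in the commit above can be reproduced with a small model. Everything here is a simplification for illustration: the helper names are hypothetical, timestamps are plain `std::chrono` durations rather than timeuuids, and the tombstone is assumed to delete entries whose timestamp is strictly below its bound.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::microseconds;
using std::chrono::milliseconds;

// Hypothetical model of the old behaviour: the tombstone bound is computed
// at millisecond resolution, so sub-millisecond digits are lost.
inline microseconds tombstone_bound_ms(microseconds new_entry, microseconds gc_older_than) {
    return std::chrono::floor<milliseconds>(new_entry - gc_older_than);
}

// Hypothetical model of the fixed behaviour: the bound keeps microsecond resolution.
inline microseconds tombstone_bound_us(microseconds new_entry, microseconds gc_older_than) {
    return new_entry - gc_older_than;
}

// Assumption: an entry is deleted when its timestamp is strictly below the bound.
inline bool is_deleted(microseconds entry, microseconds bound) {
    return entry < bound;
}
```

Take two state IDs in the same millisecond, `prev = 1000200us` and `next = 1000700us`, with `gc_older_than = 0`. The millisecond bound floors to `1000000us`, so `prev` survives; the microsecond bound is `1000700us`, so `prev` is deleted as intended.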
Botond Dénes	d828cfcb23	Merge 'db, cql3: functions: switch argument passing to std::span' from Avi Kivity Database functions currently receive their arguments as an std::vector. This is inflexible (for example, one cannot use small_vector to reduce allocations). This series adapts the function signature to accept parameters using std::span. Some changes in the keys interface are needed to support this. Lastly, one call site is migrated to small_vector. This is in support of changing selectors to use expressions. Closes #13581 * github.com:scylladb/scylladb: cql3: abstract_function_selector: use small_vector for argument buffer db, cql3: functions: pass function parameters as a span instead of a vector keys: change from_optional_exploded to accept a span instead of a vector	2023-04-21 06:49:07 +03:00
Kamil Braun	3d96bc5dba	db: system_keyspace: introduce `query_mutations` with range/slice There is a `query_mutations` function which loads the entire contents of a given table into memory. There was no function for e.g. loading just a single partition in the form of mutations. Introduce one.	2023-04-20 16:36:41 +02:00
Kamil Braun	5f2b297f99	raft topology: publish new CDC generation to the user description tables Once a new CDC generation is committed to the cluster by the topology coordinator, we also need to publish it to the user-facing description tables so CDC applications know which streams to read from. This uses regular distributed table writes underneath (tables living in the `system_distributed` keyspace) so it requires `token_metadata` to be nonempty. We need a hack for the case of bootstrapping the first node in the cluster - turning the tokens into normal tokens earlier in the procedure in `token_metadata`, but this is fine for the single-node case since no streaming is happening.	2023-04-20 16:36:41 +02:00
Kamil Braun	58baf998c1	raft topology: commit a new CDC generation on node bootstrap After inserting new CDC generation data (see previous commit), we need to pick a timestamp for this generation and commit it, telling all nodes in the cluster to start using the generation for CDC log writes once their clocks cross that timestamp. We introduce a separate step to the bootstrap saga, before `write_both_read_old`, called `commit_cdc_generation`. In this step, the coordinator takes the `new_cdc_generation_data_uuid` stored in a bootstrapping node's `ring_slice` - which serves as the key to the table where the CDC generation data is stored - and combines it with a timestamp which it generates a bit into the future (as in old gossiper-based code, we use 2 * ring_delay, by default 1 minute). This gives us a CDC generation ID which we commit into the topology state as the `current_cdc_generation_id` while switching the saga to the next step, `write_both_read_old`. `system_keyspace::load_topology_state` is extended to load `current_cdc_generation_id`. For now, nodes don't react to `current_cdc_generation_id`. In later commit we'll extend `storage_service::topology_state_load` to start using the current CDC generation for CDC log table writes. The solution with specifying a timestamp into the future is the same as it is for gossip-based topology changes and it has the same consistency problem - if some node is temporarily partitioned away from the quorum, it might not learn about the new CDC generation before its clock crosses the generation's timestamp, causing it to temporarily send writes to the wrong CDC streams (until it learns about the new timestamp). I left a FIXME which describes an alternative solution which wasn't viable for gossiper-based topology changes, but it is viable when we have a fault-tolerant topology coordinator.	2023-04-20 16:36:41 +02:00
Kamil Braun	5942237a79	raft topology: create new CDC generation data during node bootstrap Calculate a new CDC generation using the bootstrapping node's tokens, translate it to mutation format, and insert this mutation to the CDC_GENERATIONS_V3 table through group 0 at the same time we assign tokens to the node in Raft topology. The partition key for this data is stored in the bootstrapping node's `ring_slice`. The data is inserted, but it's not used for anything yet, we'll do it in later commits. Two FIXMEs are left for follow-ups: - in `get_sharding_info` we shouldn't have to use the token owner's IP, but get the host ID directly from token metadata (#12279), - splitting the CDC generation data write into multiple commands. The comment elaborates.	2023-04-20 16:35:37 +02:00
Kamil Braun	22094f1509	db: system_keyspace: small refactor of `load_topology_state` The variables necessary for constructing a `ring_slice` are now living in a local block of code. This makes it easier to see which data is part of the `ring_slice` and will make it easier to add more data to `ring_slice` in following commits. Also add some more sanity checking.	2023-04-20 15:40:23 +02:00
Avi Kivity	1cd6d59578	Merge 'Remove global proxy usage from view_info::select_statement()' from Pavel Emelyanov The method needs proxy to get data_dictionary::database from to pass down to select_statement::prepare(). And a legacy bit that can come with data_dictionary::database as well. Fortunately, all the call traces that end up at select_statement() start inside table:: methods that have view_update_generator, or at view_builder::consumer that has reference to view_builder. Both services can share the database reference. However, the call traces in question pass through several code layers, so the PR adds data_dictionary::database to those layers one by one. Closes #13591 * github.com:scylladb/scylladb: view_info: Drop calls to get_local_storage_proxy() view_info: Add data_dictionary argument to select_statement() view_info: Add data_dictionary argument to partition_slice() method view_filter_checking_visitor: Construct with data_dictionary view: Carry data_dictionary arg through standalone helpers view_updates: Carry data_dictionary argument through methods view_update_builder: Construct with data dictionary table: Push view_update_generator arg to affected_views() view: Add database getters to v._update_generator and v._builder	2023-04-20 16:40:06 +03:00
Kamil Braun	2233d8f54d	db: system_keyspace: add storage for CDC generations managed by group 0 The `CDC_GENERATIONS_V3` table schema is a copy-paste of the `CDC_GENERATIONS_V2` schema. The difference is that V2 lives in `system_distributed_keyspace` and writes to it are distributed using regular `storage_proxy` replication mechanisms based on the token ring. The V3 table lives in `system_keyspace` and any mutations written to it will go through group 0. Also extend the `TOPOLOGY` schema with new columns: - `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping node's `ring_slice`, it stores UUID of a newly introduced CDC generation which is used as partition key for the `CDC_GENERATIONS_V3` table to access this new generation's data. It's a regular column, meaning that every row (corresponding to a node) will have its own. - `current_cdc_generation_uuid` and `current_cdc_generation_timestamp` together form the ID of the newest CDC generation in the cluster. (the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is when the CDC generation starts operating). Those are static columns since there's a single newest CDC generation.	2023-04-20 15:38:58 +02:00
Kamil Braun	1e9cf3badd	cdc: generation: `get_cdc_generation_mutations`: take timestamp as parameter The function would generate a mutation timestamp for itself, take it as parameter instead. We'll use timestamps provided by Group 0 APIs when creating CDC generations during Group 0- based topology changes.	2023-04-20 15:38:37 +02:00
Kamil Braun	3e863d0e58	sys_dist_ks: make `get_cdc_generation_mutations` public It was a `static` function inside system_distributed_keyspace. Later it will be used for another table living in system_keyspace, so move it outside, to the CDC generations module, and make it accessible from other places.	2023-04-20 15:38:37 +02:00
Kamil Braun	ed133db709	sys_dist_ks: move find_schema outside `get_cdc_generation_mutations` The function will be reused for a different table.	2023-04-20 15:38:37 +02:00
Kamil Braun	0e84662910	sys_dist_ks: move mutation size threshold calculation outside `get_cdc_generation_mutations` The function turns a `cdc::topology_description` into a vector of mutations. It decides when to push_back a new mutation (instead of extending an existing one) based on certain parameters. This calculation is specific to where we insert the mutation later. Move the calculation outside, to the function which does the insertion. `get_cdc_generation_mutations` will be used outside this function later.	2023-04-20 15:38:37 +02:00
Pavel Emelyanov	bda2aea5be	view: Get topology via database tokens The view_builder::view_build_statuses() needs topology to walk its nodes. Now it gets one from global proxy via its token metadata, but database also has tokens and view_builder has reference to database. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:18:14 +03:00
Pavel Emelyanov	403463d7eb	view: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:18:14 +03:00
Pavel Emelyanov	257814f443	view: Coroutinize view_builder::view_build_statuses() Easier to patch it this way further. Indentation is deliberately left broken until next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 13:17:59 +03:00
Pavel Emelyanov	edcce7d8dd	view_info: Drop calls to get_local_storage_proxy() In both cases the proxy is called to get data_dictionary from. Now its available as the call argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	3e4fb7cad6	view_info: Add data_dictionary argument to select_statement() This method needs data_dictionary to work. Fortunately, all callers of it already have the dictionary at hand and can just pass it as argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	4375835cdd	view_info: Add data_dictionary argument to partition_slice() method The caller is calculate_affected_clustering_ranges() with dictionary arg, the method needs dictionary to call view_info::select_statement() later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	0aff55cdb2	view_filter_checking_visitor: Construct with data_dictionary The visitor is wait-free helper for matches_view_filter() that has dictionary as its argument. Later the visitor will pass the dictionary to view_info::select_statement(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	837fde84b1	view: Carry data_dictionary arg through standalone helpers There's a bunch of functions in view.{hh|cc} that don't belong to any class and perform view-related calculations for view updates. Lots of them eventually call view_info::select_statement() which will later need the dictionary. By now all those methods' callers have data dictionary at hand and can share it via argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	1301a99ba3	view_updates: Carry data_dictionary argument through methods The goal is to have the dictionary at places that later wrap calls to view_info::select_statement(). This graph of calls starts at the only public view_updates::generate_update() method which, in turn, is called from view_update_builder that already has data dictionary at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:46 +03:00
Pavel Emelyanov	9d3d533561	view_update_builder: Construct with data dictionary The caller is table with view-update-generator at hand (it calls mutate_MV on). Builder here is used as a temporary object that destroys once the caller coroutine co_return-s, so keeping the database obtained from the view-update-generator is safe. Later the v.u.b. object will propagate its data dictionary down the callstacks. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 11:17:38 +03:00
Pavel Emelyanov	7ddcd0c918	view: Add database getters to v._update_generator and v._builder Both services carry database which will be used by auxiliary objects like view_updates, view_update_builder, consumer, etc in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-20 10:41:16 +03:00
Avi Kivity	3e0aacc8b5	db, cql3: functions: pass function parameters as a span instead of a vector Spans are more flexible and can be constructed from any contiguous container (such as small_vector), or a subrange of such a container. This can save allocations, so change the signature to accept a span. Spans cannot be constructed from std::initializer_list, so one such call site is changed to use construct a span directly from the single argument.	2023-04-19 20:38:55 +03:00
Pavel Emelyanov	9628d07adb	Put storage_service.hh on a diet By removing unneeded headers inclusions. At the cost of few more forward declarations and a couple of extra includes in other .cc files. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13552	2023-04-18 14:53:17 +03:00
Tomasz Grabiec	a8f8f9f0ea	Merge 'raft topology: store `shard_count` and `ignore_msb` in topology' from Kamil Braun Add new columns to the `system.topology` table: `shard_count` and `ignore_msb`. When a node bootstraps or restarts and observes that the values stored in `topology` are different than the local values, it updates them. This is done in the `update_topology_with_local_metadata` function (the 'metadata' here being the two values). Additional flag persisted in `system.scylla_local` is used to safely avoid performing read barriers when the values didn't change on node restart. A comment in `update_topology_with_local_metadata` explains why this flag is needed. An example use case where `shard_count` and `ignore_msb` are needed is creating CDC generations. Fixes: #13508 Closes #13521 * github.com:scylladb/scylladb: raft topology: update `release_version` in topology on restart raft topology: store `shard_count` and `ignore_msb` in topology	2023-04-18 01:18:50 +02:00
Kamil Braun	f9051dccaa	raft topology: store `shard_count` and `ignore_msb` in topology Add new columns to the `system.topology` table: `shard_count` and `ignore_msb`. When a node bootstraps or restarts and observes that the values stored in `topology` are different than the local values, it updates them. This is done in the `update_topology_with_local_metadata` function (the 'metadata' here being the two values). Additional flag persisted in `system.scylla_local` is used to safely avoid performing read barriers when the values didn't change on node restart. A comment in `update_topology_with_local_metadata` explains why this flag is needed. An example use case where `shard_count` and `ignore_msb` are needed is creating CDC generations. Fixes: #13508	2023-04-17 10:45:30 +02:00
Botond Dénes	4c37dc5507	Merge 'keys: specialize fmt::formatter<partition_key> and friends' from Kefu Chai this is a part of a series to migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print following classes without the help of `operator<<`. - partition_key_view - partition_key - partition_key::with_schema_wrapper - key_with_schema - clustering_key_prefix - clustering_key_prefix::with_schema_wrapper the corresponding `operator<<()` are dropped dropped in this change, as all its callers are now using fmtlib for formatting now. the helper of `print_key()` is removed, as its only caller is `operator<<(std::ostream&, const clustering_key_prefix::with_schema_wrapper&)`. the reason why all these operators are replaced in one go is that we have a template function of `key_to_str()` in `db/large_data_handler.cc`. this template function is actually the caller of operator<< of `partition_key::with_schema_wrapper` and `clustering_key_prefix::with_schema_wrapper`. so, in order to drop either of these two operator<<, we need to remove both of them, so that we can switch over to `fmt::to_string()` in this template function. Refs scylladb#13245 Closes #13513 * github.com:scylladb/scylladb: keys: consolidate the formatter for partition_keys keys: specialize fmt::formatter<partition_key> and friends	2023-04-17 10:27:31 +03:00
Benny Halevy	490a0ae89b	schema_tables: call seastar::current_backtrace only when log_level is enabled `seastar::current_backtrace()` can be quite heavey. When we pass it to a log message in relatively detailed log_level (debug/trace), we pay the price of `current_backtrace` every time, but we rarely print the message. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-16 14:22:06 +03:00
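The cost pattern described in the commit above can be sketched with a toy logger. `toy_logger`, `expensive_backtrace` and the call counter are all hypothetical stand-ins, not Seastar APIs; the sketch only shows that an argument expression is evaluated before the logger can decide to drop the message, and that an explicit level check avoids this.

```cpp
#include <cassert>
#include <string>

enum class log_level { error, warn, info, debug, trace };

// Hypothetical minimal logger standing in for a real logging facility.
struct toy_logger {
    log_level level = log_level::info;
    int printed = 0;           // number of messages that passed the level check
    std::string last_message;  // last message that was actually "printed"

    bool is_enabled(log_level l) const { return l <= level; }

    void log(log_level l, const std::string& msg) {
        if (is_enabled(l)) {
            ++printed;
            last_message = msg;
        }
    }
};

// Stand-in for an expensive call such as collecting a backtrace;
// the counter records how often the cost is actually paid.
inline int backtrace_calls = 0;
inline std::string expensive_backtrace() {
    ++backtrace_calls;
    return "<frames>";
}

// Eager variant: the backtrace is computed even if the message is dropped.
inline void log_eager(toy_logger& lg) {
    lg.log(log_level::debug, "called from " + expensive_backtrace());
}

// Guarded variant: the backtrace is computed only when debug logging is on.
inline void log_guarded(toy_logger& lg) {
    if (lg.is_enabled(log_level::debug)) {
        lg.log(log_level::debug, "called from " + expensive_backtrace());
    }
}
```

With the default `info` level, `log_eager` pays for the backtrace and prints nothing, while `log_guarded` skips the call entirely; once the level is raised to `debug`, the guarded variant computes and prints it.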
Tomasz Grabiec	952b455310	Merge ' tool/scylla-sstable: more flexibility in obtaining the schema' from Botond Dénes scylla-sstable currently has two ways to obtain the schema: * via a `schema.cql` file. * load schema definition from memory (only works for system tables). This meant that for most cases it was necessary to export the schema into a CQL format and write it to a file. This is very flexible. The sstable can be inspected anywhere, it doesn't have to be on the same host where it originates form. Yet in many cases the sstable is inspected on the same host where it originates from. In this cases, the schema is readily available in the schema tables on disk and it is plain annoying to have to export it into a file, just to quickly inspect an sstable file. This series solves this annoyance by providing a mechanism to load schemas from the on-disk schema tables. Furthermore, an auto-detect mechanism is provided to detect the location of these schema tables based on the path of the sstable, but if that fails, the tool check the usual locations of the scylla data dir, the scylla confguration file and even looks for environment variables that tell the location of these. The old methods are still supported. In fact, if a schema.cql is present in the working directory of the tool, it is preferred over any other method, allowing for an easy force-override. If the auto-detection magic fails, an error is printed to the console, advising the user to turn on debug level logging to see what went wrong. A comprehensive test is added which checks all the different schema loading mechanisms. The documentation is also updated to reflect the changes. This change breaks the backward-compatibility of the command-line API of the tool, as `--system-schema` is now just a flag, the keyspace and table names are supplied separately via the new `--keyspace` and `--table` options. 
I don't think this will break anybody's workflow as this tools is still lightly used, exactly because of the annoying way the schema has to be provided. Hopefully after this series, this will change. Example: ``` $ ./build/dev/scylla sstable dump-data /var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine/me-1-big-Data.db {"sstables":{"/var/lib/scylla/data/ks/tbl2-d55ba230b9a811ed9ae8495671e9e4f8/quarantine//me-1-big-Data.db":[{"key":{"token":"-3485513579396041028","raw":"000400000000","value":"0"},"clustering_elements":[{"type":"clustering-row","key":{"raw":"","value":""},"marker":{"timestamp":1677837047297728},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1677837047297728,"value":"0"}}}]}]}} ``` As seen above, subdirectories like qurantine, staging etc are also supported. Fixes: https://github.com/scylladb/scylladb/issues/10126 Closes #13448 * github.com:scylladb/scylladb: test/cql-pytest: test_tools.py: add tests for schema loading test/cql-pytest: add no_autocompaction_context docs: scylla-sstable.rst: remove accidentally added copy-pasta docs: scylla-sstable.rst: remove paragraph with schema limitations docs: scylla-sstable.rst: update schema section test/cql-pytest: nodetool.py: add flush_keyspace() tools/scylla-sstable: reform schema loading mechanism tools/schema_loader: add load_schema_from_schema_tables() db/schema_tables: expose types schema	2023-04-14 16:46:26 +02:00
Kefu Chai	3738fcbe05	keys: specialize fmt::formatter<partition_key> and friends this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print the following classes without the help of `operator<<`. - partition_key_view - partition_key - partition_key::with_schema_wrapper - key_with_schema - clustering_key_prefix - clustering_key_prefix::with_schema_wrapper the corresponding `operator<<()` are dropped in this change, as all their callers now use fmtlib for formatting. the helper of `print_key()` is removed, as its only caller is `operator<<(std::ostream&, const clustering_key_prefix::with_schema_wrapper&)`. the reason why all these operators are replaced in one go is that we have a template function of `key_to_str()` in `db/large_data_handler.cc`. this template function is actually the caller of operator<< of `partition_key::with_schema_wrapper` and `clustering_key_prefix::with_schema_wrapper`. so, in order to drop either of these two operator<<, we need to remove both of them, so that we can switch over to `fmt::to_string()` in this template function. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-14 13:21:30 +08:00
Pavel Emelyanov	097cea11b2	view: Remove unused view_ptr reference After previous patch the value_getter::_view becomes unused and can be dropped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-13 16:51:27 +03:00
Pavel Emelyanov	821c8b19a6	view: Carry backing-secondary-index bit via view builder When the view builder is constructed, it populates itself with view updates. Later the updates may instantiate the value_getter-s which, in turn, would need to check if the view is backing a secondary index. Good news is that when the view builder is constructed, it has all the information at hand needed to evaluate this "backing" bit. It's then propagated down to value_getter via the corresponding view_updates. The getter's _view field becomes unused after this change and is (void)-ed to make this patch compile. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-13 16:48:36 +03:00
Pavel Emelyanov	e8b5022343	view: Keep backing-secondary-index bool on value_getter The getter needs to check if the view is backing a secondary index. Currently it's done inside the handle_computed_column() method, but it's more convenient if this bit is known during construction, so move it there. There are no places that can change this property between when view_getter is created and when the method in question is called. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-13 16:45:59 +03:00
Botond Dénes	5d0c0ae0c4	Merge 'token_metadata: use topology nodes for endpoint_to_host_id map' from Benny Halevy Currently, token_metadata_impl maintains a "shadow" endpoint to host_id map on top of the maps in topology. This series first reimplements the functions that currently use this map to use topology instead. Then the important users of `get_endpoint_to_host_id_map_for_reading`, node_ops_ctl and view_builder, are converted to use a new `topology::for_each_node` function to process all nodes in topology directly, without going through `get_endpoint_to_host_id_map_for_reading`. Closes #13476 * github.com:scylladb/scylladb: view_builder: view_build_statuses: use topology::for_each_node storage_service: node_ops_ctl: refresh_sync_nodes: use topology::for_each_node topology: add for_each_node token_metadata: get endpoint to node map from topology	2023-04-12 10:33:02 +03:00
Botond Dénes	63b266a988	db/schema_tables: expose types schema	2023-04-12 02:43:53 -04:00


3045 Commits