scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 18:40:38 +00:00

Author	SHA1	Message	Date
Kamil Braun	4a52b802ac	test: unit test for group 0 concurrent change protection and CQL DDL retries Check that group 0 history grows iff a schema change does not throw `group0_concurrent_modification`. Check that the CQL DDL statement retry mechanism works as expected.	2022-01-27 11:26:15 +01:00
Kamil Braun	edd8344706	cql3: statements: schema_altering_statement: automatically retry in presence of concurrent changes Schema changes on top of Raft do not allow concurrent changes. If two changes are attempted concurrently, one of them gets `group0_concurrent_modification` exception. Catch the exception in CQL DDL statement execution function and retry. In addition, the description of CQL DDL statements in group 0 history table was improved.	2022-01-27 11:26:14 +01:00
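The retry behaviour described in `edd8344706` can be pictured with a minimal sketch. This is not the code from the commit: `execute_with_retry` and the `max_retries` bound are hypothetical, and the exception type is a local stand-in that only borrows the name `group0_concurrent_modification` from the message above. Whether the real statement execution path bounds its retries is not stated in the log.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>

// Local stand-in for the exception named in the commit message (assumption:
// the real type lives in Scylla and has a different definition).
struct group0_concurrent_modification : std::runtime_error {
    group0_concurrent_modification()
        : std::runtime_error("group 0 concurrent modification") {}
};

// Hypothetical helper: runs `attempt` until it succeeds, retrying only when
// it loses a race with a concurrent change. Returns the number of attempts.
// Any other exception propagates immediately.
inline int execute_with_retry(const std::function<void()>& attempt, int max_retries) {
    for (int attempts = 1;; ++attempts) {
        try {
            attempt();
            return attempts;
        } catch (const group0_concurrent_modification&) {
            if (attempts > max_retries) {
                throw; // retries exhausted, surface the conflict to the caller
            }
        }
    }
}

// Demonstration helper: a change that loses the race `conflicts` times
// before succeeding.
inline std::function<void()> make_flaky_change(int conflicts) {
    auto remaining = std::make_shared<int>(conflicts);
    return [remaining] {
        if (*remaining > 0) {
            --*remaining;
            throw group0_concurrent_modification();
        }
    };
}
```

A change that conflicts twice succeeds on the third attempt, provided the retry bound allows it. A change that keeps conflicting past the bound rethrows the exception to the caller.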
Tomasz Grabiec	ba6c02b38a	Merge "Clear old entries from group 0 history when performing schema changes" from Kamil When performing a change through group 0 (which right now means schema changes), clear entries from group 0 history table which are older than one week. This is done by including an appropriate range tombstone in the group 0 history table mutation. * kbr/g0-history-gc-v2: idl: group0_state_machine: fix license blurb test: unit test for clearing old entries in group0 history service: migration_manager: clear old entries from group 0 history when announcing	2022-01-26 16:12:40 +01:00
Avi Kivity	df22396a34	Merge 'scylla_raid_setup: use mdmonitor only when RAID level > 0' from Takuya ASADA We found that the monitor mode of mdadm does not work on RAID0; according to a RHEL developer this is not a bug but expected behavior. Therefore, we should stop enabling mdmonitor when RAID0 is specified. Fixes #9540 ---- This reverts `0d8f932` and introduces the correct fix. Closes #9970 * github.com:scylladb/scylla: scylla_raid_setup: use mdmonitor only when RAID level > 0 Revert "scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8"	2022-01-26 15:34:47 +02:00
Takuya ASADA	32f2eb63ac	scylla_raid_setup: use mdmonitor only when RAID level > 0 We found that the monitor mode of mdadm does not work on RAID0; according to a RHEL developer this is not a bug but expected behavior. Therefore, we should stop enabling mdmonitor when RAID0 is specified. Fixes #9540	2022-01-26 22:33:07 +09:00
Takuya ASADA	cd57815fff	Revert "scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8" This reverts commit `0d8f932f0b`, because a RHEL developer explained that this is not a bug but expected behavior (mdadm --monitor does not start when the RAID level is 0). See: https://bugzilla.redhat.com/show_bug.cgi?id=2031936 So we should stop downgrading the mdadm package and modify our script not to enable mdmonitor.service on RAID0, use it only for RAID5.	2022-01-26 22:33:06 +09:00
Gleb Natapov	579dcf187a	raft: allow an option to persist commit index Raft does not need to persist the commit index since a restarted node will either learn it from an append message from a leader or (if entire cluster is restarted and hence there is no leader) new leader will figure it out after contacting a quorum. But some users may want to be able to bring their local state machine to a state as up-to-date as it was before restart as soon as possible without any external communication. For them this patch introduces new persistence API that allows saving and restoring last seen committed index. Message-Id: <YfFD53oS2j1My0p/@scylladb.com>	2022-01-26 14:06:39 +01:00
Calle Wilund	43f51e9639	commitlog: Ensure we don't run continuation (task switch) with queues modified Fixes #9955 In #9348 we handled the problem of failing to delete segment files on disk, and the need to recompute disk footprint to keep data flow consistent across intermittent failures. However, because _reserve_segments and _recycled_segments are queues, we have to empty them to inspect the contents. One would think it is ok for these queues to be empty for a while, whilst we do some recalculating, including disk listing -> continuation switching. But then one (i.e. I) misses the fact that these queues use the pop_eventually mechanism, which does _not_ handle a scenario where we push something into an empty queue, thus triggering the future that resumes a waiting task, but then pop the element immediately, before the waiting task is run. In fact, _iff_ one does this, not only will things break, they will in fact start creating undefined behaviour, because the underlying std::queue<T, circular_buffer> will _not_ do any bounds checks on the pop/push operations -> we will pop an empty queue, immediately making it non-empty, but using undefined memory (with luck null/zeroes). Strictly speaking, seastar::queue::pop_eventually should be fixed to handle the scenario, but nonetheless we can fix the usage here as well, by simply copying objects and doing the calculation "in background" while we potentially start popping the queue again. Closes #9966	2022-01-26 13:51:01 +02:00
Avi Kivity	f5cd6ec419	Update tools/python3 submodule (relicensed to Apache License 2.0) * tools/python3 8a77e76...f725ec7 (2): > Relicense to Apache 2.0 > treewide: use Software Package Data Exchange (SPDX) license identifiers	2022-01-25 18:50:39 +02:00
Kamil Braun	f3c0c73d36	idl: group0_state_machine: fix license blurb	2022-01-25 17:48:46 +01:00
Kamil Braun	bf91dcd1e3	idl: group0_state_machine: fix license blurb	2022-01-25 13:14:47 +01:00
Kamil Braun	b863a63b08	test: unit test for clearing old entries in group0 history We perform a bunch of schema changes with different values of `migration_manager::_group0_history_gc_duration` and check if entries are cleared according to this setting.	2022-01-25 13:13:35 +01:00
Kamil Braun	e9083433a8	service: migration_manager: clear old entries from group 0 history when announcing When performing a change through group 0 (which right now only covers schema changes), clear entries from group 0 history table which are older than one week. This is done by including an appropriate range tombstone in the group 0 history table mutation.	2022-01-25 13:11:14 +01:00
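The garbage collection described in `e9083433a8` can be approximated with a small in-memory model. This is only a sketch under stated assumptions: the real change attaches a range tombstone to the history table mutation, whereas the hypothetical `append_and_gc` below erases map entries directly, and the `history` alias is an illustrative stand-in for the table.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <utility>

using clock_type = std::chrono::system_clock;
// Illustrative model of the history table: timestamp -> description.
using history = std::map<clock_type::time_point, std::string>;

// Hypothetical helper: records a new entry at `now` and, in the same step,
// drops every entry strictly older than `now - gc_duration`.
// Returns the number of entries removed.
inline std::size_t append_and_gc(history& h,
                                 clock_type::time_point now,
                                 std::string description,
                                 clock_type::duration gc_duration) {
    const auto cutoff = now - gc_duration;
    // First entry whose timestamp is not older than the cutoff.
    const auto first_kept = h.lower_bound(cutoff);
    const auto removed = static_cast<std::size_t>(std::distance(h.begin(), first_kept));
    h.erase(h.begin(), first_kept);
    h.emplace(now, std::move(description));
    return removed;
}
```

With a one-week `gc_duration`, an entry written eight days before the current change is removed, while one written five days before survives.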
Botond Dénes	eb42213db4	compact_mutation: close active range tombstone on page end The compactor recently acquired the ability to consume a v2 stream. The v2 spec requires that all streams end with a null tombstone. `range_tombstone_assembler`, the component the compactor uses for converting the v2 input into its v1 output enforces this with a check on `consume_end_of_partition()`. Normally the producer of the stream the compactor is consuming takes care of closing the active tombstone before the stream ends. The compactor however (or its consumer) can decide to end the consume early, e.g. to cut the current page. When this happens the compactor must take care of closing the tombstone itself. Furthermore it has to keep this tombstone around to re-open it on the next page. This patch implements this mechanism which was left out of `134601a15e`. It also adds a unit test which reproduces the problems caused by the missing mechanism. The compactor now tracks the last clustering position emitted. When the page ends, this position will be used as the position of the closing range tombstone change. This ensures the range tombstone only covers the actually emitted range. Fixes: #9907 Tests: unit(dev), dtest(paging_test.py, paging_additional_test.py) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20220114053215.481860-1-bdenes@scylladb.com>	2022-01-25 09:52:30 +02:00
Gleb Natapov	e56e96ac5a	raft: do not add new wait entries after abort Abort signals stopped_error on all awaited entries, but if an entry is added after this it will be destroyed without signaling and will cause a waiter to get broken_promise. Fixes #9688 Message-Id: <Ye6xJjTDooKSuZ87@scylladb.com>	2022-01-25 09:52:30 +02:00
Tomasz Grabiec	c89b1953f8	Merge "Enforce linearizability of group 0 operations using state IDs" from Kamil We introduce a new table, `system.group0_history`. This table will contain a history of all group 0 changes applied through Raft. With each change is an associated unique ID, which also identifies the state of all group 0 tables (including schema tables) after this change is applied, assuming that all such changes are serialized through Raft (they will be eventually). Group 0 commands, additionally to mutations which modify group 0 tables, contain a "previous state ID" and a "new state ID". The group 0 state machine will only modify state during command application if the provided "previous state ID" is equal to the last state ID present in the history table. Otherwise, the command will be a no-op. To ensure linearizability of group 0 changes, the performer of the change must first read the last state ID, only then read the state and send a command for the state machine. If a concurrent change races with this command and manages to modify the state, we will detect that the last state ID does not match during `apply`; all calls to `apply` are serialized, and `apply` adds the new entry to the history table at the end, after modifying the group 0 state. The details of this mechanism are abstracted away with `group0_guard`. To perform a group 0 change, one needs to call `announce`, which requires a `group0_guard` to be passed in. The only way to obtain a `group0_guard` is by calling `start_group0_operation`, which underneath performs a read barrier on group 0, obtains the last state ID from the history table, and constructs a new state ID that the change will append to the history table. The read barrier ensures that all previously completed changes are visible to this operation. The caller can then perform any necessary validation, construct mutations which modify group 0 state, and finally call `announce`. 
The guard also provides a timestamp which is used by the caller to construct the mutations. The timestamp is obtained from the new state ID. We ensure that it is greater than the timestamp of the last state ID. Thus, if the change is successful, the applied mutations will have greater timestamps than the previously applied mutations. We also add two locks. The more important one, used to ensure correctness, is `read_apply_mutex`. It is held when modifying group 0 state (in `apply` and `transfer_snapshot`) and when reading it (it's taken when obtaining a `group0_guard` and released before a command is sent in `announce`). Its goal is to ensure that we don't read partial state, which could happen without it because group 0 state consists of many parts and `apply` (or `transfer_snapshot`) potentially modifies all of them. Note: this doesn't give us 100% protection; if we crash in the middle of `apply` (or `transfer_snapshot`), then after restart we may read partial state. To remove this possibility we need to ensure that commands which were being applied before restart but not finished are re-applied after restart, before anyone can read the state. I left a TODO in `apply`. The second lock, `operation_mutex`, is used to improve liveness. It is taken when obtaining a `group0_guard` and released after a command is applied (compare to `read_apply_mutex` which is released before a command is sent). It is not taken inside `apply` or `transfer_snapshot`. This lock ensures that multiple fibers running on the same node do not attempt to modify group0 concurrently - this would cause some of them to fail (due to the concurrent modification protection described above). This is mostly important during first boot of the first node, when services start for the first time and try to create their internal tables. This lock serializes these attempts, ensuring that all of them succeed. 
* kbr/schema-state-ids-v4: service: migration_manager: `announce`: take a description parameter service: raft: check and update state IDs during group 0 operations service: raft: group0_state_machine: introduce `group0_command` service: migration_manager: allow using MIGRATION_REQUEST verb to fetch group 0 history table service: migration_manager: convert migration request handler to coroutine db: system_keyspace: introduce `system.group0_history` table treewide: require `group0_guard` when performing schema changes service: migration_manager: introduce `group0_guard` service: raft: pass `storage_proxy&` to `group0_state_machine` service: raft: raft_state_machine: pass `snapshot_descriptor` to `transfer_snapshot` service: raft: rename `schema_raft_state_machine` to `group0_state_machine` service: migration_manager: rename `schema_read_barrier` to `start_group0_operation` service: migration_manager: `announce`: split raft and non-raft paths to separate functions treewide: pass mutation timestamp from call sites into `migration_manager::prepare_*` functions service: migration_manager: put notifier call inside `async` service: migration_manager: remove some unused and disabled code db: system_distributed_keyspace: use current time when creating mutations in `start()` redis: keyspace_utils: `create_keyspace_if_not_exists_impl`: call `announce` twice only	2022-01-25 09:52:30 +02:00
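The state ID check described in the merge message for `c89b1953f8` can be modelled with a toy state machine. This is a sketch, not Scylla code: `toy_state_machine`, `command`, and `make_command` are hypothetical names, state IDs are plain integers rather than the UUIDs shown in `system.group0_history`, and the read barrier and both mutexes are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using state_id = std::uint64_t; // simplification: the real IDs are UUIDs

// Hypothetical command: a change plus the two state IDs from the message.
struct command {
    state_id prev_state_id; // last state ID seen when the operation started
    state_id new_state_id;  // ID appended to the history if the change applies
    std::string change;     // stand-in for the group 0 mutations
};

class toy_state_machine {
    std::vector<state_id> _history{0}; // 0 models the initial state
    std::vector<std::string> _state;
public:
    state_id last_state_id() const { return _history.back(); }
    std::size_t history_size() const { return _history.size(); }

    // Applies the command only if no other change got in first.
    // Returns false when the command turned into a no-op.
    bool apply(const command& cmd) {
        if (cmd.prev_state_id != last_state_id()) {
            return false; // a concurrent change won the race
        }
        _state.push_back(cmd.change);
        _history.push_back(cmd.new_state_id); // history is updated last
        return true;
    }
};

// Plays the role of obtaining a guard: reads the last state ID first and
// pairs it with a fresh, strictly greater ID chosen by the caller.
inline command make_command(const toy_state_machine& sm, state_id fresh_id, std::string change) {
    assert(fresh_id > sm.last_state_id());
    return command{sm.last_state_id(), fresh_id, std::move(change)};
}
```

Two commands built against the same last state ID cannot both apply. The loser becomes a no-op and has to be rebuilt against the new last state ID before it can succeed, which is the situation the CQL DDL retry mechanism handles.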
Avi Kivity	a105b09475	build: prepare for Scylla 5.0 We decided to name the next version Scylla 5.0, in honor of Raft based schema management.	2022-01-25 09:52:30 +02:00
Avi Kivity	277303a722	build_indexes_virtual_reader: convert to flat_mutation_reader_v2 Since it doesn't handle range tombstones in any way, the conversion consists of just using the new type names. Closes #9948	2022-01-25 09:52:30 +02:00
Avi Kivity	007145e033	validation: complete transition to data_dictionary module The API was converted in `00de5f4876`, but some #includes remain. Remove them. Closes #9947	2022-01-25 09:52:30 +02:00
Avi Kivity	e74f570eda	alternator: streams: fix use-after-free of data_dictionary in describe_stream() In `4aa9e86924` ("Merge 'alternator: move uses of replica module to data_dictionary' from Avi Kivity"), we changed alternator to use data_dictionary instead of replica::database. However, data_dictionary::database objects are different from replica::database objects in that they don't have a stable address and need to be captured by value (they are pointer-like). One capture in describe_stream() was capturing a data_dictionary::database by reference and so caused a use-after-free when the previous continuation was deallocated. Fix by capturing by value. Fixes #9952. Closes #9954	2022-01-25 09:52:30 +02:00
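The capture rule behind `e74f570eda` applies to any pointer-like handle. The following is a minimal sketch with hypothetical types (`catalog`, `catalog_handle`, `make_describe`); it does not use Seastar continuations or the real `data_dictionary::database`, and only illustrates why such a handle should be copied into a closure that outlives the current scope.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical long-lived object with a stable address.
struct catalog {
    std::string name;
};

// Hypothetical pointer-like handle: cheap to copy, no stable address of its own.
struct catalog_handle {
    const catalog* target;
    const std::string& name() const { return target->name; }
};

// Returns a deferred task that outlives this function call.
inline std::function<std::string()> make_describe(catalog_handle h) {
    // Capturing `h` by value copies the handle into the closure.
    // Capturing it by reference ([&h]) would refer to the parameter `h`,
    // which is destroyed when this function returns, so a later call
    // would be a use-after-free.
    return [h] { return h.name(); };
}
```

The closure stays valid for as long as the underlying `catalog` object lives, regardless of what happens to the handle it was created from.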
Kamil Braun	044e05b0d9	service: migration_manager: `announce`: take a description parameter The description parameter is used for the group 0 history mutation. The default is empty, in which case the mutation will leave the description column as `null`. I filled the parameter in some easy places as an example and left the rest for a follow-up. This is how it looks now in a fresh cluster with a single statement performed by the user: cqlsh> select * from system.group0_history ; key \| state_id \| description ---------+--------------------------------------+------------------------------------------------------ history \| 9ec29cac-7547-11ec-cfd6-77bb9e31c952 \| CQL DDL statement history \| 9beb2526-7547-11ec-7b3e-3b198c757ef2 \| null history \| 9be937b6-7547-11ec-3b19-97e88bd1ca6f \| null history \| 9be784ca-7547-11ec-f297-f40f0073038e \| null history \| 9be52e14-7547-11ec-f7c5-af15a1a2de8c \| null history \| 9be335dc-7547-11ec-0b6d-f9798d005fb0 \| null history \| 9be160c2-7547-11ec-e0ea-29f4272345de \| null history \| 9bdf300e-7547-11ec-3d3f-e577a2e31ffd \| null history \| 9bdd2ea8-7547-11ec-c25d-8e297b77380e \| null history \| 9bdb925a-7547-11ec-d754-aa2cc394a22c \| null history \| 9bd8d830-7547-11ec-1550-5fd155e6cd86 \| null history \| 9bd36666-7547-11ec-230c-8702bc785cb9 \| Add new columns to system_distributed.service_levels history \| 9bd0a156-7547-11ec-a834-85eac94fd3b8 \| Create system_distributed(_everywhere) tables history \| 9bcfef18-7547-11ec-76d9-c23dfa1b3e6a \| Create system_distributed_everywhere keyspace history \| 9bcec89a-7547-11ec-e1b4-34e0010b4183 \| Create system_distributed keyspace	2022-01-24 15:20:37 +01:00
Kamil Braun	6a00e790c7	service: raft: check and update state IDs during group 0 operations The group 0 state machine will only modify state during command application if the provided "previous state ID" is equal to the last state ID present in the history table. Otherwise, the command will be a no-op. To ensure linearizability of group 0 changes, the performer of the change must first read the last state ID, only then read the state and send a command for the state machine. If a concurrent change races with this command and manages to modify the state, we will detect that the last state ID does not match during `apply`; all calls to `apply` are serialized, and `apply` adds the new entry to the history table at the end, after modifying the group 0 state. The details of this mechanism are abstracted away with `group0_guard`. To perform a group 0 change, one needs to call `announce`, which requires a `group0_guard` to be passed in. The only way to obtain a `group0_guard` is by calling `start_group0_operation`, which underneath performs a read barrier on group 0, obtains the last state ID from the history table, and constructs a new state ID that the change will append to the history table. The read barrier ensures that all previously completed changes are visible to this operation. The caller can then perform any necessary validation, construct mutations which modify group 0 state, and finally call `announce`. The guard also provides a timestamp which is used by the caller to construct the mutations. The timestamp is obtained from the new state ID. We ensure that it is greater than the timestamp of the last state ID. Thus, if the change is successful, the applied mutations will have greater timestamps than the previously applied mutations. We also add two locks. The more important one, used to ensure correctness, is `read_apply_mutex`. 
It is held when modifying group 0 state (in `apply` and `transfer_snapshot`) and when reading it (it's taken when obtaining a `group0_guard` and released before a command is sent in `announce`). Its goal is to ensure that we don't read partial state, which could happen without it because group 0 state consists of many parts and `apply` (or `transfer_snapshot`) potentially modifies all of them. Note: this doesn't give us 100% protection; if we crash in the middle of `apply` (or `transfer_snapshot`), then after restart we may read partial state. To remove this possibility we need to ensure that commands which were being applied before restart but not finished are re-applied after restart, before anyone can read the state. I left a TODO in `apply`. The second lock, `operation_mutex`, is used to improve liveness. It is taken when obtaining a `group0_guard` and released after a command is applied (compare to `read_apply_mutex` which is released before a command is sent). It is not taken inside `apply` or `transfer_snapshot`. This lock ensures that multiple fibers running on the same node do not attempt to modify group0 concurrently - this would cause some of them to fail (due to the concurrent modification protection described above). This is mostly important during first boot of the first node, when services start for the first time and try to create their internal tables. This lock serializes these attempts, ensuring that all of them succeed.	2022-01-24 15:20:37 +01:00
Kamil Braun	509ac2130f	service: raft: group0_state_machine: introduce `group0_command` Objects of this type will be serialized and sent as commands to the group 0 state machine. They contain a set of mutations which modify group 0 tables (at this point: schema tables and group 0 history table), the 'previous state ID' which is the last state ID present in the history table when the operation described by this command has started, and the 'new state ID' which will be appended to the history table if this change is successful (successful = the previous state ID is still equal to the last state ID in the history table at the moment of application). It also contains the address of the node which constructed this command. The state ID mechanism will be described in more detail in a later commit.	2022-01-24 15:20:37 +01:00
Kamil Braun	cc0c54ea15	service: migration_manager: allow using MIGRATION_REQUEST verb to fetch group 0 history table The MIGRATION_REQUEST verb is currently used to pull the contents of schema tables (in the form of mutations) when nodes synchronize schemas. We will (ab)use the verb to fetch additional data, such as the contents of the group 0 history table, for purposes of group 0 snapshot transfer. We extend `schema_pull_options` with a flag specifying that the puller requests the additional data associated with group 0 snapshots. This flag is `false` by default, so existing schema pulls will do what they did before. If the flag is `true`, the migration request handler will include the contents of group 0 history table. Note that if a request is sent with the flag set to `true`, that means the entire cluster must have enabled the Raft feature, which also means that the handler knows of the flag.	2022-01-24 15:20:37 +01:00
Kamil Braun	a944dd44ee	service: migration_manager: convert migration request handler to coroutine	2022-01-24 15:20:37 +01:00
Kamil Braun	fad72daeb4	db: system_keyspace: introduce `system.group0_history` table This table will contain a history of all group 0 changes applied through Raft. With each change is an associated unique ID, which also identifies the state of all group 0 tables (including schema tables) after this change is applied, assuming that all such changes are serialized through Raft (they will be eventually). We will use these state IDs to check if a given change is still valid at the moment it is applied (in `group0_state_machine::apply`), i.e. that there wasn't a concurrent change that happened between creating this change and applying it (which may invalidate it).	2022-01-24 15:20:37 +01:00
Kamil Braun	a664ac7ba5	treewide: require `group0_guard` when performing schema changes `announce` now takes a `group0_guard` by value. `group0_guard` can only be obtained through `migration_manager::start_group0_operation` and moved, it cannot be constructed outside `migration_manager`. The guard will be a method of ensuring linearizability for group 0 operations.	2022-01-24 15:20:35 +01:00
Kamil Braun	742f036261	service: migration_manager: introduce `group0_guard` This object will be used to "guard" group 0 operations. Obtaining it will be necessary to perform a group 0 change (such as modifying the schema), which will be enforced by the type system. The initial implementation is a stub and only provides a timestamp which will be used by callers to create mutations for group 0 changes. The next commit will change all call sites to use the guard as intended. The final implementation, coming later, will ensure linearizability of group 0 operations.	2022-01-24 15:12:50 +01:00
Kamil Braun	f908da919c	service: raft: pass `storage_proxy&` to `group0_state_machine` We'll use it to update the group 0 history table.	2022-01-24 15:12:50 +01:00
Kamil Braun	dce8ece4b6	service: raft: raft_state_machine: pass `snapshot_descriptor` to `transfer_snapshot` Currently it takes just the snapshot ID. Extend it by taking the whole snapshot descriptor. In following commits I use this to perform additional logging.	2022-01-24 15:12:50 +01:00
Kamil Braun	538cc6ecb9	service: raft: rename `schema_raft_state_machine` to `group0_state_machine` Generalize the name so it doesn't suggest that group 0 contains only schema state.	2022-01-24 15:12:50 +01:00
Kamil Braun	86762a1dd9	service: migration_manager: rename `schema_read_barrier` to `start_group0_operation` 1. Generalize the name so it mentions group 0, which schema will be a strict subset of. 2. Remove the fact that it performs a "read barrier" from the name. The function will be used in general to ensure linearizability of group0 operations - both reads and writes. "Read barrier" is Raft-specific terminology, so it can be thought of as an implementation detail.	2022-01-24 15:12:50 +01:00
Kamil Braun	0f24b907b7	service: migration_manager: `announce`: split raft and non-raft paths to separate functions	2022-01-24 15:12:50 +01:00
Kamil Braun	283ac7fefe	treewide: pass mutation timestamp from call sites into `migration_manager::prepare_*` functions The functions which prepare schema change mutations (such as `prepare_new_column_family_announcement`) would use internally generated timestamps for these mutations. When schema changes are managed by group 0 we want to ensure that timestamps of mutations applied through Raft are monotonic. We will generate these timestamps at call sites and pass them into the `prepare_` functions. This commit prepares the APIs.	2022-01-24 15:12:50 +01:00
Kamil Braun	f97edb1dbd	service: migration_manager: put notifier call inside `async` `get_notifier().before_update_column_family(...)` requires being inside `async`. Fix this.	2022-01-24 15:12:50 +01:00
Kamil Braun	3bab5c564a	service: migration_manager: remove some unused and disabled code `include_keyspace_and_announce` was no longer used. `do_announce_new_type` only had a declaration, it was not used and there was no definition.	2022-01-24 15:12:49 +01:00
Kamil Braun	0af5f74871	db: system_distributed_keyspace: use current time when creating mutations in `start()` When creating or updating internal distributed tables in `system_distributed_keyspace::start()`, hardcoded timestamps were used. There were two reasons for this: - to protect against issue #2129, where nodes would start without synchronizing schema with the existing cluster, creating the tables again, which would override any manual user changes to these tables. The solution was to use small timestamps (like api::min_timestamp) - the user-created schema mutations would always 'win' (because when they were created, they used current time). - to eliminate unnecessary schema sync. If two nodes created these tables concurrently with different timestamps, the schemas would formally be different and would need to merge. This could happen during upgrades when we upgraded from a version which doesn't have these tables or doesn't have some columns. The #2129 workaround is no longer necessary: when nodes start they always have to sync schema with existing nodes; we also don't allow bootstrapping nodes in parallel. The second problem would happen during parallel bootstrap, which we don't allow, or during parallel upgrade. The procedure we recommend is rolling upgrade - where nodes are upgraded one by one. In this case only one node is going to create/update the tables; following upgraded nodes will sync schema first and notice they don't need to do anything. So if procedures are followed correctly, the workaround is not needed. If someone doesn't follow the procedures and upgrades nodes in parallel, these additional schema synchronizations are not a big cost, so the workaround doesn't give us much in this case either. When schema changes are performed by Raft group 0, certain constraints are placed on the timestamps used for mutations. For this we'll need to be able to use timestamps which are generated based on current time.	2022-01-24 15:12:49 +01:00
Kamil Braun	63d3449bc3	redis: keyspace_utils: `create_keyspace_if_not_exists_impl`: call `announce` twice only The code would previously `announce` schema mutations once per each keyspace and once per each table. This can be reduced to two calls of `announce`: once to create all keyspaces, and once to create all tables. This should be further reduced to a single `announce` in the future. Left a FIXME. Motivation: after migrating to Raft, each `announce` will require a `read_barrier` to achieve linearizability of schema operations. This introduces latency, as it requires contacting a leader which then must contact a quorum. The fewer announce calls, the better. Also, if all sub-operations are reduced to a single `announce`, we get atomicity - either all of these sub-operations succeed or none do.	2022-01-24 15:12:46 +01:00
Benny Halevy	188cedd533	test: lister_test: test_lister_abort: generate at least one entry Without this fix, generate_random_content could generate 0 entries and the expected exception would never be injected. With it, we generate at least 1 entry and the test passes with the offending random-seed: ``` random-seed=1898914316 Generated 1 dir entries Aborting lister after 1 dir entries test/boost/lister_test.cc(96): info: check 'exception "expected_exception" raised as expected' has passed ``` Fixes #9953 Test: lister_test.test_lister_abort --random-seed=1898914316(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220123122921.14017-1-bhalevy@scylladb.com>	2022-01-23 17:52:44 +02:00
Gleb Natapov	d09864d61f	redis: check for tables existence before creating Do not create redis tables unconditionally on boot since this requires issue raft barrier and cannot be done without a quorum. Message-Id: <YefV0CqEueRL7G00@scylladb.com>	2022-01-23 17:52:44 +02:00
Benny Halevy	f439edca35	test: sstable_compaction_test: twcs_reshape_with_disjoint_set_test: take min_threshold into consideration Take into account that get_reshaping_job selects only buckets that have more than min_threshold sstables in them. Therefore, with 256 disjoint sstables in different windows, allow first or last windows to not be selected by get_reshaping_job that will return at least disjoint_sstable_count - min_threshold + 1 sstables, and not more than disjoint_sstable_count. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220123090044.38449-2-bhalevy@scylladb.com>	2022-01-23 17:52:44 +02:00
Avi Kivity	ae6fdf1599	Update seastar submodule * seastar 5025cd44ea...5524f229bb (3): > Merge "Simplify io-queue configuration" from Pavel E > fix sstring.find(): make find("") compatible with std::string > test: file_utils: test_non_existing_TMPDIR: no need to setenv Contains patch from Pavel Emelyanov <xemul@scylladb.com>: scylla-gdb: Remove _shares_capacity from fair-group debug This field is about to be removed in newer seastar, so it shouldn't be checked in scylla-gdb Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220121115643.6966-1-xemul@scylladb.com>	2022-01-21 17:38:05 +02:00
Piotr Jastrzebski	09d4438a0d	cdc: Handle compact storage correctly in preimage Base tables that use compact storage may have a special artificial column that has an empty type. `c010cefc4d` fixed the main CDC path to handle such columns correctly and to not include them in the CDC Log schema. This patch makes sure that generation of preimage ignores such empty column as well. Fixes #9876 Closes #9910 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2022-01-20 13:23:38 +01:00
Nadav Har'El	350c3d0f6a	alternator: update comment about default timeout The comment explaining where the default Alternator timeout is set became out-of-date. So fix it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220120092631.401563-1-nyh@scylladb.com>	2022-01-20 14:05:58 +02:00
Raphael S. Carvalho	5d654a6b9a	compaction: don't copy owned ranges in cleanup ctor Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220119142322.39791-1-raphaelsc@scylladb.com>	2022-01-20 14:05:58 +02:00
Botond Dénes	a65b38a9f7	reader_permit: release_base_resources(): also update _resources If the permit was admitted, _base_resources was already accounted in _resource and therefore has to be deducted from it, otherwise the permit will think it leaked some resources on destruction. Test: dtest(repair_additional_test.py.test_repair_one_missing_row_diff_shard_count) Refs: #9751 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20220119132550.532073-1-bdenes@scylladb.com>	2022-01-20 14:05:58 +02:00
Nadav Har'El	7cb6250c40	Merge 'snapshot_ctl: true_snapshots_size: fix space accounting' from Benny Halevy This pull request fixes two preexisting issues related to snapshot_ctl::true_snapshots_size https://github.com/scylladb/scylla/issues/9897 https://github.com/scylladb/scylla/issues/9898 and adds a couple of unit tests to test the snapshot_ctl functionality. Test: unit(dev), database_test.{test_snapshot_ctl_details,test_snapshot_ctl_true_snapshots_size}(debug) Closes #9899 * github.com:scylladb/scylla: table: get_snapshot_details: count allocated_size snapshot_ctl: cleanup true_snapshots_size snpashot_ctl: true_snapshots_size: do not map_reduce across all shards	2022-01-19 11:57:15 +02:00
Nadav Har'El	4aa9e86924	Merge 'alternator: move uses of replica module to data_dictionary' from Avi Kivity Alternator is a coordinator-side service and so should not access the replica module. In this series all but one of uses of the replica module are replaced with data_dictionary. One case remains - accessing the replication map which is not available (and should not be available) via the data dictionary. The data_dictionary module is expanded with missing accessors. Closes #9945 * github.com:scylladb/scylla: alternator: switch to data_dictionary for table listing purposes data_dictionary: add get_tables() data_dictionary: introduce keyspace::is_internal()	2022-01-19 11:34:25 +02:00
Avi Kivity	7399f3fae7	alternator: switch to data_dictionary for table listing purposes As a coordinator-side service, alternator shouldn't touch the replica module, so it is migrated here to data_dictionary. One use case still remains that uses replica::keyspace - accessing the replication map. This really isn't a replica-side thing, but it's also not logically part of the data dictionary, so it's left using replica::keyspace (using the data_dictionary::database::real_database() escape hatch). Figuring out how to expose the replication map to coordinator-side services is left for later.	2022-01-19 11:03:36 +02:00
Avi Kivity	f80d13c95c	data_dictionary: add get_tables() Unlike replica::database::get_column_families(), which it replaces, it returns a vector of tables rather than a map. Map-like access is provided by get_table(), so it's redundant to build a new map container to expose the same functionality.	2022-01-19 09:36:22 +02:00


29908 Commits