scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 07:23:15 +00:00

Author	SHA1	Message	Date
Avi Kivity	0f967f911d	Merge "storage_service: get_token_metadata_ptr to hold on to token_metadata" from Benny " This series fixes a use-after-free via token_metadata&. We may currently get a token_metadata& via get_token_metadata() and use it across yield points in a couple of sites: - do_decommission_removenode_with_repair - get_new_source_ranges To fix that, call get_token_metadata_ptr and hold on to the pointer across yield points. Fixes #7790 Dtest: update_cluster_layout_tests:TestUpdateClusterLayout.simple_removenode_2_test(debug) Test: unit(dev) " * tag 'storage_service-token_metadata_ptr-v2' of github.com:bhalevy/scylla: storage_service: get_new_source_ranges: don't hold token_metadata& across yield point storage_service: get_changed_ranges_for_leaving: no need to maybe_yield for each token_range storage_service: get_changed_ranges_for_leaving: release token_metadata_ptr sooner storage_service: get_changed_ranges_for_leaving: don't hold token_metadata& across yield	2020-12-13 17:37:24 +02:00
Aleksandr Bykov	e74dc311e7	dist: scylla_util: fix aws_instance.ebs_disks method The aws_instance.ebs_disks() method should return EBS disks instead of ephemeral disks. Signed-off-by: Aleksandr Bykov <alex.bykov@scylladb.com> Closes #7780	2020-12-13 17:33:37 +02:00
Benny Halevy	1fbc831dae	storage_service: get_new_source_ranges: don't hold token_metadata& across yield point Have the caller provide the token_metadata& to get_new_source_ranges; the caller keeps it valid throughout the call. Note that there is no need to clone_only_token_map, since the token_metadata_ptr is immutable and can be used just as well for calling strat.get_range_addresses. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-13 16:42:00 +02:00
Benny Halevy	f13913d251	storage_service: get_changed_ranges_for_leaving: no need to maybe_yield for each token_range This is no longer needed, now that we pass can_yield::yes to calculate_natural_endpoints for each token_range. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-13 16:42:00 +02:00
Benny Halevy	89ed0705e8	storage_service: get_changed_ranges_for_leaving: release token_metadata_ptr sooner No need to hold on to the shared token_metadata_ptr after we got clone_after_all_left(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-13 16:42:00 +02:00
Benny Halevy	684c4143df	storage_service: get_changed_ranges_for_leaving: don't hold token_metadata& across yield When yielding in clone_only_token_map or clone_after_all_left, the token_metadata obtained with get_token_metadata() may go away. Use get_token_metadata_ptr() instead to hold on to it. With that, we no longer need clone_only_token_map: `metadata` is not modified by calculate_natural_endpoints, so we can just refer to the immutable copy retrieved with get_token_metadata_ptr. Fixes #7790 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-13 16:41:58 +02:00
Avi Kivity	65a0244614	Update tools/jmx submodule * tools/jmx 6174a47...20469bf (1): > column_family: Return proper cardinality for toppartitions requests	2020-12-13 13:51:38 +02:00
Avi Kivity	9265b87610	Merge "Remove get_local_storage_proxy from validation" from Pavel E " The validate_column_family() helper uses the global proxy reference to get the database from. Fortunately, all of its callers can provide one via an argument. tests: unit(dev) " * 'br-no-proxy-in-validate' of https://github.com/xemul/scylla: validation: Remove get_local_storage_proxy call client_state: Call validate_column_family() with database arg client_state: Add database& arg to has_column_family_access storage_proxy: Add .local_db() getters validate: Mark database argument const	2020-12-13 13:12:57 +02:00
Avi Kivity	19aaf8eb83	Merge "Remove global storage service from index manager" from Pavel E " The initial intent was to remove the call for the global storage service from the secondary index manager's create_view_for_index(), but while fixing it, one of the intermediate schema-tables helpers also benefited by re-using the database reference that was already being passed along. The cleanup is done by simply pushing the database reference down the stack, from the code that already has it to create_view_for_index(). tests: unit(dev) " * 'br-no-storages-in-index-and-schema' of https://github.com/xemul/scylla: schema-tables: Use db from make_update_table_mutations in make_update_indices_mutations schema-tables: Add database argument to make_update_table_mutations schema-tables: Factor out calls getting database instance index-manager: Move feature evaluation one level up	2020-12-13 12:41:51 +02:00
Benny Halevy	aae3991246	repair: do_decommission_removenode_with_repair: don't deref ops when null `ops` might be passed as a disengaged shared_ptr when called from `decommission_with_repair`. In this case we need to propagate to sync_data_using_repair a disengaged std::optional<utils::UUID>. Fixes #7788 DTest: update_cluster_layout_tests:TestUpdateClusterLayout.verify_latest_copy_decommission_node_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201213073743.331253-1-bhalevy@scylladb.com>	2020-12-13 12:37:18 +02:00
Avi Kivity	18be57a4e5	Update seastar submodule * seastar 8b400c7b45...2de43eb6bf (3): > core: show span free sizes correctly in diagnostics > Merge "IO queues to share capacities" from Pavel E > file: make_file_impl: determine blockdev using st_mode	2020-12-12 21:57:01 +02:00
Pekka Enberg	c990f2bd34	Merge 'Reinstate [[nodiscard]] support' from Avi Kivity The switch to clang disabled the clang-specific -Wunused-value, since it generated some harmless warnings. Unfortunately, that also prevented [[nodiscard]] violations from generating warnings. Fix by clearing all instances of the warning (including [[nodiscard]] violations that crept in while it was disabled) and reinstating the warning. Closes #7767 * github.com:scylladb/scylla: build: reinstate -Wunused-value warning for [[nodiscard]] test: lib: don't ignore future in compare_readers() test: mutation_test: check both ranges when comparing summaries serializer: silence unused value warning in variant deserializer	2020-12-12 09:54:05 +02:00
Avi Kivity	615b8e8184	dist: rpm: uninstall tuned when installing scylla-kernel-conf tuned 2.11.0-9 and later writes to kernel.sched_wakeup_granularity_ns and other sysctl tunables that we so laboriously tuned, dropping performance by a factor of 5 (due to increased latency). Fix by obsoleting tuned during install (in effect, we are a better tuned, at least for us). Not needed for .deb, since Debian/Ubuntu do not install tuned by default. Fixes #7696 Closes #7776	2020-12-12 09:54:05 +02:00
Pavel Emelyanov	3a025cfa52	schema-tables: Use db from make_update_table_mutations in make_update_indices_mutations Two halves of the tunnel finally connect -- the latter helper needs the local database instance and is only called by the former one which already has it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:23:53 +03:00
Pavel Emelyanov	89fd524c5a	schema-tables: Add database argument to make_update_table_mutations There are 3 callers of this helper (cdc, migration manager and tests) and all of them already have the database object at hand. The argument will be used by the next patch to remove the call for the global storage proxy instance from make_update_indices_mutations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:21:22 +03:00
Pavel Emelyanov	1bcef04c7a	schema-tables: Factor out calls getting database instance The make_update_indices_mutations helper gets the database instance for two things: to find the cf to work with and to get the value of a feature for index view creation. To suit both, and to remove the calls for the global storage proxy and service instances, get the database once at the function entrance. The next patch will clean this up further. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:17:11 +03:00
Pavel Emelyanov	6dd10e771d	index-manager: Move feature evaluation one level up create_view_for_index needs to know the state of the correct-idx-token-in-secondary-index feature. To get it, it takes quite a long route through the global storage service instance. Since there's only one caller of the method in question, and the method is called in a loop, it's a bit faster to get the feature value in the caller and pass it as an argument. This will also help to get rid of the call for the global storage service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:14:12 +03:00
Pavel Emelyanov	83073f4e8b	validation: Remove get_local_storage_proxy call The call is used in a validate_column_family() overload whose last caller was removed by the previous patch, so we can kill that helper itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 18:52:42 +03:00
Pavel Emelyanov	12cc539835	client_state: Call validate_column_family() with database arg The previous patch added the database reference argument. Since the currently called validate_column_family() overload _just_ gets the database from the global proxy, it's better to shortcut and pass the database directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 18:50:49 +03:00
Pavel Emelyanov	b0c4a9087d	client_state: Add database& arg to has_column_family_access It is called from the check_access methods in cql3/statements and from the thrift handlers. The former have a proxy argument from which they can get the database. The latter already have the database itself on board. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 18:49:16 +03:00
Pavel Emelyanov	4c7bc8a3d1	storage_proxy: Add .local_db() getters To facilitate the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 18:48:02 +03:00
Avi Kivity	a11ecfe231	Merge 'types: don't linearize in validate()' from Michał Chojnowski A sequel to #7692. This series gets rid of linearization when validating collections and tuple types. (Other types were already validated without linearizing). The necessary helpers for reading from fragmented buffers were introduced in #7692. All this series does is put them to use in `validate()`. Refs: #6138 Closes #7770 * github.com:scylladb/scylla: types: add single-fragment optimization in validate() utils: fragment_range: add with_simplified() cql3: statements: select_statement: remove unnecessary use of with_linearized cql3: maps: remove unnecessary use of with_linearized cql3: lists: remove unnecessary use of with_linearized cql3: tuples: remove unnecessary use of with_linearized cql3: sets: remove unnecessary use of with_linearized cql3: tuples: remove unnecessary use of with_linearized cql3: attributes: remove unnecessary uses of with_linearized types: validate lists without linearizing types: validate tuples without linearizing types: validate sets without linearizing types: validate maps without linearizing types: template abstract_type::validate on FragmentedView types: validate_visitor: transition from FragmentRange to FragmentedView utils: fragmented_temporary_buffer: add empty() to FragmentedView utils: fragmented_temporary_buffer: don't add to null pointer	2020-12-11 17:33:59 +02:00
Pavel Emelyanov	563b466227	validate: Mark database argument const They are indeed used like that Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 18:27:45 +03:00
Michał Chojnowski	150473f074	types: add single-fragment optimization in validate() Manipulating fragmented views is costlier than manipulating contiguous views, so let's detect the common situation when the fragmented view is actually contiguous underneath, and make use of that. Note: this optimization is only useful for big types. For trivial types, validation usually only checks the size of the view.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	e2d17879fc	utils: fragment_range: add with_simplified() Reading from contiguous memory (bytes_view) is significantly cheaper at runtime than reading from a fragmented view, due to less state and less branching. So when a fragmented view contains at most one fragment, which is common, we often want to convert it to a simple view before processing it. with_simplified() does just that.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	51ca5fa4c5	cql3: statements: select_statement: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	72186bee69	cql3: maps: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	3f3a10c588	cql3: lists: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	efa036329d	cql3: tuples: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	4f359a7a99	cql3: sets: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	281417917b	cql3: tuples: remove unnecessary use of with_linearized We can validate directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	d1d1a00311	cql3: attributes: remove unnecessary uses of with_linearized We can validate and deserialize directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	0581b3ff31	types: validate lists without linearizing We can validate collections directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	4fe41b69fd	types: validate tuples without linearizing We can validate tuples directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	a7dd736d03	types: validate sets without linearizing We can validate collections directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	1459608375	types: validate maps without linearizing We can validate collections directly from fragmented buffers now.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	82befbe8c0	types: template abstract_type::validate on FragmentedView This is primarily a stylistic change. It makes the interface more consistent with deserialize(). It will also allow us to call `validate()` for collection elements in `validate_aux()`.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	15dbe00e8a	types: validate_visitor: transition from FragmentRange to FragmentedView This will allow us to easily get rid of linearizations when validating collections and tuples, because the helpers used in validate_aux() already have FragmentedView overloads.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	3647c0ba47	utils: fragmented_temporary_buffer: add empty() to FragmentedView It's redundant with size_bytes(), but sometimes empty() is more readable and reduces churn when replacing other types with FragmentedView.	2020-12-11 09:53:07 +01:00
Michał Chojnowski	b4dd5d3bdb	utils: fragmented_temporary_buffer: don't add to null pointer When fragmented_temporary_buffer::view is created from a bytes_view, _current is null. In that case, remove_current() adds an offset to a null pointer, and UBSan complains. Fix that.	2020-12-11 09:53:07 +01:00
Raphael S. Carvalho	e4b55f40f3	sstables: Fix sstable reshaping for STCS The STCS reshape heuristic is correct, and it built the compaction descriptor correctly, but it forgot to return the descriptor to the caller, so no reshape was ever done on behalf of STCS, even when the strategy needed it. Fixes #7774. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20201209175044.1609102-1-raphaelsc@scylladb.com>	2020-12-10 12:45:25 +02:00
Asias He	829b4c1438	repair: Make removenode safe by default Currently removenode works as follows: - The coordinator node advertises the node to be removed in REMOVING_TOKEN status in gossip - Existing nodes learn of the node in REMOVING_TOKEN status - Existing nodes sync data for the ranges they own - Existing nodes send a notification to the coordinator - The coordinator node waits for the notifications and announces the node in REMOVED_TOKEN Current problems: - Existing nodes do not tell the coordinator whether the data sync succeeded or failed. - The coordinator cannot abort the removenode operation in case of error. - A failed removenode operation leaves the node being removed in REMOVING_TOKEN status forever. - The removenode runs in best-effort mode, which may cause data consistency issues: if a node that owns the range after the removenode operation is down during the operation, the removenode operation still succeeds without requiring that node to perform data syncing. For example, with five nodes in the cluster and RF = 3, suppose that for a range n1, n2, n3 are the old replicas and n2 is being removed, so after the removenode operation the new replicas are n1, n5, n3. If n3 is down during the removenode operation, only n1 will be used to sync data with the new owner n5. This will break QUORUM read consistency if n1 happens to have missed some writes. Improvements in this patch: - This patch makes removenode safe by default. We require all nodes in the cluster to participate in the removenode operation and sync data if needed. We fail the removenode operation if any of them is down or fails. If the user wants the removenode operation to succeed even if some of the nodes are not available, the user has to explicitly pass a list of nodes that can be skipped for the operation. Example nodetool command:
$ nodetool removenode --ignore-dead-nodes <list_of_dead_nodes_to_ignore> <host_id> Example REST API call: $ curl -X POST "http://127.0.0.1:10000/storage_service/remove_node/?host_id=7bd303e9-4c7b-4915-84f6-343d0dbd9a49&ignore_nodes=127.0.0.3,127.0.0.5" - The coordinator can abort data sync on existing nodes. For example, if one of the nodes fails to sync data, it makes no sense for the other nodes to continue syncing, because the whole operation will fail anyway. - The coordinator can decide which nodes to ignore and pass the decision to the other nodes. Previously, there was no way for the coordinator to tell existing nodes to run in strict mode or best-effort mode; users had to modify the config file or run a REST API command on all the nodes to select strict or best-effort mode. With this patch, the cluster-wide configuration is eliminated. Fixes #7359 Closes #7626	2020-12-10 10:14:39 +02:00
Piotr Sarna	20bdeb315a	Merge ' types: add constraint on lexicographical_tri_compare()' from Avi Kivity Verify that the input types are iterators and that their value types are compatible with the compare function. Because some of the inputs were not actually valid iterators, they are adjusted too. Closes #7631 * github.com:scylladb/scylla: types: add constraint on lexicographical_tri_compare() composite: make composite::iterator a real input_iterator compound: make compound_type::iterator a real input_iterator	2020-12-09 18:48:01 +01:00
Nadav Har'El	a8fdbf31cd	alternator: fix UpdateItem ADD for non-existent attribute UpdateItem's "ADD" operation usually adds elements to an existing set or adds a number to an existing counter. But it can also be used to create a new set or counter (as if adding to an empty set or zero). We unfortunately did not have a test for this case (creating a new set or counter), and when I wrote such a test now, I discovered the implementation was missing. So this patch adds both the test and the implementation. The new test used to fail before this patch, and passes with it - and passes on DynamoDB. Note that we only had this bug for the newer UpdateItem syntax. For the old AttributeUpdates syntax, we already support ADD actions on missing attributes, and already tested it in test_update_item_add(). I just forgot to test the same thing for the newer syntax, so I missed this bug :-( Fixes #7763. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207085135.2551845-1-nyh@scylladb.com>	2020-12-09 18:44:30 +01:00
Juliusz Stasiewicz	b150906d39	gossip: Added SNITCH_NAME to `application_state` The snitch name needs to be exchanged within the cluster once, during the shadow round, so that joining nodes cannot use the wrong snitch. The snitch names are compared on bootstrap and on normal node start. If the cluster already uses mixed snitches, the upgrade to this version will fail. In this case the customer needs to add a node with the correct snitch for every node with the wrong snitch, then take down the nodes with the wrong snitch, and only then do the upgrade. Fixes #6832 Closes #7739	2020-12-09 15:45:25 +02:00
Nadav Har'El	781f9d9aca	alternator: make default timeout configurable Whereas in CQL the client can pass a timeout parameter to the server, the DynamoDB API has no such feature: the server needs to choose reasonable timeouts for its own internal operations, e.g., writes to disk, querying other replicas, etc. Until now, Alternator had a fixed timeout of 10 seconds for its requests. This choice was reasonable: it is much higher than we expect during normal operations, and still lower than the client-side timeouts that some DynamoDB libraries have (boto3 has a one-minute timeout). However, there's nothing holy about this number of 10 seconds; some installations might want to change this default. So this patch adds a configuration option, "--alternator-timeout-in-ms", to choose this timeout. As before, it defaults to 10 seconds (10,000ms). In particular, some test runs are unusually slow; consider, for example, testing a debug build (which is already very slow) on an extremely over-committed test host. In some cases (see issue #7706) we noticed the 10-second timeout was not enough. So in this patch we increase the default timeout chosen in the "test/alternator/run" script to 30 seconds. Please note that, as the code is structured today, this timeout only applies to some operations, such as GetItem, UpdateItem or Scan, but does not apply to CreateTable, for example. This is a pre-existing issue that this patch does not change. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207122758.2570332-1-nyh@scylladb.com>	2020-12-09 14:30:43 +01:00
Avi Kivity	f802356572	Revert "Revert "Merge "raft: fix replication if existing log on leader" from Gleb"" This reverts commit `dc77d128e9`. It was reverted due to a strange and unexplained diff, which is now explained. The HEAD on the working directory being pulled from was set back, so git thought it was merging the intended commits, plus all the work that was committed from HEAD to master. So it is safe to restore it.	2020-12-08 19:19:55 +02:00
Avi Kivity	1badd315ef	Merge "Speed up devel tests 10 times" from Pavel E " The multishard_mutation_query test is toooo slow when built with clang in dev mode. By reducing the number of scans it's possible to shrink the full suite run time from half an hour down to ~3 minutes. tests: unit(dev) " * 'br-devel-mode-tests' of https://github.com/xemul/scylla: test: Make multishard_mutation_query test do less scans configure: Add -DDEVEL to dev build flags	2020-12-08 15:42:12 +02:00
Pavel Emelyanov	b837cf25b1	test: Make multishard_mutation_query test do less scans When built with clang, this dev-mode test takes ~30 minutes to complete. Let's reduce this time by reducing the scale of the test if DEVEL is set. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-08 15:55:04 +03:00
Pavel Emelyanov	703451311f	configure: Add -DDEVEL to dev build flags This lets the source code tell debug, dev and release builds apart. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-08 15:54:30 +03:00
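
The use-after-free fixed by commits 684c4143df and 1fbc831dae comes from holding a plain `token_metadata&` across a yield point: if the owner publishes a new version while the task is suspended, the reference dangles, whereas a copied `shared_ptr` keeps the old version alive. The following is a minimal, single-threaded C++ sketch of that pattern, not ScyllaDB code: `token_metadata`, `metadata_holder` and `count_tokens_safely` are hypothetical stand-ins, and the yield point is simulated with a callback instead of a Seastar continuation.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for token_metadata: treated as immutable once published.
struct token_metadata {
    std::vector<long> tokens;
};

// Hypothetical owner that publishes new versions by swapping a shared_ptr.
class metadata_holder {
    std::shared_ptr<const token_metadata> _current = std::make_shared<token_metadata>();
public:
    // Plain reference: valid only while _current still points at this version.
    const token_metadata& get_token_metadata() const { return *_current; }
    // Shared owner: keeps this version alive even if a newer one is published.
    std::shared_ptr<const token_metadata> get_token_metadata_ptr() const { return _current; }
    void publish(token_metadata tm) {
        _current = std::make_shared<token_metadata>(std::move(tm));
    }
};

// Simulates work that suspends in the middle; `on_yield` stands for whatever
// other code runs while this task is suspended (here: it may publish new metadata).
std::size_t count_tokens_safely(metadata_holder& h, const std::function<void()>& on_yield) {
    auto tmptr = h.get_token_metadata_ptr();  // pin the current version
    on_yield();                               // "yield point": the holder may replace it
    return tmptr->tokens.size();              // still valid: we hold a reference count
}
```

Taking `get_token_metadata()` before `on_yield()` and reading it afterwards would instead touch a freed object, which is the class of bug the series removes.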
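
Commits e2d17879fc and 150473f074 add a fast path for fragmented views that are contiguous underneath. A minimal sketch of the idea follows, assuming a fragmented view modeled as `std::vector<std::string_view>`; `fragmented_view`, `with_simplified_sketch` and `byte_counter` are hypothetical names, and the real `with_simplified()` operates on ScyllaDB's own view types, so its signature may differ.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical fragmented view: a sequence of contiguous fragments.
using fragmented_view = std::vector<std::string_view>;

// If the view has at most one fragment, hand the callback a plain contiguous
// string_view (cheap path); otherwise pass the fragmented view unchanged.
template <typename Func>
auto with_simplified_sketch(const fragmented_view& v, Func&& f) {
    if (v.empty()) {
        return f(std::string_view{});
    }
    if (v.size() == 1) {
        return f(v.front());
    }
    return f(v);
}

// Example callback: counts bytes and records which path was taken.
struct byte_counter {
    bool used_fast_path = false;
    std::size_t operator()(std::string_view contiguous) {
        used_fast_path = true;
        return contiguous.size();  // one contiguous buffer: no per-fragment loop
    }
    std::size_t operator()(const fragmented_view& frags) {
        std::size_t n = 0;
        for (auto frag : frags) {  // general path: walk every fragment
            n += frag.size();
        }
        return n;
    }
};
```

The callback must accept both a contiguous view and the fragmented view, so the processing logic is usually written once as a template or an overload set and instantiated for each.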
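
Commit aae3991246 avoids dereferencing a possibly disengaged `shared_ptr` by propagating an empty `std::optional` instead. A minimal sketch of that pattern, with hypothetical names (`node_ops_info`, `ops_uuid_of`) and `std::string` standing in for `utils::UUID`:

```cpp
#include <memory>
#include <optional>
#include <string>

// Hypothetical stand-in for the node-operation context passed to repair.
struct node_ops_info {
    std::string ops_uuid;  // stands in for utils::UUID
};

// Never dereference a shared_ptr that may be null; map "no ops" to an empty
// optional so the callee can tell the two cases apart.
std::optional<std::string> ops_uuid_of(const std::shared_ptr<node_ops_info>& ops) {
    if (!ops) {
        return std::nullopt;  // e.g. a caller that has no ops context
    }
    return ops->ops_uuid;
}
```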
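
The QUORUM hazard described in commit 829b4c1438 can be checked with a small model. This hypothetical C++ sketch tracks which writes each replica holds. Write 1 reached n2 and n3 (a quorum of 2 out of RF = 3) but not n1. Then n2 is removed while n3 is down, so n5 is synced from n1 only. A read from {n1, n5}, which is a valid quorum of the new replica set {n1, n5, n3}, then misses the write. The names `replica_state`, `quorum_read_sees` and `best_effort_removenode` are illustrative only.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Replica name -> IDs of the writes that replica has applied.
using replica_state = std::map<std::string, std::set<int>>;

// A read sees a write if any contacted replica has it
// (the coordinator returns the newest value among the replies).
bool quorum_read_sees(const replica_state& s, const std::vector<std::string>& read_set, int write_id) {
    for (const auto& r : read_set) {
        if (s.at(r).count(write_id)) {
            return true;
        }
    }
    return false;
}

// Best-effort removenode from the example: n2 leaves, n3 is down,
// so the new owner n5 copies only what n1 has.
replica_state best_effort_removenode(replica_state s) {
    s["n5"] = s.at("n1");
    s.erase("n2");
    return s;
}
```

Requiring n3 to participate, which is the new default, or making the operator list it explicitly with `--ignore-dead-nodes`, turns this silent loss of consistency into an explicit decision.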


24611 Commits