scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00

Author	SHA1	Message	Date
Konstantin Osipov	2c46938c2a	commitlog: avoid a syscall in a most common case of segment recycle When recycling a segment in O_DSYNC mode if the size of the segment is neither shrunk nor grown, avoid calling file::truncate() or file::allocate(). Message-Id: <20201215182332.1017339-2-kostja@scylladb.com>	2020-12-16 14:57:36 +02:00
Konstantin Osipov	b6c6cc275f	commitlog: align input of dma_write() during segment recycle Normally a file size should be aligned around block size, since we never write to it any unaligned size. However, we're not protected against partial writes. Just to be safe, align up the amount of bytes to zerofill when recycling a segment. Message-Id: <20201211142628.608269-4-kostja@scylladb.com>	2020-12-14 12:16:18 +02:00
Konstantin Osipov	ad6817bcde	commitlog: fix typo in a comment Message-Id: <20201211142628.608269-2-kostja@scylladb.com>	2020-12-14 12:16:14 +02:00
Pavel Emelyanov	3a025cfa52	schema-tables: Use db from make_update_table_mutations in make_update_indices_mutations Two halves of the tunnel finally connect -- the latter helper needs the local database instance and is only called by the former one which already has it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:23:53 +03:00
Pavel Emelyanov	89fd524c5a	schema-tables: Add database argument to make_update_table_mutations There are 3 callers of this helper (cdc, migration manager and tests) and all of them already have the database object at hand. The argument will be used by the next patch to remove the call for the global storage proxy instance from make_update_indices_mutations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:21:22 +03:00
Pavel Emelyanov	1bcef04c7a	schema-tables: Factor out calls getting database instance The make_update_indices_mutations gets database instance for two things -- to find the cf to work with and to get the value of a feature for index view creation. To suit both and to remove calls for global storage proxy and service instances get the database once in the function entrance. Next patch will clean this further. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:17:11 +03:00
Pavel Emelyanov	6dd10e771d	index-manager: Move feature evaluation one level up The create_view_for_index needs to know the state of the correct-idx-token-in-secondary-index feature. To get one it takes quite a long route through global storage service instance. Since there's only one caller of the method in question, and the method is called in a loop, it's a bit faster to get the feature value in caller and pass it in argument. This will also help to get rid of the call for global storage service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:14:12 +03:00
Nadav Har'El	781f9d9aca	alternator: make default timeout configurable Whereas in CQL the client can pass a timeout parameter to the server, in the DynamoDB API there is no such feature; the server needs to choose reasonable timeouts for its own internal operations - e.g., writes to disk, querying other replicas, etc. Until now, Alternator had a fixed timeout of 10 seconds for its requests. This choice was reasonable - it is much higher than we expect during normal operations, and still lower than the client-side timeouts that some DynamoDB libraries have (boto3 has a one-minute timeout). However, there's nothing holy about this number of 10 seconds; some installations might want to change this default. So this patch adds a configuration option, "--alternator-timeout-in-ms", to choose this timeout. As before, it defaults to 10 seconds (10,000ms). In particular, some test runs are unusually slow - consider for example testing a debug build (which is already very slow) on an extremely over-committed test host. In some cases (see issue #7706) we noticed the 10 second timeout was not enough. So in this patch we increase the default timeout chosen in the "test/alternator/run" script to 30 seconds. Please note that as the code is structured today, this timeout only applies to some operations, such as GetItem, UpdateItem or Scan, but does not apply to CreateTable, for example. This is a pre-existing issue that this patch does not change. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207122758.2570332-1-nyh@scylladb.com>	2020-12-09 14:30:43 +01:00
Avi Kivity	f802356572	Revert "Revert "Merge "raft: fix replication if existing log on leader" from Gleb"" This reverts commit `dc77d128e9`. It was reverted due to a strange and unexplained diff, which is now explained. The HEAD on the working directory being pulled from was set back, so git thought it was merging the intended commits, plus all the work that was committed from HEAD to master. So it is safe to restore it.	2020-12-08 19:19:55 +02:00
Piotr Sarna	1cc4ed50c1	db: fix getting local ranges for size estimates table When getting local ranges, an assumption is made that if a range does not contain an end or when its end is a maximum token, then it must contain a start. This assumption proved untrue during manual tests, so it's now fortified with an additional check. Here's a gdb output for a set of local ranges which causes an assertion failure when calling `get_local_ranges` on it: (gdb) p ranges $1 = std::vector of length 2, capacity 2 = {{_interval = {_start = std::optional<interval_bound<dht::token>> = {[contained value] = {_value = {_kind = dht::token_kind::before_all_keys, _data = 0}, _inclusive = false}}, _end = std::optional<interval_bound<dht::token>> [no contained value], _singular = false}}, {_interval = { _start = std::optional<interval_bound<dht::token>> [no contained value], _end = std::optional<interval_bound<dht::token>> = {[contained value] = {_value = { _kind = dht::token_kind::before_all_keys, _data = 0}, _inclusive = true}}, _singular = false}}} Closes #7764	2020-12-07 12:08:31 +02:00
Benny Halevy	64a4ffc579	large_data_handler: do not delete records in the absence of large_data_stats The previous way of deleting records based on the whole sstable data_size causes overzealous deletions (#7668) and inefficiency in the rows cache due to the large number of range tombstones created. Therefore we'd be better off by just letting the records expire using the 30 days TTL. Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201206083725.1386249-1-bhalevy@scylladb.com>	2020-12-06 11:34:37 +02:00
Avi Kivity	dc77d128e9	Revert "Merge "raft: fix replication if existing log on leader" from Gleb" This reverts commit `0aa1f7c70a`, reversing changes made to `72c59e8000`. The diff is strange, including unrelated commits. There is no understanding of the cause, so to be safe, revert and try again.	2020-12-06 11:34:19 +02:00
Benny Halevy	4406a2514e	large_data_handler: maybe_delete_large_data_entries: use sstable large data stats If the sstable has scylla_metadata::large_data_stats use them to determine whether to delete the corresponding large data records. Otherwise, defer to the current method of comparing the sstable data_size to the respective thresholds. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	8cebe7776f	large_data_handler: maybe_delete_large_data_entries: accept shared_sstable Since the actual deletion of the large data entries is done in the background, and we don't capture the shared_sstable, we can safely pass it to maybe_delete_large_data_entries when deleting the sstable in sstable::unlink and it will be released as soon as maybe_delete_large_data_entries returns. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	f7d0ae3d10	large_data_handler: maybe_delete_large_data_entries: move out of line It is called on the cold path, when the sstable is deleted. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:19:42 +02:00
Benny Halevy	8ab053bd44	large_data_handler: expose methods to get threshold To be used for keeping large_data statistics in sstable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:18:14 +02:00
Benny Halevy	dd7422a713	large_data_handler: indicate recording of large data entries Return true from the maybe_{record,log}_* methods if a large data record or log entry were emitted. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:18:14 +02:00
Benny Halevy	873107821b	large_data_handler: move constructor out of line No need for it to be inlined. Also, add debug logging to the large data handler options. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-01 15:18:14 +02:00
Avi Kivity	e8ff77c05f	Merge 'sstables: a bunch of refactors' from Kamil Braun 1. sstables: move `sstable_set` implementations to a separate module All the implementations were kept in sstables/compaction_strategy.cc which is quite large even without them. `sstable_set` already had its own header file, now it gets its own implementation file. The declarations of implementation classes and interfaces (`sstable_set_impl`, `bag_sstable_set`, and so on) were also exposed in a header file, sstable_set_impl.hh, for the purposes of potential unit testing. 2. mutation_reader: move `mutation_reader::forwarding` to flat_mutation_reader.hh Files which need this definition won't have to include mutation_reader.hh, only flat_mutation_reader.hh (so the inclusions are in total smaller; mutation_reader.hh includes flat_mutation_reader.hh). 3. sstables: move sstable reader creation functions to `sstable_set` Lower level functions such as `create_single_key_sstable_reader` were made methods of `sstable_set`. The motivation is that each concrete sstable_set may decide to use a better sstable reading algorithm specific to the data structures used by this sstable_set. For this it needs to access the set's internals. A nice side effect is that we moved some code out of table.cc and database.hh which are huge files. 4. sstables: pass `ring_position` to `create_single_key_sstable_reader` instead of `partition_range`. It would be best to pass `partition_key` or `decorated_key` here. However, the implementation of this function needs a `partition_range` to pass into `sstable_set::select`, and `partition_range` must be constructed from `ring_position`s. We could create the `ring_position` internally from the key but that would involve a copy which we want to avoid. 5. sstable_set: refactor `filter_sstable_for_reader_by_pk` Introduce a `make_pk_filter` function, which given a ring position, returns a boolean function (a filter) that given a sstable, tells whether the sstable may contain rows with the given position. The logic has been extracted from `filter_sstable_for_reader_by_pk`. Split from #7437. Closes #7655 * github.com:scylladb/scylla: sstable_set: refactor filter_sstable_for_reader_by_pk sstables: pass ring_position to create_single_key_sstable_reader sstables: move sstable reader creation functions to `sstable_set` mutation_reader: move mutation_reader::forwarding to flat_mutation_reader.hh sstables: move sstable_set implementations to a separate module	2020-11-24 09:23:57 +02:00
Pavel Emelyanov	fea4a5492f	system-keyspace: Remove dead code Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20201123151453.27341-1-xemul@scylladb.com>	2020-11-23 17:16:15 +02:00
Avi Kivity	1e170ebfc1	Merge 'Changing hints configuration followup' from Piotr Dulikowski Follow-up to https://github.com/scylladb/scylla/pull/6916. - Fixes wrong usage of `resource_manager::prepare_per_device_limits`, - Improves locking in `resource_manager` so that it is more safe to call its methods concurrently, - Adds comments around `resource_manager::register_manager` so that it's more clear what this method does and why. Closes #7660 * github.com:scylladb/scylla: hints/resource_manager: add comments to register_manager hints/resource_manager: fix indentation hints/resource_manager: improve mutual exclusion hints/resource_manager: correct prepare_per_device_limits usage	2020-11-22 15:06:35 +02:00
Piotr Sarna	5a9dc6a3cc	Merge 'Cleanup CDC tests after CDC became GA' from Piotr Jastrzębski Now that CDC is GA, it should be enabled in all the tests by default. To achieve that the PR adds a special db::config::add_cdc_extension() helper which is used in cql_test_env to make sure CDC is usable in all the tests that use cql_test_env. As a result, cdc_tests can be simplified. Finally, some trailing whitespaces are removed from cdc_tests. Tests: unit(dev) Closes #7657 * github.com:scylladb/scylla: cdc: Remove trailing whitespaces from cdc_tests cdc: Remove mk_cdc_test_config from tests config: Add add_cdc_extension function for testing cdc: Add missing includes to cdc_extension.hh	2020-11-20 13:56:29 +01:00
Kamil Braun	40d8bfa394	sstables: move sstable reader creation functions to `sstable_set` Lower level functions such as `create_single_key_sstable_reader` were made methods of `sstable_set`. The motivation is that each concrete sstable_set may decide to use a better sstable reading algorithm specific to the data structures used by this sstable_set. For this it needs to access the set's internals. A nice side effect is that we moved some code out of table.cc and database.hh which are huge files.	2020-11-19 17:52:39 +01:00
Avi Kivity	70689088fd	Merge "Remove reference on database from global qctx" from Pavel E " The qctx is a global object that references the query processor and the database to let the rest of the code query the system keyspace. As the first step of de-globalizing it -- remove the database reference from it. After the set, the qctx remains a simple wrapper over the query processor (which is already de-globalized) and the query processor in turn is mostly needed only to parse the query string into a prepared statement. This, in turn, makes it possible to remove the qctx later by parsing the query strings on boot and carrying _them_ around, not the qctx itself. tests: unit(dev), dtest(simple_cluster_driver_test:dev), manual start/stop " * 'br-remove-database-from-qctx' of https://github.com/xemul/scylla: query-context: Remove database from qctx schema-tables: Use query processor referece in save_system(_keyspace)?_schema system-keyspace: Rewrite force_blocking_flush system-keyspace: Use cluster_name string in check_health system-keyspace: Use db::config in setup_version query-context: Kill global helpers test: Use cql_test_env::evecute_cql instead of qctx version code: Use qctx::evecute_cql methods, not global ones system-keyspace: Do not call minimal_setup for the 2nd time system-keyspace: Fix indentation after previous patch system-keyspace: Do not do invoke_on_all by hands system-keyspace: Remove dead code	2020-11-19 18:31:51 +02:00
Pavel Emelyanov	689fd029a1	query-context: Remove database from qctx No users of qctx::db are left. One global database reference less. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	464c8990d4	schema-tables: Use query processor referece in save_system(_keyspace)?_schema The save_system_schema and save_system_keyspace_schema are both called on start and can get the needed query processor reference from arguments. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	66dcc47571	system-keyspace: Rewrite force_blocking_flush The method is called after query_processor::execute_internal to flush the cf. Encapsulating this flush inside database and getting the database from query_processor allows removing the database reference from the global qctx object. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	6cad18ad33	system-keyspace: Use cluster_name string in check_health The check_health needs global qctx to get db.config.cluster_name, which is already available at the caller side. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	36a3ee6ad4	system-keyspace: Use db::config in setup_version This is the beginning of de-globalizing global qctx thing. The setup_version() needs global qctx to get config from. It's possible to get the config from the caller instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	43039a0812	query-context: Kill global helpers Now the db::execute_cql* callers are patched, the global helpers can be removed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	303ebe4a36	code: Use qctx::evecute_cql methods, not global ones There are global db::execute_cql() helpers that just forward the args into qctx::execute_cql(). The former are going away, so patch all callers to use qctx themselves. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	8bf6b1298c	system-keyspace: Do not call minimal_setup for the 2nd time The system_keyspace::minimal_setup is already called by main.cc by hand, some steps before the regular ::setup(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	7b82ec2f9e	system-keyspace: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	1773dadc72	system-keyspace: Do not do invoke_on_all by hands The cache_truncation_record needs to run cf.cache_truncation_record on each shard's DB, so the invoke_on_all can be used. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Pavel Emelyanov	fb20d9cd1e	system-keyspace: Remove dead code Not called anywhere. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Piotr Dulikowski	60ac68b7a2	hints/resource_manager: add comments to register_manager Adds more comments to resource_manager::register_manager in order to better explain what this function is doing.	2020-11-19 16:34:37 +01:00
Piotr Dulikowski	c0c10b918c	hints/resource_manager: fix indentation Fixes indentation in prepare_per_device_limits.	2020-11-19 16:34:37 +01:00
Piotr Dulikowski	ead6a3f036	hints/resource_manager: improve mutual exclusion This commit causes start, stop and register_manager methods of the resource_manager to be serialized with respect to each other using the _operation_lock. Those functions modify internal state, so it's best if they are protected with a semaphore. Additionally, those functions are not going to be used frequently, therefore it's perfectly fine to protect them in such a coarse manner. Now, space_watchdog has a dedicated lock for serializing its on_timer logic with resource_manager::register_manager. The reason for a separate lock is that resource_manager::stop cannot use the same lock as the space_watchdog - otherwise a situation could occur in which space_watchdog waits for semaphore units held by resource_manager::stop(), and resource_manager::stop() waits until the space_watchdog stops its asynchronous event loop.	2020-11-19 16:34:37 +01:00
Piotr Dulikowski	362aebee7b	hints/resource_manager: correct prepare_per_device_limits usage The resource_manager::prepare_per_device_limits function calculates disk quota for registered hints managers, and creates an association map: from a storage device id to those hints managers which store hints on that device (_per_device_limits_map). This function was used with an assumption that it is idempotent - which is a wrong assumption. In resource_manager::register_manager, if the resource_manager is already started, prepare_per_device_limits would be called, and those hints managers which were previously added to the _per_device_limits_map would be added again. This would cause the space used by those managers to be calculated twice, which would artificially lower the limit which we impose on the space hints are allowed to occupy on disk. This patch fixes this problem by changing the prepare_per_device_limits function to operate on a hints manager passed by argument. Now, we make sure that this function is called on each hints manager only once.	2020-11-19 16:34:37 +01:00
Piotr Jastrzebski	9ede193f0a	config: Add add_cdc_extension function for testing and use it in cql_test_env to enable cdc extension for all tests that use it. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-19 16:16:07 +01:00
Piotr Sarna	c0d72b4491	db,view: remove duplicate entries from the list of target endpoints If a list of target endpoints for sending view updates contains duplicates, it results in benign (but annoying) broken promise errors happening due to duplicated write response handlers being instantiated for a single endpoint. In order to avoid such errors, target remote endpoints are deduplicated from the list of pending endpoints. A similar issue (#5459) solved the case for duplicated local endpoints, but that didn't solve the general case. Fixes #7572 Closes #7641	2020-11-18 13:43:49 +02:00
Avi Kivity	d612ca78f3	Merge 'Allow changing hinted handoff configuration in runtime' from Piotr Dulikowski This PR allows changing the hinted_handoff_enabled option in runtime, either by modifying and reloading YAML configuration, or through HTTP API. This PR also introduces an important change in semantics of hinted_handoff_enabled: - Previously, hinted_handoff_enabled controlled whether _both writing and sending_ hints is allowed at all, or to particular DCs, - Now, hinted_handoff_enabled only controls whether _writing hints_ is enabled. Sending hints from disk is now always enabled. Fixes: #5634 Tests: - unit(dev) for each commit of the PR - unit(debug) for the last commit of the PR Closes #6916 * github.com:scylladb/scylla: api: allow changing hinted handoff configuration storage_proxy: fix wrong return type in swagger hints_manager: implement change_host_filter storage_proxy: always create hints manager config: plug in hints::host_filter object into configuration db/hints: introduce host_filter hints/resource_manager: allow registering managers after start hints: introduce db::hints::directory_initializer directories.cc: prepare for use outside main.cc	2020-11-18 13:41:02 +02:00
Piotr Jastrzebski	c0bc6b5795	size_estimates_virtual_reader: Remove std::iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Dulikowski	0fd36e2579	api: allow changing hinted handoff configuration This commit makes it possible to change hints manager's configuration at runtime through HTTP API. To preserve backwards compatibility, we keep the old behavior of not creating and checking hints directories if they are not enabled at startup. Instead, hint directories are lazily initialized when hints are enabled for the first time through HTTP API.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	220a2ca800	hints_manager: implement change_host_filter Implements a function which is responsible for changing hints manager configuration while it is running. It first starts new endpoint managers for endpoints which weren't allowed by previous filter but are now, and then stops endpoint managers which are rejected by the new filter. The function is blocking and waits until all relevant ep managers are started or stopped.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	1302f1b5bf	storage_proxy: always create hints manager Now, the hints manager object for regular hints is always created, even if hints are disabled in configuration. Please note that the behavior of hints will be unchanged - no hints will be sent when they are disabled. The intent of this change is to make enabling and disabling hints in runtime easier to implement.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	cefe5214ff	config: plug in hints::host_filter object into configuration Uses db::hints::host_filter as the type of hinted_handoff_enabled configuration option. Previously, hinted_handoff_enabled used to be a string option, and it was parsed later in a separate function during startup. The function returned a std::optional<std::unordered_set<sstring>>, whose meaning in the context of hints is rather enigmatic for an observer not familiar with hints. Now, hinted_handoff_enabled has type of db::hints::host_filter, and it is plugged into the config parsing framework, so there is no need for later post-processing.	2020-11-17 10:24:42 +01:00
Piotr Dulikowski	5c3c7c946b	db/hints: introduce host_filter Adds a db::hints::host_filter structure, which determines if generating hints towards a given target is currently allowed. It supports serialization and deserialization between the hinted_handoff_enabled configuration/cli option. This patch only introduces this structure, but does not make other code use it. It will be plugged into the configuration architecture in the following commits.	2020-11-17 10:15:47 +01:00
Piotr Dulikowski	a4f03d72b3	hints/resource_manager: allow registering managers after start This change modifies db::hints::resource_manager so that it is now possible to add hints::managers after it was started. This change will make it possible to register the regular hints manager later in runtime, if it wasn't enabled at boot time.	2020-11-17 10:15:47 +01:00
Piotr Dulikowski	40710677d0	hints: introduce db::hints::directory_initializer Introduces a db::hints::directory_initializer object, which encapsulates the logic of initializing directories for hints (creating/validating directories, segment rebalancing). It will be useful for lazy initialization of hints manager.	2020-11-17 10:15:47 +01:00
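
Commit b6c6cc275f above aligns up the number of bytes to zero-fill when recycling a commitlog segment, so that the write stays block-aligned even after a partial write. A minimal sketch of that rounding step follows. It is not the Scylla or Seastar code: the name `align_up_sketch` is hypothetical, and it assumes the block size is a power of two.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only; not taken from the Scylla sources.
// Rounds `size` up to the nearest multiple of `alignment`.
// Assumption: `alignment` is a non-zero power of two (e.g. a 512 or 4096 byte block).
inline std::size_t align_up_sketch(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Adding (alignment - 1) pushes any unaligned size past the next boundary;
    // masking off the low bits then snaps it back onto that boundary.
    return (size + alignment - 1) & ~(alignment - 1);
}
```

A size that is already a multiple of the block size is returned unchanged, so in the common case the rounding costs nothing.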


1915 Commits