scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 06:23:03 +00:00

Author	SHA1	Message	Date
Avi Kivity	6ffd813b7b	Merge 'hints: delay repair until hints are replayed' from Piotr Dulikowski Both hinted handoff and repair are meant to improve the consistency of the cluster's data. HH does this by storing records of failed replica writes and replaying them later, while repair goes through all data on all participating replicas and makes sure the same data is stored on all nodes. The former is generally cheaper and sometimes (but not always) can bring back full consistency on its own; repair, while being more costly, is a sure way to bring back current data to full consistency. When hinted handoff and repair are running at the same time, some of the work can be unnecessarily duplicated. For example, if a row is repaired first, then hints towards it become unnecessary. However, repair needs to do less work if data already has good consistency, so if hints finish first, then the repair will be shorter. This PR introduces a possibility to wait for hints to be replayed before continuing with user-issued repair. The coordinator of the repair operation asks all nodes participating in the repair operation (including itself) to mark a point at the end of all hint queues pointing towards other nodes participating in repair. Then, it waits until hint replay in all those queues reaches marked point, or configured timeout is reached. This operation is currently opt-in and can be turned on by setting the `wait_for_hint_replay_before_repair_in_ms` config option to a positive value. 
Fixes #8102 Tests: - unit(dev) - some manual tests: - shutting down repair coordinator during hints replay, - shutting down node participating in repair during hints replay, Closes #8452 * github.com:scylladb/scylla: repair: introduce abort_source for repair abort repair: introduce abort_source for shutdown storage_proxy: add abort_source to wait_for_hints_to_be_replayed storage_proxy: stop waiting for hints replay when node goes down hints: dismiss segment waiters when hint queue can't send repair: plug in waiting for hints to be sent before repair repair: add get_hosts_participating_in_repair storage_proxy: coordinate waiting for hints to be sent config: add wait_for_hint_replay_before_repair option storage_proxy: implement verbs for hint sync points messaging_service: add verbs for hint sync points storage_proxy: add functions for syncing with hints queue db/hints: make it possible to wait until current hints are sent db/hints: add a metric for counting processed files db/hints: allow to forcefully update segment list on flush	2021-05-03 18:47:27 +03:00
Piotr Dulikowski	9d68824327	hints: dismiss segment waiters when hint queue can't send When a hint queue becomes stuck due to not being able to send to its destination (e.g. destination node is no longer UP, or we failed to send some hints from a file), then it's better to immediately dismiss anybody who waits for hint replay instead of letting them wait until timeout.	2021-04-27 15:58:15 +02:00
Piotr Dulikowski	86d831b319	config: add wait_for_hint_replay_before_repair option Adds the `wait_for_hint_replay_before_repair` configuration option. If set to true, the repair coordinator will first wait until the cluster replays its hints towards the nodes participating in repair. It is set to true by default, and is live-updateable. It will be used in subsequent commits from the same PR.	2021-04-27 15:16:26 +02:00
Piotr Dulikowski	82c419870a	messaging_service: add verbs for hint sync points Adds two verbs: HINT_SYNC_POINT_CREATE and HINT_SYNC_POINT_CHECK. Those will make it possible to create a sync point and regularly poll to check its existence.	2021-04-27 15:06:39 +02:00
Piotr Dulikowski	244738b0d5	storage_proxy: add functions for syncing with hints queue Adds two methods to `storage_proxy`: - `create_hint_queue_sync_point` - creates a "hint sync point" which is kept present in storage_proxy until all hint queues on the local node reach their current end. It will also disappear if given deadline is reached first. - `check_hint_queue_sync_point` - checks if given hint sync point still exists. The created sync point waits for hint queues in all hint managers, on all shards.	2021-04-27 15:06:39 +02:00
Piotr Dulikowski	427bbf6d86	db/hints: make it possible to wait until current hints are sent Implements `wait_until_hints_are_replayed` method returning a future which blocks until either all current hint segments are replayed (returns success in this case), or when provided timeout is reached (returns a timeout exception in this case).	2021-04-26 13:57:03 +02:00
Benny Halevy	dad6c94476	view_builder: stop: close all build_step readers Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	02d74e1530	view_update_generator: start: close staging_sstable_reader when done The staging_sstable_reader has to be closed before it's destroyed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	1e1c8ea824	view: build_progress_virtual_reader: implement close method Close underlying reader. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	2d8b00f2d8	view: generate_view_updates: close builder readers when done Make sure to close the builder's _updates and optional _existings readers before they are destroyed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	652ba714fe	view_builder: initialize_reader_at_current_token: close reader before reassigning it Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	7093610931	view_builder: do_build_step: close build_step reader when done Make sure to close the build_step reader before destroying it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	38e48bb462	size_estimates_reader: close partition_reader Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Piotr Dulikowski	5a49fe74bb	db/hints: add a metric for counting processed files Adds a field to `end_point_hints_manager::sender`: `_total_replayed_segments_count` which keeps track of how many segments were replayed so far. This metric will be used to calculate the sequence number of the last current hint segments in the queue - so that we can implement waiting for current segments to be replayed.	2021-04-22 18:45:34 +02:00
Piotr Dulikowski	e48739a6da	db/hints: allow to forcefully update segment list on flush Endpoint hints manager keeps a list of segments to replay. New segments are appended to it lazily - only when a hint flush occurs (hints commitlog instance is re-created) and the list is empty. Because of that, this list cannot be currently used to tell how many segments are on disk. This commit allows to trigger hints flush and forcefully update the list of segments to replay. In later commits, a mechanism will be implemented which will allow to wait until a given number of hint segments is replayed. Triggering a hints flush with segment list update will allow us to properly synchronize and determine up to which segment we need to wait.	2021-04-22 17:34:04 +02:00
Piotr Sarna	ad661561c8	db: stop using infinite timeout for service level updates Due to a porting bug, the routines for updating service levels used the default infinite timeout for internal CQL queries, which causes Scylla to hang on shutdown. The behavior is now fixed and the routines use the same timeout as the other similar functions - 10s at the time of writing this message.	2021-04-22 09:03:21 +02:00
Avi Kivity	daeddda7cc	treewide: remove inclusions of storage_proxy.hh from headers storage_proxy.hh is huge and includes many headers itself, so remove its inclusions from headers and re-add smaller headers where needed (and storage_proxy.hh itself in source files that need it). Ref #1.	2021-04-20 21:23:00 +03:00
Avi Kivity	14a4173f50	treewide: make headers self-sufficient In preparation for some large header changes, fix up any headers that aren't self-sufficient by adding needed includes or forward declarations.	2021-04-20 21:23:00 +03:00
Kamil Braun	617813ba66	sys_dist_ks: new keyspace for system tables with Everywhere strategy `system_distributed_everywhere` is a new keyspace that uses Everywhere replication strategy. This is useful, for example, when we want to store internal data that should be accessible by every node; the data can be written using CL=ALL (e.g. during node operations such as node bootstrap, which require all nodes to be alive - at least currently) and then read by each node locally using CL=ONE (e.g. during node restarts). Closes #8457	2021-04-19 11:22:57 +03:00
Eliran Sinvani	dd74556ad9	service/qos: adding service level table to the distributed keyspace This patch adds the service level table and functions to manipulate it to the distributed keyspace. Message-Id: <b6cb7f311ac1ee6802d8f3d78eac9cf40fe21f68.1609161341.git.sarna@scylladb.com>	2021-04-12 15:58:09 +02:00
Benny Halevy	705f9c4f79	commitlog: segment_manager: max_size must be aligned This was triggered by the test_total_space_limit_of_commitlog dtest. When it passes a very large commitlog_segment_size_in_mb (1/6th of the free memory size, in mb), segment_manager constructor limits max_size to std::numeric_limits<position_type>::max() which is 0xffffffff. This causes allocate_segment_ex to loop forever when writing the segment file since `dma_write` returns 0 when the count is unaligned (seen 4095). The fix here is to select a slightly smaller max_size that is aligned down to a multiple of 1MB. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210407121059.277912-1-bhalevy@scylladb.com>	2021-04-11 13:17:50 +03:00
Pavel Emelyanov	70c851e69b	view: Don't expect int from position_in_partition::tri_compare Now it's int, but soon will be std::strong_ordering. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-04-09 18:20:39 +03:00
Piotr Sarna	8e808a56d2	Merge 'commitlog: Fix race and edge condition in delete_segments' from Calle Wilund Fixes #8363 Fixes #8376 Delete segments has two issues when running with size-limited commit log and strict adherence to said limit. 1.) It uses parallel processing, with deferral. This means that the disk usage variables it looks at might not be fully valid - i.e. we might have already issued a file delete that will reduce disk footprint such that a segment could instead be recycled, but since vars are (and should) only updated _post_ delete, we don't know. 2.) It does not take into account edge conditions, when we only delete a single segment, and this segment is the border segment - i.e. the one pushing us over the limit, yet allocation is desperately waiting for recycling. In this case we should allow it to live on, and assume that next delete will reduce footprint. Note: to ensure exact size limit, make sure total size is a multiple of segment size. If we had an error in recycling (disk rename?), and no elements are available, we could have waiters hoping they will get segments. Abort the queue (not permanent, but wakes up waiters), and let them retry. Since we did deletions instead, disk footprint should allow for new allocs at least. Or more likely, everything is broken, but we will at least make more noise. Closes #8372 * github.com:scylladb/scylla: commitlog: Add signalling to recycle queue iff we fail to recycle commitlog: Fix race and edge condition in delete_segments commitlog: coroutinize delete_segments commitlog_test: Add test for deadlock in recycle waiter	2021-04-07 15:13:25 +02:00
Nadav Har'El	0dd6f2db8f	Merge 'CDC generations: refactors and improvements' from Kamil Braun The "most important" major changes are: 1. storage_service: simplify CDC generation management during node replace Previously, when node A replaced node B, it would obtain B's generation timestamp from its application state (gossiped by other nodes) and start gossiping it immediately on bootstrap. But that's not necessary: - if this is the timestamp of the last (current) generation, we would obtain it from other nodes anyway (every node gossips the last known timestamp), - if this is the timestamp of an earlier generation, we would forget it immediately and start gossiping the last timestamp (obtained from other nodes). This commit simplifies the bootstrap code (in node-replace case) a bit: the replacing node no longer attempts to retrieve the CDC generation timestamp from the node being replaced. 2. tree-wide: introduce cdc::generation_id type Each CDC generation has a timestamp which denotes a logical point in time when this generation starts operating. That same timestamp is used to identify the CDC generation. We use this identification scheme to exchange CDC generations around the cluster. However, the fact that a generation's timestamp is used as an ID for this generation is an implementation detail of the currently used method of managing CDC generations. Places in the code that deal with the timestamp, e.g. functions which take it as an argument (such as handle_cdc_generation) are often interested in the ID aspect, not the "when does the generation start operating" aspect. They don't care that the ID is a `db_clock::time_point`. They may sometimes want to retrieve the time point given the ID (such as do_handle_cdc_generation when it calls `cdc::metadata::insert`), but they don't care about the fact that the time point actually IS the ID. In the future we may actually change the specific type of the ID if we modify the generation management algorithms. 
This commit is an intermediate step that will ease the transition in the future. It introduces a new type, `cdc::generation_id`. Inside it contains the timestamp, so: - if a piece of code doesn't care about the timestamp, it just passes the ID around - if it does care, it can access it using the `get_ts` function. The fact that `get_ts` simply accesses the ID's only field is an implementation detail. 3. cdc: handle missing generation case in check_and_repair_cdc_streams check_and_repair_cdc_streams assumed that there is always at least one generation being gossiped by at least one of the nodes. Otherwise it would enter undefined behavior. I'm not aware of any "real" scenario where this assumption wouldn't be satisfied at the moment where check_and_repair_cdc_streams makes it except perhaps some theoretical races. But it's best to stay on the safe side. --- Additionally the PR does some simplifications, stylistic improvements, removes some dead code, coroutinizes some functions, uncoroutinizes others (due to miscompiles), adds additional logging, updates some stale comments. Read commit messages for more details. Closes #8283 * github.com:scylladb/scylla: cdc: log a message when creating a new CDC generation cdc: handle missing generation case in check_and_repair_cdc_streams tree-wide: introduce cdc::generation_id type tree-wide: rename "cdc streams timestamp" to "cdc generation id" cdc: remove some functions from generation.hh storage_service: make set_gossip_tokens a static free-function db: system_keyspace: group cdc functions in single place cdc: get rid of "get_local_streams_timestamp" sys_dist_ks: update comment at quorum_if_many storage_service: simplify CDC generation management during node replace	2021-04-07 14:49:02 +03:00
Kamil Braun	99fd2244a3	tree-wide: introduce cdc::generation_id type This is a follow-up to the previous commit. Each CDC generation has a timestamp which denotes a logical point in time when this generation starts operating. That same timestamp is used to identify the CDC generation. We use this identification scheme to exchange CDC generations around the cluster. However, the fact that a generation's timestamp is used as an ID for this generation is an implementation detail of the currently used method of managing CDC generations. Places in the code that deal with the timestamp, e.g. functions which take it as an argument (such as handle_cdc_generation) are often interested in the ID aspect, not the "when does the generation start operating" aspect. They don't care that the ID is a `db_clock::time_point`. They may sometimes want to retrieve the time point given the ID (such as do_handle_cdc_generation when it calls `cdc::metadata::insert`), but they don't care about the fact that the time point actually IS the ID. In the future we may actually change the specific type of the ID if we modify the generation management algorithms. This commit is an intermediate step that will ease the transition in the future. It introduces a new type, `cdc::generation_id`. Inside it contains the timestamp, so: 1. if a piece of code doesn't care about the timestamp, it just passes the ID around 2. if it does care, it can simply access it using the `get_ts` function. The fact that `get_ts` simply accesses the ID's only field is an implementation detail. Using the occasion, we change the `do_handle_cdc_generation_intercept...` function to be a standard function, not a coroutine. It turns out that - depending on the shape of the passed-in argument - the function would sometimes miscompile (the compiled code would not copy the argument to the coroutine frame).	2021-04-07 13:47:13 +02:00
Avi Kivity	5109bf8b99	config: relax batch size warning and failure thresholds We inherited very low thresholds for warning and failing multi-partition batches, but these warnings aren't useful. The size of a batch in bytes has no impact on node stability. In fact the warnings can cause more problems if they flood the log. Fix by raising the warning threshold to 128 kiB (our magic size) and the fail threshold to 1 MiB. Fixes #8416. Closes #8417	2021-04-06 20:56:06 +03:00
Calle Wilund	d734f85280	commitlog: Add signalling to recycle queue iff we fail to recycle Fixes #8376 If a recycle should fail, we will sort of handle it by deleting the segment, so no leaks. But if we have waiter(s) on the recycle queue, we could end up deadlocked/starved because nothing is incoming there. This adds an abort of the queue iff we failed and no objects are available. This will wake up any waiter, and he should retry, and hopefully at least be able to create a new segment. We then reset the queue to a new one. So we can go on. v2: * Forgot to reset queue v3: * Nicer exception handling in allocate_segment_ex	2021-04-06 16:38:14 +00:00
Calle Wilund	15dd76f0c2	commitlog: Fix race and edge condition in delete_segments Fixes #8363 Delete segments has two issues when running with size-limited commit log and strict adherence to said limit. 1.) It uses parallel processing, with deferral. This means that the disk usage variables it looks at might not be fully valid - i.e. we might have already issued a file delete that will reduce disk footprint such that a segment could instead be recycled, but since vars are (and should) only updated _post_ delete, we don't know. 2.) It does not take into account edge conditions, when we only delete a single segment, and this segment is the border segment - i.e. the one pushing us over the limit, yet allocation is desperately waiting for recycling. In this case we should allow it to live on, and assume that next delete will reduce footprint. Note: to ensure exact size limit, make sure total size is a multiple of segment size. Fixed by a.) Doing delete serialized. It is not like being parallel here will win us speed awards. And now we can know exact footprint, and how many segments we have left to delete b.) Check if we are a block across the footprint boundary, and people might be waiting for a segment. If so, don't delete segment, but recycle. As a follow-up, we should probably instead adjust the commitlog size limit (per shard) to be a multiple of segment sizes, but there are risks in that too.	2021-04-06 16:38:14 +00:00
Calle Wilund	d9a9897892	commitlog: coroutinize delete_segments Because we like cow routines.	2021-04-06 16:38:14 +00:00
Calle Wilund	813694b617	commitlog_test: Add test for deadlock in recycle waiter Not a very good test, mind you. Nothing to verify, just see if the test times out. But try to make it at least complete for failure report.	2021-04-06 16:38:14 +00:00
Konstantin Osipov	c83cf1f965	uuid: switch the API to use std::chrono A follow up for the patch for #7611. This change was requested during review and moved out of #7611 to reduce its scope. The patch switches UUID_gen API from using plain integers to hold time units to units from std::chrono. For one, we plan to switch the entire code base to std::chrono units, to ensure type safety. Secondly, using std::chrono units allows to increase code reuse with template metaprogramming and remove a few of UUID_gen functions that became redundant as a result. * switch get_time_UUID(), unix_timestamp(), get_time_UUID_raw(), switch min_time_UUID(), max_time_UUID(), create_time_safe() to std::chrono * remove unused variant of from_unix_timestamp() * remove unused get_time_UUID_bytes(), create_time_unsafe(), redundant get_adjusted_timestamp() * inline get_raw_UUID_bytes() * collapse two similar implementations of get_time_UUID() * switch internal constants to std::chrono * remove unnecessary unique_ptr from UUID_gen::_instance Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>	2021-04-06 17:12:54 +03:00
Kamil Braun	e486e0f759	tree-wide: rename "cdc streams timestamp" to "cdc generation id" Each CDC generation always has a timestamp, but the fact that the timestamp identifies the generation is an implementation detail. We abstract away from this detail by using a more generic naming scheme: a generation "identifier" (whatever that is - a timestamp or something else). It's possible that a CDC generation will be identified by more than a timestamp in the (near) future. The actual string gossiped by nodes in their application state is left as "CDC_STREAMS_TIMESTAMP" for backward compatibility. Some stale comments have been updated.	2021-04-06 13:15:31 +02:00
Kamil Braun	1019ff07cb	db: system_keyspace: group cdc functions in single place	2021-04-06 13:15:31 +02:00
Kamil Braun	3cebe99613	sys_dist_ks: update comment at quorum_if_many The comment mentioned tables that no longer exist: their names have changed some time ago. Update the comment to be name-agnostic. Furthermore, the second part of the comment related to a case of "joining a node without bootstrapping". Fortunately this operation is no longer possible (after #6848 which became part of Scylla 4.3) so we can shorten the comment.	2021-04-06 13:15:31 +02:00
Avi Kivity	56cd058b34	config: correct description of listen_address - it does not support using interface names - listen_interface is not supported - 0.0.0.0 will work (and is reasonable) if you set broadcast_address - empty setting is not supported Fixes #8381. Closes #8409	2021-04-05 14:06:48 +03:00
Avi Kivity	82c76832df	treewide: don't include "db/system_distributed_keyspace.hh" from headers This just causes unneeded and slower recompilations. Instead replace with forward declarations, or includes of smaller headers that were incidentally brought in by the one removed. The .cc files that really need it gain the include, but they are few. Ref #1. Closes #8403	2021-04-04 14:00:26 +03:00
Kamil Braun	641040d465	sys_dist_ks: remove dead code (expire_cdc_* functions) These functions were not used anywhere but had to be maintained anyway. When (if) the expiration algorithm actually gets implemented (see issue #7300), the functions can be added back (perhaps they will need to look differently at that time, and it's likely that the `expire` column won't be used in the expiration algorithm in the end anyway).	2021-04-04 13:12:12 +03:00
Kamil Braun	4f3f245188	sys_dist_ks: coroutinize system_distributed_keyspace::start	2021-04-04 13:10:44 +03:00
Tomasz Grabiec	307bd354d2	Merge 'hints: use token_metadata to tell if node has left the ring' from Piotr Dulikowski This PR changes the `can_send` function so that it looks at the `token_metadata` in order to tell if the destination node is in the ring. Previously, gossiper state was used for that purpose and required a relatively complicated condition to check. The new logic just uses `token_metadata::is_member` which reduces complexity of the `can_send` function. Additionally, `storage_service` is slightly modified so that during a removenode operation the `token_metadata` is first updated and only then endpoint lifecycle subscribers are notified. This was done in order to prevent a race just like the one which happened in #5087 - hints manager is a lifecycle subscriber and starts a draining operation when a node is removed, and in order for draining to work correctly, `can_send` should keep returning true for that node. Tests: - unit(dev) - dtest(hintedhandoff_additional_test.py) - dtest(topology_test.py) Closes #8387 * github.com:scylladb/scylla: hints: clarify docstring comment for can_send hints: use token_metadata to tell if node is in the ring hints: slightly reogranize "if" statement in can_send storage_service: release token_metadata lock before notify_left storage_service: notify_left after token_metadata is replicated	2021-04-01 15:51:46 +02:00
Piotr Dulikowski	6a1152ea9b	hints: clarify docstring comment for can_send Now, the docstring comment next to can_send better represents the condition that is checked inside that function. The statement about returning true when destination left the NORMAL state is replaced with a statement about returning true when the destination has left the ring.	2021-04-01 03:58:29 +02:00
Piotr Dulikowski	4f90514247	hints: use token_metadata to tell if node is in the ring Now, instead of looking at the gossiper state to check if the destination node is still in the ring, we are using token_metadata as a source of truth. This results in much simpler code in can_send() as token_metadata has an is_member method which does exactly what we want.	2021-04-01 03:58:29 +02:00
Piotr Dulikowski	e7d9057d0c	hints: slightly reogranize "if" statement in can_send This commit reverses the order of if-else blocks in can_send, which makes it - in my opinion, at least - slightly easier to read.	2021-04-01 03:58:29 +02:00
Piotr Jastrzebski	57c7964d6c	config: ignore enable_sstables_mc_format flag Don't allow users to disable MC sstables format any more. We would like to retire some old cluster features that have been around for years. Namely MC_SSTABLE and UNBOUNDED_RANGE_TOMBSTONES. To do this we first have to make sure that all existing clusters have them enabled. It is impossible to know that unless we stop supporting enable_sstables_mc_format flag. Test: unit(dev) Refs #8352 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Closes #8360	2021-03-31 12:23:59 +03:00
Calle Wilund	c0666ea89b	commitlog: Fix inner loop condition in allocation pre-fill Fixes #8369 This was originally found (and fixed) by @gleb-cloudius, but the patch set with the fix was reverted at some point, and the fix went away. Now the error remains even in new, nice coroutine code. We check the wrong var in the inner loop of the pre-fill path of allocate_segment_ex, often causing us to generate giant writev:s of more or less the whole file. Not intended. Closes #8370	2021-03-30 12:14:55 +02:00
Nadav Har'El	ccc75bfe2a	Merge 'Disable thrift by default' from Piotr Sarna The Thrift layer is functional, but it's not usually the first-choice protocol for Scylla users, so it's hereby disabled by default. Fixes #8336 Closes #8338 * github.com:scylladb/scylla: docs: mention disabling Thrift by default db,config: disable Thrift by default	2021-03-29 12:48:20 +03:00
Piotr Wojtczak	c1daf2bb24	column_family: Make toppartitions queries more generic Right now toppartitions can only be invoked on one column family at a time. This change introduces a natural extension to this functionality, allowing to specify a list of families. We provide three ways for filtering in the query parameter "name_list": 1. A specific column family to include in the form "ks:cf" 2. A keyspace, telling the server to include all column families in it. Specified by omitting the cf name, i.e. "ks:" 3. All column families, which is represented by an empty list The list can include any amount of one or both of the 1. and 2. option. Fixes #4520 Closes #7864	2021-03-24 17:54:05 +02:00
Pavel Emelyanov	37bec6fb76	commitlog: Open files with append_is_unlikely This open option tells seastar that the file in question will be truncated to the needed size right at once and all the subsequent writes will happen within this size. This hint turns off append optimization in seastar that's not that cheap and helps to save a few CPU cycles. The option was introduced in seastar by 8bec57bc. tests: unit(dev), dtest(commitlog: test_batch_commitlog, test_periodic_commitlog, test_commitlog_replay_on_startup) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210323115409.31215-1-xemul@scylladb.com>	2021-03-24 13:05:33 +02:00
Piotr Sarna	e2443337d9	db,config: disable Thrift by default It will still be possible to use Thrift once it's enabled in the yaml file, but it's better to not open this port by default, since Thrift is definitely not the first choice for Scylla users. Fixes #8336	2021-03-22 10:54:26 +01:00
Avi Kivity	972ea9900c	Merge 'commitlog: Make pre-allocation drop O_DSYNC while pre-filling' from Calle Wilund Refs #7794 Iff we need to pre-fill segment file in O_DSYNC mode, we should drop this for the pre-fill, to avoid issuing flushes until the file is filled. Done by temporarily closing, re-opening in "normal" mode, filling, then re-opening. Closes #8250 * github.com:scylladb/scylla: commitlog: Make pre-allocation drop O_DSYNC while pre-filling commitlog: coroutinize allocate_segment_ex	2021-03-17 09:59:22 +02:00
Calle Wilund	48ca01c3ab	commitlog: Make pre-allocation drop O_DSYNC while pre-filling Refs #7794 Iff we need to pre-fill segment file in O_DSYNC mode, we should drop this for the pre-fill, to avoid issuing flushes until the file is filled. Done by temporarily closing, re-opening in "normal" mode, filling, then re-opening. v2: * More comment v3: * Add missing flush v4: * comment v5: * Split coroutine and fix into separate patches	2021-03-15 09:35:45 +00:00
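
Commit 705f9c4f79 in the list above describes capping the commitlog's `max_size` at a value aligned down to a multiple of 1MB, because an unaligned write count made `dma_write` return 0. The arithmetic can be sketched as follows. This is only an illustration: it assumes a 32-bit `position_type` (the message quotes a maximum of 0xffffffff), and the names `align_down` and `clamp_max_size` are hypothetical, not taken from the Scylla source.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed stand-in for the commitlog position type; the commit message
// quotes its maximum as 0xffffffff, which suggests 32 bits.
using position_type = std::uint32_t;

constexpr std::uint64_t MiB = 1024 * 1024;

// Round `value` down to the nearest multiple of `alignment` (alignment > 0).
constexpr std::uint64_t align_down(std::uint64_t value, std::uint64_t alignment) {
    return value - (value % alignment);
}

// Hypothetical helper: cap a requested segment size (in bytes) at the
// largest value representable by position_type, then align the result
// down to a multiple of 1 MiB.
constexpr std::uint64_t clamp_max_size(std::uint64_t requested_bytes) {
    return align_down(
        requested_bytes < std::numeric_limits<position_type>::max()
            ? requested_bytes
            : std::numeric_limits<position_type>::max(),
        MiB);
}

// 0xffffffff is not a multiple of 1 MiB; aligning it down gives 0xfff00000.
static_assert(align_down(0xffffffffu, MiB) == 0xfff00000u, "aligned cap");
static_assert(clamp_max_size(0xffffffffffULL) % MiB == 0, "cap is aligned");
```

With this sketch, any oversized request is capped at 0xfff00000 bytes (4095 MiB), so every write count derived from it stays a multiple of 1 MiB.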

4972 Commits