scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Avi Kivity	6898fcd40f	Update seastar submodule for precalculated TLS DH parameters * seastar d4df4fa6de...9d8d82a095 (1): > TLS: Use "known" (precalculated) DH parameters if available Fixes #6191.	2020-11-29 14:36:40 +02:00
Asias He	4df08e331b	repair: Make repair_writer a shared pointer The future of the fiber that writes data into sstables inside the repair_writer is stored in _writer_done like below: class repair_writer { _writer_done[node_idx] = mutation_writer::distribute_reader_and_consume_on_shards().then([this] { ... }).handle_exception([this] { ... }); } The fiber accesses the repair_writer object in the error handling path. We wait for _writer_done to finish before we destroy the repair_meta object, which contains the repair_writer object, to avoid the fiber accessing an already freed repair_writer object. To be safer, we can make repair_writer a shared pointer and take a reference in the distribute_reader_and_consume_on_shards code path. Fixes #7406 Closes #7430 (cherry picked from commit `289a08072a`)	2020-11-29 13:30:06 +02:00
Pavel Emelyanov	7b1fb86a28	query_pager: Fix continuation handling for noop visitor Before updating the _last_[cp]key (for subsequent .fetch_page()) the pager checks 'if the pager is not exhausted OR the result has data'. The check seems broken: if the pager is not exhausted but the result is empty, the call for keys will unconditionally try to reference the last element of an empty vector. The not-exhausted condition for an empty result can happen if short_read is set, which, in turn, unconditionally happens upon meeting partition end when visiting the partition with the result builder. The correct check should be 'if the pager is not exhausted AND the result has data': the _last_[cp]key-s should be taken for continuation (not exhausted), but can only be taken if the result is not empty (has data). fixes: #7263 tests: unit(dev), but tests don't trigger this corner case Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200921124329.21209-1-xemul@scylladb.com> (cherry picked from commit `550fc734d9`)	2020-11-29 12:01:43 +02:00
Takuya ASADA	f7be22ccb2	install.sh: set PATH for relocatable CLI tools in python thunk We currently set PATH for relocatable CLI tools in scylla_util.run() and scylla_util.out(), but it doesn't work for perftune.py, since it is not part of Scylla and does not use the scylla_util module. We can set PATH in the python thunk instead, which sets PATH for all python scripts. Fixes #7350 (cherry picked from commit `5867af4edd`)	2020-11-29 11:54:53 +02:00
Bentsi Magidovich	26b5a34f96	scylla_util.py: fix exception handling in curl The retry mechanism didn't work when a URLError happened. For example: urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable> Let's catch URLError instead of HTTPError, since URLError is the base exception for all exceptions in the urllib module. Fixes: #7569 Closes #7567 (cherry picked from commit `956b97b2a8`)	2020-11-29 11:48:42 +02:00
Takuya ASADA	10a65ba2fb	dist/redhat: packaging dependencies.conf as normal file, not ghost When we introduced dependencies.conf, we mistakenly added it to the rpm as %ghost, but it should be a normal file that is installed normally on package installation. Fixes #7703 Closes #7704 (cherry picked from commit `ba4d54efa3`)	2020-11-29 11:40:27 +02:00
Takuya ASADA	be60e3ca52	install.sh: apply sysctl.d files on non-packaging installation We don't apply sysctl.d files on non-packaging installations; apply them, just as the rpm/deb packages take care of that. Fixes #7702 Closes #7705 (cherry picked from commit `5f81f97773`)	2020-11-29 11:35:51 +02:00
Avi Kivity	5485c902fe	dist: sysctl: configure more inotify instances Since `f3bcd4d205` ("Merge 'Support SSL Certificate Hot Reloading' from Calle"), we reload certificates as they are modified on disk. This uses inotify, which is limited by a sysctl fs.inotify.max_user_instances, with a default of 128. This is enough for 64 shards only, if both rpc and cql are encrypted; above that startup fails. Increase to 1200, which is enough for 6 instances * 200 shards. Fixes #7700. Closes #7701 (cherry picked from commit `390e07d591`)	2020-11-29 11:04:57 +02:00
Hagit Segev	01c822301f	release: prepare for 4.1.10	2020-11-19 18:07:49 +02:00
Raphael S. Carvalho	415b271a39	compaction: Make sure a partition is filtered out only by producer If interposer consumer is enabled, partition filtering will be done by the consumer instead, but that's not possible because only the producer is able to skip to the next partition if the current one is filtered out, so scylla crashes when that happens with a bad function call in queue_reader. This is a regression which started here: `55a8b6e3c9` To fix this problem, let's make sure that partition filtering will only happen on the producer side. Fixes #7590. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20201111221513.312283-1-raphaelsc@scylladb.com> (cherry picked from commit `13fa2bec4c`)	2020-11-19 14:08:47 +02:00
Piotr Dulikowski	b7274ab44a	hints: don't read hint files when it's not allowed to send When there are hint files to be sent and the target endpoint is DOWN, end_point_hints_manager works in the following loop: - It reads the first hint file in the queue, - For each hint in the file it decides that it won't be sent because the target endpoint is DOWN, - After realizing that there are some unsent hints, it decides to retry this operation after sleeping 1 second. This causes the first segment to be wholly read over and over again, with 1 second pauses, until the target endpoint becomes UP or leaves the cluster. This causes unnecessary I/O load in the streaming scheduling group. This patch adds a check which prevents end_point_hints_manager from reading the first hint file at all when it is not allowed to send hints. First observed in #6964 Tests: - unit(dev) - hinted handoff dtests Closes #7407 (cherry picked from commit `77a0f1a153`)	2020-11-16 14:30:26 +02:00
Botond Dénes	b144b93cd8	mutation_reader: queue_reader: don't set EOS flag on abort If the consumer happens to check the EOS flag before it hits the exception injected by the abort (by calling fill_buffer()), they can think the stream ended normally and expect it to be valid. However this is not guaranteed when the reader is aborted. To avoid consumers falsely thinking the stream ended normally, don't set the EOS flag on abort at all. Additionally make sure the producer is aborted too on abort. In theory this is not needed as they are the one initiating the abort, but better to be safe than sorry. Fixes: #7411 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201102100732.35132-1-bdenes@scylladb.com> (cherry picked from commit `f5323b29d9`)	2020-11-15 11:08:07 +02:00
Botond Dénes	7325996510	types: validate(): linearize values lazily Instead of eagerly linearizing all values as they are passed to validate(), defer linearization to those validators that actually need linearized values. Linearizing large values puts pressure on the memory allocator with large contiguous allocation requests. This is something we are trying to actively avoid, especially if it is not really needed. It turns out that the types whose validators really want linearized values are a minority, as most validators just look at the size of the value, and some, like bytes, don't need validation at all, while usually having large values. This is achieved by templating the validator struct on the view and using the FragmentedRange concept to treat all passed in views (`bytes_view` and `fragmented_temporary_buffer_view`) uniformly. This patch makes no attempt at converting existing validators to work with fragmented buffers; only trivial cases are converted. The major offenders still left are ascii/utf8 and collections. Fixes: #7318 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201007054524.909420-1-bdenes@scylladb.com> (cherry picked from commit `db56ae695c`) [avi: squashed `ed6775c585` ("types: adjust validation_visitor construction for clang") as gcc 9 in scylla 4.1 suffers from the same problem as clang 11]	2020-11-11 12:31:36 +02:00
Piotr Sarna	fb14fae79b	Merge 'Backport PR #7469 to 4.2' from Eliran Sinvani This is a backport of PR #7469 that did not apply cleanly to 4.2 with a trivial conflict, another commit that touched one of the files but in a completely different region. Closes #7480 * github.com:scylladb/scylla: materialized views: add a base table reference if missing view info: support partial match between base and view for only reading from view. view info: guard against null dereference of the base info (cherry picked from commit `c74ba1bc36`)	2020-11-09 15:22:11 +02:00
Avi Kivity	bb49a5ac06	Merge 'storage_proxy: add a separate smp_group for hints' from Eliran Hints writes are handled by storage_proxy in the exact same way regular writes are, which in turn means that the same smp service group is used for both. The problem is that it can lead to a priority inversion where writes of the lower priority kind occupy a lot of the semaphore's units, making the higher priority writes wait for an empty slot. This series adds a separate smp group for hints as well as a field to pass the correct smp group to mutate_locally functions, and then uses this field to properly classify the writes. Fixes #7177 * eliransin-hint_priority_inversion: Storage proxy: use hints smp group in mutate locally Storage proxy: add a dedicated smp group for hints (cherry picked from commit `c075539fea`) [avi: replace std::bind_front() which is not available with this compiler with a lambda that does the same]	2020-11-08 20:46:45 +02:00
Pavel Solodovnikov	947d3a13a3	storage_proxy: un-hardcode force sync flag for `mutate_locally(mutation)` overload The corresponding overload of `storage_proxy::mutate_locally` was hardcoded to pass `db::commitlog::force_sync::no` to the `database::apply`. Unhardcode it and substitute `force_sync::no` at all existing call sites (as it was before). `force_sync::yes` will be used later for paxos learn writes when trying to apply mutations upgraded from an obsolete schema version (similar to the current case when applying locally a `frozen_mutation` stored in accepted proposal). Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200716124915.464789-1-pa.solodovnikov@scylladb.com> (cherry picked from commit `5ff5df1afd`) Prerequisite for #7177.	2020-11-08 19:47:11 +02:00
Amnon Heiman	b096d64aa7	scyllatop/livedata.py: Safe iteration over metrics This patch changes the code that iterates over the metrics to use a copy of the metric names, making it safe to remove metrics from the metrics object. Fixes #7488 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `52db99f25f`)	2020-11-08 19:16:25 +02:00
Calle Wilund	ce8a0f3886	partition_version: Change range_tombstones() to return chunked_vector Refs #7364 The number of tombstones can be large. As a stopgap measure to just returning a source range (with keepalive), we can at least alleviate the problem by using a chunked vector. Closes #7433 (cherry picked from commit `4b65d67a1a`)	2020-11-08 14:38:45 +02:00
Tomasz Grabiec	41344d8ee6	sstables: ka/la: Fix abort when next_partition() is called with certain reader state Cleanup compaction is using consume_pausable_in_thread() to skip over disowned partitions, which uses flat_mutation_reader::next_partition(). The implementation of next_partition() for the sstable reader has a bug which may cause the following assertion failure: scylla: sstables/mp_row_consumer.hh:422: row_consumer::proceed sstables::mp_row_consumer_k_l::flush(): Assertion `!_ready' failed. This happens when the sstable reader's buffer gets full when we reach the partition end. The last fragment of the partition won't be pushed into the buffer but will stay in the _ready variable. When next_partition() is called in this state, _ready will not be cleared and the fragment will be carried over to the next partition. This will cause assertion failure when the reader attempts to emit the first fragment of the next partition. The fix is to clear _ready when entering a partition, just like we clear _range_tombstones there. Fixes #7553. Message-Id: <1604534702-12777-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `fb9b5cae05`)	2020-11-08 14:32:58 +02:00
Avi Kivity	db6303dba0	Merge "Fix TWCS compaction aggressiveness due to data segregation" from Raphael " After the data segregation feature, anything that causes out-of-order writes, like read repair, can result in small updates to past time windows. This causes compaction to be very aggressive because whenever a past time window is updated like that, that time window is recompacted into a single SSTable. Users expect that once a window is closed, it will no longer be written to, but that has changed since the introduction of the data segregation feature. We didn't anticipate the write amplification issues that the feature would cause. To fix this problem, let's perform size-tiered compaction on the windows that are no longer active and were updated because data was segregated. The current behavior where the last active window is merged into one file is kept. But thereafter, that same window will only be compacted using STCS. Fixes #6928. " * 'fix_twcs_agressiveness_after_data_segregation_v2' of github.com:raphaelsc/scylla: compaction/twcs: improve further debug messages compaction/twcs: Improve debug log which shows all windows test: Check that TWCS properly performs size-tiered compaction on past windows compaction/twcs: Make task estimation take into account the size-tiered behavior compaction/stcs: Export static function that estimates pending tasks compaction/stcs: Make get_buckets() static compact/twcs: Perform size-tiered compaction on past time windows compaction/twcs: Make strategy easier to extend by removing duplicated knowledge compaction/twcs: Make newest_bucket() non-static compaction/twcs: Move TWCS implementation into source file (cherry picked from commit `6f986df458`)	2020-11-05 20:32:42 +02:00
Glauber Costa	964cbb95a7	twcs: move implementations to its own file LCS and STCS already have their own files, reducing the clutter in compaction_strategy.cc. Do the same for TWCS. I am doing this in preparation to add more functions. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200611230906.409023-6-glauber@scylladb.com> (cherry picked from commit `b0a0c207c3`) Prerequisite for #6928.	2020-11-05 20:20:30 +02:00
Avi Kivity	b34a1d9576	Merge 'Move temporaries to value view' from Piotr S " Issue https://github.com/scylladb/scylla/issues/7019 describes a problem of an ever-growing map of temporary values stored in query_options. In order to mitigate this kind of problems, the storage for temporary values is moved from an external data structure to the value views itself. This way, the temporary lives only as long as it's accessible and is automatically destroyed once a request finishes. The downside is that each temporary is now allocated separately, while previously they were bundled in a single byte stream. Tests: unit(dev) Fixes https://github.com/scylladb/scylla/issues/7019 " `7055297649` ("cql3: remove query_options::linearize and _temporaries") is reverted from this backport since linearize() is still used in this branch. * psarna-move_temporaries_to_value_view: cql3: remove query_options::linearize and _temporaries cql3: remove make_temporary helper function cql3: store temporaries in-place instead of in query_options cql3: add temporary_value to value view cql3: allow moving data out of raw_value cql3: split values.hh into a .cc file (cherry picked from commit `2b308a973f`)	2020-11-05 19:48:01 +02:00
Piotr Sarna	15ef930268	schema_tables: fix fixing old secondary index schemas Old secondary index schemas did not have their idx_token column marked as computed, and there already exists code which updates them. Unfortunately, the fix itself contains an error and doesn't fire if computed columns are not yet supported by the whole cluster, which is a very common situation during upgrades. Fixes #7515 Closes #7516 (cherry picked from commit `b66c285f94`)	2020-11-05 17:53:28 +02:00
Avi Kivity	fe57128fe0	Merge 'Fix ignoring cells after null in appending hash' from Piotr Sarna " This series fixes a bug in `appending_hash<row>` that caused it to ignore any cells after the first NULL. It also adds a cluster feature which starts using the new hashing only after the whole cluster is aware of it. The series comes with tests, which reproduce the issue. Fixes #4567 Based on #4574 " * psarna-fix_ignoring_cells_after_null_in_appending_hash: test: extend mutation_test for NULL values tests/mutation: add reproducer for #4567 gms: add a cluster feature for fixed hashing digest: add null values to row digest mutation_partition: fix formatting appending_hash<row>: make publicly visible (cherry picked from commit `0e03c979d2`)	2020-11-04 20:45:06 +02:00
Yaron Kaikov	b80dab6d58	release: prepare for 4.1.9 scylla-4.1.9	2020-10-26 18:13:22 +02:00
Botond Dénes	04d52631b2	reader_permit: reader_resources: make true RAII class Currently in all cases we first deduct the to-be-consumed resources, then construct the `reader_resources` class to protect it (release it on destruction). This is error prone as it relies on no exception being thrown while constructing the `reader_resources`. Although the `reader_resources` constructor is `noexcept` right now, this might change in the future, and as the call sites relying on this are disconnected from the declaration, whoever modifies it might not notice. To make this safe going forward, make the `reader_resources` a true RAII class, consuming the units in its constructor and releasing them in its destructor. Refs: #7256 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200922150625.1253798-1-bdenes@scylladb.com> (cherry picked from commit `a0107ba1c6`) Message-Id: <20200924081408.236353-1-bdenes@scylladb.com>	2020-10-19 15:04:53 +03:00
Takuya ASADA	dfc9f789cf	install.sh: set LC_ALL=en_US.UTF-8 on python3 thunk scylla-python3 segfaults when a non-default locale is specified. As a workaround, we need to set LC_ALL=en_US.UTF-8 on the python3 thunk. Fixes #7408 Closes #7414 (cherry picked from commit `ff129ee030`)	2020-10-18 15:02:46 +03:00
Avi Kivity	c1236c02df	Update seastar submodule * seastar 88b6f0172c...d4df4fa6de (1): > append_challenged_posix_file_impl: allow destructing file with no queued work Fixes #7285.	2020-10-12 15:13:17 +03:00
Gleb Natapov	0eb2f5c378	lwt: do not return unavailable exception from the 'learn' stage An unavailable exception means that the operation was not started and can be retried safely. If lwt fails in the learn stage, though, it almost certainly means that its effect is already observable. The patch returns a timeout exception instead, which signals uncertainty. Fixes #7258 Message-Id: <20201001130724.GA2283830@scylladb.com> (cherry picked from commit `3e8dbb3c09`)	2020-10-07 11:00:08 +02:00
Avi Kivity	0cc6d41ee6	Merge "materialized views: Fix undefined behavior on base table schema changes" from Tomasz " The view_info object, which is attached to the schema object of the view, contains a data structure called "base_non_pk_columns_in_view_pk". This data structure contains column ids of the base table, so it is valid only for a particular version of the base table schema. This data structure is used by materialized view code to interpret mutations of the base table, those coming from base table writes, or reads of the base table done as part of view updates or view building. The base table schema version of that data structure must match the schema version of the mutation fragments, otherwise we hit undefined behavior. This may include aborts, exceptions, segfaults, or data corruption (e.g. writes landing in the wrong column in the view). Before this patch, we could get a schema version mismatch here after the base table was altered. That's because the view schema did not change when the base table was altered. Another problem was that view building was using the current table's schema to interpret the fragments and invoke view building. That's incorrect for two reasons. First, fragments generated by a reader must be accessed only using the reader's schema. Second, base_non_pk_columns_in_view_pk of the recorded view ptrs may no longer match the current base table schema, which is used to generate the view updates. Part of the fix is to extract base_non_pk_columns_in_view_pk into a third entity called base_dependent_view_info, which changes both on base table schema changes and view schema changes. It is managed by a shared pointer so that we can take immutable snapshots of it, just like with schema_ptr. When starting the view update, the base table schema_ptr and the corresponding base_dependent_view_info have to match. So we must obtain them atomically, and base_dependent_view_info cannot change during update.
Also, whenever the base table schema changes, we must update base_dependent_view_infos of all attached views (atomically) so that it matches the base table schema. Fixes #7061. Tests: - unit (dev) - [v1] manual (reproduced using scylla binary and cqlsh) " * tag 'mv-schema-mismatch-fix-v2' of github.com:tgrabiec/scylla: db: view: Refactor view_info::initialize_base_dependent_fields() tests: mv: Test dropping columns from base table db: view: Fix incorrect schema access during view building after base table schema changes schema: Call on_internal_error() when out of range id is passed to column_at() db: views: Fix undefined behavior on base table schema changes db: views: Introduce has_base_non_pk_columns_in_view_pk() (cherry picked from commit `3daa49f098`)	2020-10-06 16:49:08 +03:00
Juliusz Stasiewicz	1ecc447f42	tracing: Fix error on slow batches `trace_keyspace_helper::make_slow_query_mutation_data` expected a "query" key in its parameters, which does not appear in the case of, e.g., batches of prepared statements. This is an example of a failing `record.parameters`: ``` ...{"query[0]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}, {"query[1]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}... ``` In such a case Scylla recorded no trace and said: ``` ERROR 2020-09-28 10:09:36,696 [shard 3] trace_keyspace_helper - No "query" parameter set for a session requesting a slow_query_log record ``` The fix here is to leave the query empty if not found. The users can still retrieve the query contents from existing info. Fixes #5843 Closes #7293 (cherry picked from commit `0afa738a8f`)	2020-10-04 18:04:42 +03:00
Tomasz Grabiec	7f3ffbc1c8	Merge "evictable_reader: validate buffer on reader recreation" from Botond This series backports the evictable reader validation patchset (merged as `97c99ea9f` to master) to 4.1. I only had to do changes to the tests. Tests: unit(dev), some exception safety tests are failing with or without my patchset * https://github.com/denesb/scylla.git denesb/evictable-reader-validate-buffer/backport-4.1: mutation_reader_test: add unit test for evictable reader self-validation evictable_reader: validate buffer after recreation the underlying evictable_reader: update_next_position(): only use peek'd position on partition boundary mutation_reader_test: add unit test for evictable reader range tombstone trimming evictable_reader: trim range tombstones to the read clustering range position_in_partition_view: add position_in_partition_view before_key() overload flat_mutation_reader: add buffer() accessor	2020-10-02 11:50:29 +02:00
Botond Dénes	6a02d120ec	mutation_reader_test: add unit test for evictable reader self-validation Add both positive (where the validation should succeed) and negative (where the validation should fail) tests, covering all validation cases. (cherry picked from commit `076c27318b`)	2020-10-02 09:45:20 +03:00
Botond Dénes	d820997452	evictable_reader: validate buffer after recreation the underlying The reader recreation mechanism is a very delicate and error-prone one, as proven by the countless bugs it had. Most of these bugs were related to the recreated reader not continuing the read from the expected position, inserting out-of-order fragments into the stream. This patch adds a defense mechanism against such bugs by validating the start position of the recreated reader. Several things are checked: * The partition is the expected one -- the one we were in the middle of or the next if we stopped at partition boundaries. * The partition is in the read range. * The first fragment in the partition is the expected one -- has an equal or larger position than the next expected fragment. * The fragment is in the clustering range as defined by the slice. As these validations are only done on the slow-path of recreating an evicted reader, no performance impact is expected. (cherry picked from commit `0b0ae18a14`)	2020-10-02 09:38:04 +03:00
Botond Dénes	e1e57d224b	evictable_reader: update_next_position(): only use peek'd position on partition boundary `evictable_reader::update_next_position()` is used to record the position the reader will continue from, in the next buffer fill. This position is used to create the partition slice when the underlying reader is evicted and has to be recreated. There is an optimization in this method -- if the underlying's buffer is not empty we peek at the first fragment in it and use it as the next position. This is however problematic for buffer validation on reader recreation (introduced in the next patch), because using the next row's position as the next pos will allow for range tombstones to be emitted with before_key(next_pos.key()), which will trigger the validation. Instead of working around this, just drop this optimization for mid-partition positions, it is inconsequential anyway. We keep it for where it is important, when we detect that we are at a partition boundary. In this case we can avoid reading the current partition altogether when recreating the reader. (cherry picked from commit `91020eef73`)	2020-10-02 09:38:04 +03:00
Botond Dénes	763e063356	mutation_reader_test: add unit test for evictable reader range tombstone trimming (cherry picked from commit `d1b0573e1c`)	2020-10-02 09:37:57 +03:00
Botond Dénes	a8f966aafa	evictable_reader: trim range tombstones to the read clustering range Currently mutation sources are allowed to emit range tombstones that are outside the clustering read range if they are relevant to it. For example a read of a clustering range [ck100, +inf), might start with: range_tombstone{start={ck1, -1}, end={ck200, 1}}, clustering_row{ck100} The range tombstone is relevant to the range and the first row of the range so it is emitted first, but its position (start) is outside the read range. This is normally fine, but it poses a problem for the evictable reader. When the underlying reader is evicted and has to be recreated from a certain clustering position, this results in out-of-order mutation fragments being inserted into the middle of the stream. This is not fine anymore as the monotonicity guarantee of the stream is violated. The real solution would be to require all mutation sources to trim range tombstones to their read range, but this is a lot of work. Until that is done, as a workaround we do this trimming in the evictable reader itself. (cherry picked from commit `4f2e7a18e2`)	2020-10-02 08:59:55 +03:00
Botond Dénes	1a3c8a0ec5	position_in_partition_view: add position_in_partition_view before_key() overload (cherry picked from commit `d7d93aef49`)	2020-10-02 08:59:55 +03:00
Botond Dénes	268821223c	flat_mutation_reader: add buffer() accessor To allow outsiders to inspect the contents of the reader's buffer. (cherry picked from commit `ab59e7c725`)	2020-10-02 08:59:55 +03:00
Tomasz Grabiec	6c43a0dc29	schema: Fix race in schema version recalculation leading to stale schema version in gossip Migration manager installs several feature change listeners: if (this_shard_id() == 0) { _feature_listeners.push_back(_feat.cluster_supports_view_virtual_columns().when_enabled(update_schema)); _feature_listeners.push_back(_feat.cluster_supports_digest_insensitive_to_expiry().when_enabled(update_schema)); _feature_listeners.push_back(_feat.cluster_supports_cdc().when_enabled(update_schema)); _feature_listeners.push_back(_feat.cluster_supports_per_table_partitioners().when_enabled(update_schema)); } They will call update_schema_version_and_announce() when features are enabled, which does this: return update_schema_version(proxy, features).then([] (utils::UUID uuid) { return announce_schema_version(uuid); }); So it first updates the schema version and then publishes it via gossip in announce_schema_version(). It is possible that the announce_schema_version() part of the first schema change will be deferred and will execute after the other four calls to update_schema_version_and_announce(). It will install the old schema version in gossip instead of the more recent one. The fix is to serialize schema digest calculation and publishing. Fixes #7200 (cherry picked from commit `1a57d641d1`) scylla-4.1.8	2020-10-01 18:18:21 +02:00
Yaron Kaikov	8399aac6bc	release: prepare for 4.1.8	2020-09-28 20:25:06 +03:00
Avi Kivity	b1a70d0ad4	Update seastar submodule * seastar 15cd93729f...88b6f0172c (1): > lz4_fragmented_compressor: Fix buffer requirements Fixes #6925.	2020-09-23 11:55:54 +03:00
Yaron Kaikov	2251a1c577	release: prepare for 4.1.7 scylla-4.1.7	2020-09-17 21:30:34 +03:00
Nadav Har'El	f8c7c485d2	alternator: fix corruption of PutItem operation in case of contention This patch fixes a bug noted in issue #7218 - where PutItem operations sometimes lose part of the item's data - some attributes were lost, and the name of other attributes replaced by empty strings. The problem happened when the write-isolation policy was LWT and there was contention of writes to the same partition (not necessarily the same item). To use CAS (a.k.a. LWT), Alternator builds an alternator::rmw_operation object with an apply() function which takes the old contents of the item (if needed) and a timestamp, and builds a mutation that the CAS should apply. In the case of the PutItem operation, we wrongly assumed that apply() will be called only once - so as an optimization the strings saved in the put_item_operation were moved into the returned mutation. But this optimization is wrong - when there is contention, apply() may be called again when the change proposed by the previous one was not accepted by the Paxos protocol. The fix is to change the one place where put_item_operation moved strings out of the saved operations into the mutations, to be a copy. But to prevent this sort of bug from recurring in future code, this patch enlists the compiler to help us verify that it can't happen: The apply() function is marked "const" - it can use the information in the operation to build the mutation, but it can never modify this information or move things out of it, so it will be fine to call this function twice. The single output field that apply() does write (_return_attributes) is marked "mutable" to allow the const apply() to write to it anyway. Because apply() might be called twice, it is important that if some apply() implementation sometimes sets _return_attributes, then it must always set it (even if to the default, empty, value) on every call to apply().
The const apply() means that the compiler verifies for us that I didn't forget to fix additional wrong std::move()s. Additionally, a test I wrote to easily reproduce issue #7218 (which I will submit as a dtest later) passes after this fix. Fixes #7218. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200916064906.333420-1-nyh@scylladb.com> (cherry picked from commit `5e8bdf6877`)	2020-09-16 21:26:59 +03:00
Benny Halevy	d60bed1953	test: cql_query_test: test_cache_bypass: use table stats test is currently flaky since system reads can happen in the background and disturb the global row cache stats. Use the table's row_cache stats instead. Fixes #6773 Test: cql_query_test.test_cache_bypass(dev, debug) Credit-to: Botond Dénes <bdenes@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200811140521.421813-1-bhalevy@scylladb.com> (cherry picked from commit `6deba1d0b4`)	2020-09-16 18:19:30 +03:00
Dejan Mircevski	259203a394	cql3: Fix NULL reference in get_column_defs_for_filtering There was a typo in get_column_defs_for_filtering(): it checked the wrong pointer before dereferencing. Add a test exposing the NULL dereference and fix the typo. Tests: unit (dev) Fixes #7198. Signed-off-by: Dejan Mircevski <dejan@scylladb.com> (cherry picked from commit `9d02f10c71`)	2020-09-16 15:47:04 +03:00
Avi Kivity	5f284633d4	reconcilable_result_builder: don't aggravate out-of-memory condition during recovery Consider an unpaged query that consumes all available memory, despite `fea5067dfa` which limits them (perhaps the user raised the limit, or this is a system query). Eventually we will see a bad_alloc which will abort the query and destroy this reconcilable_result_builder. During destruction, we first destroy _memory_accounter, and then _result. Destroying _memory_accounter resumes some continuations which can then allocate memory synchronously when increasing the task queue to accommodate them. We will then crash. Had we not crashed, we would immediately afterwards release _result, freeing all the memory that we would ever need. Fix by making _result the last member, so it is freed first. Fixes #7240. (cherry picked from commit `9421cfded4`)	2020-09-16 15:40:58 +03:00
Asias He	66cc4be8f6	storage_service: Fix a TOKENS update race for replace operation In commit `7d86a3b208` (storage_service: Make replacing node take writes), application state of TOKENS of the replacing node is added into gossip and propagated to the cluster after the initial start of gossip service. This can cause the following race: 1. The replacing node replaces the old dead node with the same ip address 2. The replacing node starts gossip without application state of the TOKENS 3. Other nodes in the cluster replace the application states of old dead node's version with the new replacing node's version 4. The replacing node dies 5. The replace operation is performed again, the TOKENS application state is not present, and the replace operation fails. To fix, we can always add TOKENS application state when the gossip service starts. Fixes: #7166 Backports: 4.1 and 4.2 (cherry picked from commit `3ba6e3d264`)	2020-09-10 13:13:58 +03:00
Avi Kivity	9ca6aa5535	Merge "Fix repair stalls in get_sync_boundary and apply_rows_on_master_in_thread" from Asias " This patch set fixes stalls in repair that are caused by std::list merge and clear operations during test_latency_read_with_nemesis test. Fixes #6940 Fixes #6975 Fixes #6976 " * 'fix_repair_list_stall_merge_clear_v2' of github.com:asias/scylla: repair: Fix stall in apply_rows_on_master_in_thread and apply_rows_on_follower repair: Use clear_gently in get_sync_boundary to avoid stall utils: Add clear_gently repair: Use merge_to_gently to merge two lists utils: Add merge_to_gently (cherry picked from commit `4547949420`)	2020-09-10 13:13:54 +03:00
Avi Kivity	6e63db8c72	repair: apply_rows_on_follower(): remove copy of repair_rows list We copy a list, which was reported to generate a 15ms stall. This is easily fixed by moving it instead, which is safe since this is the last use of the variable. Fixes #7115. (cherry picked from commit `6ff12b7f79`)	2020-09-10 11:53:29 +03:00
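
The curl retry fix in 26b5a34f96 relies on urllib.error.HTTPError being a subclass of urllib.error.URLError, so catching the base class also covers failures such as "Network is unreachable". Below is a minimal Python sketch of that retry pattern, not the actual scylla_util.py code: the function name `curl` mirrors the commit, while the `max_retries`, `retry_interval` and `opener` parameters are assumptions (`opener` is added only so the sketch can be exercised without network access).

```python
import time
import urllib.error
import urllib.request


def curl(url, max_retries=5, retry_interval=1.0, opener=urllib.request.urlopen):
    """Fetch url and return the response body as text, retrying on failure.

    Hypothetical sketch. urllib.error.HTTPError is a subclass of
    urllib.error.URLError, so catching URLError covers both HTTP error
    statuses and lower-level failures such as
    "[Errno 101] Network is unreachable". max_retries must be at least 1.
    """
    for attempt in range(max_retries):
        try:
            with opener(url) as res:
                return res.read().decode("utf-8")
        except urllib.error.URLError:
            if attempt == max_retries - 1:
                raise  # out of retries: propagate the last error
            time.sleep(retry_interval)  # wait before the next attempt
```

Catching only HTTPError would let a URLError raised for an unreachable network escape on the first attempt, which matches the failure the commit describes.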

22177 Commits
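
The scyllatop fix in b096d64aa7 iterates over a copy of the metric names so that entries can be removed while looping. A minimal Python sketch of that pattern follows, assuming (hypothetically) that the metrics are kept in a dict keyed by name; `prune_metrics` and `is_stale` are illustrative names, not scyllatop's API.

```python
def prune_metrics(metrics, is_stale):
    """Delete entries for which is_stale(value) is true, in place.

    Hypothetical sketch. Looping over list(metrics), a snapshot of the
    keys, instead of over the dict itself avoids
    "RuntimeError: dictionary changed size during iteration" when entries
    are deleted inside the loop.
    """
    for name in list(metrics):
        if is_stale(metrics[name]):
            del metrics[name]
    return metrics
```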