scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	df4b698309	Merge "Add more defenses against empty keys" from Botond In theory we shouldn't have empty keys in the database, as we validate all keys that enter the database via CQL with `validation::validate_cql_keys()`, which will reject empty keys. In this context, empty means a single-component key, with its only component being empty. Yet recently we've seen empty keys appear in a cluster and wreak havoc on it, as they will cause the memtable flush to fail due to the sstable summary rejecting the empty key. This will cause an infinite loop, where Scylla keeps retrying to flush the memtable and failing. The immediate consequence of this is that the node cannot be shut down gracefully. The indirect consequence is possible data loss, as commitlog files cannot be replayed: replaying them just re-inserts the empty key into the memtable and the infinite flush retry cycle starts all over again. A workaround is to move problematic commitlog files away, allowing the node to start up. This can however lead to data loss, if multiple replicas had to move away commitlogs that contain the same data. To prevent the node from getting into an unusable state and subsequent data loss, extend the existing defenses against invalid (empty) keys to the commitlog replay, which will now ignore them during replay. Fixes: #6106 * denesb/empty-keys/v5: commitlog_replayer: ignore entries with invalid keys test: lib/sstable_utils: add make_keys_for_shard validation: add is_cql_key_invalid() validation: validate_cql_key(): make key parameter a `partition_key_view` partition_key_view: add validate method	2020-05-12 20:36:40 +02:00
Piotr Dulikowski	0c5ac0da98	hinted handoff: remove discarded hint positions from rps_set Related commit: `85d5c3d` When attempting to send a hint, an exception might occur that results in that hint being discarded (e.g. keyspace or table of the hint was removed). When such an exception is thrown, position of the hint will already be stored in rps_set. We are only allowed to retain positions of hints that failed to be sent and needed to be retried later. Dropping a hint is not an error, therefore its position should be removed from rps_set - but current logic does not do that. Because of that bug, hint files with many discardable hints might cause rps_set to grow large when the file is replayed. Furthermore, leaving positions of such hints in rps_set might cause more hints than necessary to be re-sent if some non-discarded hints fail to be sent. This commit fixes the problem by removing positions of discarded hints from rps_set. Fixes #6433	2020-05-12 15:13:59 +02:00
Botond Dénes	6083ed668b	commitlog_replayer: ignore entries with invalid keys When replaying the commitlog, pass keys to `validation::validate_cql_key()`. Discard entries which fail validation and warn about it in the logs. This prevents invalid keys from getting into the system, possibly failing the commitlog replay and the successful boot of the node, preventing the node from recovering data.	2020-05-12 12:07:21 +03:00
Piotr Dulikowski	85d5c3d5ee	hinted handoff: don't keep positions of old hints in rps_set When sending hints from one file, the rps_set field in send_one_file_ctx keeps track of commitlog positions of hints that are currently being sent, or have failed to be sent. At the end of the operation, if sending of some hints failed, we will choose the position of the earliest hint that failed to be sent, and will retry sending that file later, starting from that position. This position is stored in _last_not_complete_rp. Usually, this set has a bounded size, because we impose a limit of at most 128 hints being sent concurrently. Because we do not attempt to send any more hints after a failure is detected, rps_set should not have more than 128 elements at a time. Due to a bug, commitlog positions of old hints (older than gc_grace_seconds of the destination table) were inserted into rps_set but not removed after checking their age. This could cause rps_set to grow very large when replaying a file with old hints. Moreover, if the file mixed expired and non-expired hints (which could happen if it had hints to two tables with different gc_grace_seconds), and sending of some non-expired hints failed, then positions of expired hints could influence the calculation of _last_not_complete_rp, and more hints than necessary would be resent on the next retry. This simple patch removes the commitlog position of a hint from rps_set when it is detected to be too old. Fixes #6422	2020-05-11 11:33:31 +02:00
Asias He	71d0d58f8c	Revert "config: Do not enable repair based node operations by default" This reverts commit `b8ac10c451`. The repair based node operations will be enabled by default in 4.1. Revert the patch which disables it by default.	2020-05-07 13:17:35 +03:00
Nadav Har'El	0214f0ad60	main: really enable the "--start-native-transport" option In commit `da3bf20e71` we supposedly enabled support for Cassandra's "start_native_transport" option which can be set to 0 to run Scylla without listening on the CQL port. This can be useful, for example, if a user only wants the DynamoDB or Redis APIs but not CQL. Unfortunately, the option was still marked "Unused", so it wasn't really enabled as a valid command line option. This patch fixes that, and documents the start_native_transport option in docs/protocols.md, where we document the different protocols, ports, and options to configure them. Fixes #6387. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200506174850.13616-1-nyh@scylladb.com>	2020-05-07 11:09:18 +03:00
Piotr Sarna	f48e414eab	db, view: remove duplicate entries from pending endpoints When generating view updates, an endpoint can appear both as a primary paired endpoint for the view update, and as a pending endpoint (due to range movements). In order not to generate the same update twice for the same endpoint, the paired endpoint is removed from the list of pending endpoints if present. Fixes #5459 Tests: unit(dev), dtest(TestMaterializedViews.add_dc_during_mv_insert_test)	2020-05-06 16:42:56 +03:00
Piotr Sarna	bf5f247bc5	db: set gc grace period to 0 for local system tables Local system tables from `system` namespace use LocalStrategy replication, so they do not need to be concerned about gc grace period. Some system tables already set gc grace period to 0, but other ones, including system.large_partitions, did not. That may result in millions of tombstones being needlessly kept for these tables, which can cause read timeouts. Fixes #6325 Tests: unit(dev), local(running cqlsh and playing with system tables)	2020-05-03 17:41:50 +03:00
Asias He	b8ac10c451	config: Do not enable repair based node operations by default Give it some more time to mature. Use the old stream plan based node operations by default. Fixes: #6305 Backports: 4.0	2020-04-30 12:37:24 +03:00
Tomasz Grabiec	c59ec8d97f	Merge "Avoid some memory copies in lwt" from Gleb * seastar-dev.git gleb/lwt-shared-proposal: lwt: pass paxos::proposal as a shared pointer everywhere lwt: do not copy proposal in paxos_state::accept lwt: make load_paxos_state to take partition_key_view instead of a deference	2020-04-22 13:43:03 +02:00
Gleb Natapov	97af6bb0bd	lwt: make load_paxos_state to take partition_key_view instead of a deference Some callers have a partition_key_view, but not a partition_key, so they need to create a temporary copy just to pass a reference. Change it to accept a view.	2020-04-22 13:51:43 +03:00
Calle Wilund	525b283326	commitlog::read_log_file: Preserve subscription across reading Fixes #6265 The return type of read_log_file was previously changed from subscription to future<>, returning the result of the previously returned subscription's done(). But it did not preserve the subscription itself, which in turn causes us (in work::stream) to call back into a deleted object. Message-Id: <20200422090856.5218-1-calle@scylladb.com>	2020-04-22 12:12:11 +03:00
Glauber Costa	1f9c37fb5e	view_updating_consumer: move reference to a pointer It is currently not possible to wrap the view_updating_consumer in an std::optional. I intend to do it to allow for compactions to optionally generate view updates. The reason for that is that view_updating_consumer has a reference as a member, which prevents the move assignment operator from being implicitly generated. This patch fixes it by keeping a pointer instead of a reference. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200421123648.8328-1-glauber@scylladb.com>	2020-04-22 10:05:35 +03:00
Piotr Sarna	03f41b9d96	db: remove trailing whitespace Found when backporting a patch to 3.3. Message-Id: <fa406597deaacff56dbba99fa167715b041bbb52.1587375123.git.sarna@scylladb.com>	2020-04-20 12:58:55 +02:00
Tomasz Grabiec	e648e314e5	Merge "Drop only learnt value on PRUNE" from Gleb It is unsafe to remove the entire row, so only drop the learned value from the system.paxos table. Fixes: #6154	2020-04-20 12:06:04 +02:00
Gleb Natapov	73391420fb	lwt: drop only most recently learnt value during prune. It turned out we cannot drop the information about the most recent commit entirely, since it is used to cut off already outdated accepted values. Otherwise the following scenario can happen: 1. cas1 prepares on A, B, C, gets one accept from A 2. cas2 prepares on B, C, gets 2 accepts on B and C, learns on B, C 3. cas3 initiates a prepare on A, learns about cas1's accept, 4. cas2 learns on A, prunes on A, B, C Now cas3 will replay cas1's value because it does not know that it is older than the already committed one (removed during step 4). The patch drops only the committed value and keeps the information about the latest committed ballot. Fixed #6154	2020-04-19 17:12:15 +03:00
Gleb Natapov	d3d31d66d4	lwt: treated accepted ballot as a promised PAXOS node is allowed to accept a proposal without promising it first as long as its ballot is greater than already promised one. Treat such accepted ballot as promised since 'learn' stage removes accepted ballot, but we still want to remember it as the latest promised one. The goal is to be closer to formal PAXOS specification.	2020-04-19 17:12:03 +03:00
Piotr Sarna	9c15604659	treewide: deprecate passing explicit order in schema building In order to avoid confusion with regard to whose responsibility it is to sort the key columns (see #5856), the interface which allows adding columns to the builder with explicit column id is moved to a private function. An internal with_column_ordered() overload is maintained to be used for internal operations, but it's encouraged to use simpler with_column() in new code. Fixes #6235 Tests: unit(dev)	2020-04-19 16:19:17 +03:00
Avi Kivity	2039b79664	commitlog: filter out files in the commitlog directory which don't have the correct prefix Commitlog replay is given a filename prefix to filter files against, but it ignores it. As a result we will replay anything in that directory, including recycled segments, which is wasteful. Fix by adding a check for the prefix. Tests: unit (dev), manual test that regular commitlog files are not filtered. Message-Id: <20200416174542.133230-1-avi@scylladb.com>	2020-04-17 08:44:32 +03:00
Vlad Zolotarov	b83e84b467	db::hints:: optimize with_file_update_mutex() Avoid extra shared_ptr copy. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200311214313.2988-1-vladz@scylladb.com>	2020-04-16 09:01:40 +03:00
Piotr Sarna	71ac6ebcc5	Merge 'prepare the view building generator to work through a compaction' from Glauber There is no reason to read a single SSTable at a time from the staging directory. Moving SSTables from staging directory essentially involves scanning input SSTables and creating new SSTables (albeit in a different directory). We have a mechanism that does that: compactions. In a follow up patch, I will introduce a new specialization of compaction that moves SSTables from staging (potentially compacting them if there are plenty). In preparation for that, some signatures have to be changed and the view_updating_consumer has to be more compaction friendly. Meaning: - Operating with an sstable vector - taking a table reference, not a database Because this code is a bit fragile and the reviewer set is fundamentally different from anything compaction related, I am sending this separately * glommer-view_build: staging: potentially read many SSTables at the same time view_build_test: make sure it works with smp > 1	2020-04-15 18:07:09 +02:00
Glauber Costa	4e6400293e	staging: potentially read many SSTables at the same time There is no reason to read a single SSTable at a time from the staging directory. Moving SSTables from staging directory essentially involves scanning input SSTables and creating new SSTables (albeit in a different directory). We have a mechanism that does that: compactions. In a follow up patch, I will introduce a new specialization of compaction that moves SSTables from staging (potentially compacting them if there are plenty). In preparation for that, some signatures have to be changed and the view_updating_consumer has to be more compaction friendly. Meaning: - Operating with an sstable vector - taking a table reference, not a database Because this code is a bit fragile and the reviewer set is fundamentally different from anything compaction related, I am sending this separately Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-04-15 11:26:44 -04:00
Konstantin Osipov	18b9bb57ac	lwt: rename metrics to match accepted terminology Rename inherited metrics cas_propose and cas_commit to cas_accept and cas_learn respectively. A while ago we made a decision to stick to widely accepted terms for Paxos rounds: prepare, accept, learn. The rest of the code is using these terms, so rename the metrics to avoid confusion/technical debt. While at it, rename a few internal methods and functions. Fixes #6169 Message-Id: <20200414213537.129547-1-kostja@scylladb.com>	2020-04-15 12:20:30 +02:00
Piotr Jastrzebski	20bc93b941	cdc: Stop storing CDC options in scylla tables Initially we were storing CDC options in scylla tables but then we realized that we can use schema extensions. Extensions are more flexible and cause fewer problems with schema digest. The transition was done in 4.0 and with that we stopped reading 'cdc' column in scylla tables. Commit `861c7b5626` removed the code that used to read 'cdc' column. Since no Scylla node should be reading 'cdc' column, we can always keep it empty now. This will allow removal of schema::cdc_options in the future. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-04-15 06:56:44 +02:00
Benny Halevy	35892e4557	db::commitlog: close file if wrapping failed When an I/O error (e.g. EMFILE / ENOSPC) happens, we hit an assert in ~append_challenged_posix_file_impl(): Assertion '_closing_state == state::closed' failed. Commit `6160b9017d` added a close on failure of the lambda defined in allocate_segment_ex, but it doesn't handle an error after the file is opened/created while it is wrapped with commitlog_file_extensions. Refs #5657 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Reviewed-by: Calle Wilund <calle@scylladb.com> Message-Id: <20200414115231.298632-1-bhalevy@scylladb.com>	2020-04-14 16:14:28 +03:00
Nadav Har'El	c1a7a071ea	merge: Remove most inclusions of reactor.hh Merged patch series from Avi Kivity: This patchset removes most inclusions of reactor.hh, by switching to new namespace-scoped API:s instead of those using engine() as a way to get the reactor. With this, we are down to 12 translation units depending on reactor.hh, mostly for deprecated API:s like reactor::at_exit(). Avi Kivity (3): logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope test: sstable-utils: deinline do_make_keys() treewide: replace calls to engine().some_api() with some_api() configure.py | 14 +++----- auth/common.hh | 3 +- checked-file-impl.hh | 4 +-- db/system_keyspace_view_types.hh | 2 +- flat_mutation_reader.hh | 1 + lister.hh | 2 +- message/messaging_service.hh | 2 +- redis/server.hh | 2 +- sstables/compress.hh | 2 +- sstables/integrity_checked_file_impl.hh | 2 +- test/lib/sstable_utils.hh | 35 ++++--------------- test/lib/test_services.hh | 2 +- thrift/server.hh | 2 +- transport/server.hh | 2 +- utils/error_injection.hh | 3 +- utils/joinpoint.hh | 2 +- utils/loading_cache.hh | 2 +- utils/logalloc.hh | 6 ++-- utils/rate_limiter.hh | 2 +- api/system.cc | 1 + auth/default_authorizer.cc | 2 +- auth/password_authenticator.cc | 2 +- database.cc | 1 + db/commitlog/commitlog.cc | 4 +-- db/hints/resource_manager.cc | 3 +- db/system_distributed_keyspace.cc | 2 +- dht/i_partitioner.cc | 2 +- gms/feature_service.cc | 3 +- lister.cc | 4 +-- locator/ec2_snitch.cc | 3 +- locator/gce_snitch.cc | 1 + main.cc | 1 + reader_concurrency_semaphore.cc | 2 +- redis/server.cc | 4 +-- sstables/sstables.cc | 11 +++--- table.cc | 3 +- test/boost/commitlog_test.cc | 2 +- test/boost/database_test.cc | 2 +- test/boost/flush_queue_test.cc | 2 +- test/boost/gossip_test.cc | 2 +- .../gossiping_property_file_snitch_test.cc | 1 + test/boost/loading_cache_test.cc | 2 +- test/boost/sstable_3_x_test.cc | 1 + test/boost/sstable_datafile_test.cc | 1 + test/boost/sstable_test.cc | 1 + test/lib/sstable_utils.cc | 26 ++++++++++++++ test/manual/gossip.cc | 2 +- test/manual/hint_test.cc | 2 +- test/manual/sstable_scan_footprint_test.cc | 2 +- test/perf/perf_mutation.cc | 1 + test/perf/perf_row_cache_update.cc | 1 + test/perf/perf_sstable.cc | 1 + test/tools/cql_repl.cc | 2 +- thrift/server.cc | 2 +- transport/server.cc | 4 +-- utils/config_file.cc | 3 +- utils/file_lock.cc | 2 +- utils/logalloc.cc | 14 ++++---- utils/updateable_value.cc | 2 +- 59 files changed, 119 insertions(+), 98 deletions(-)	2020-04-05 13:47:39 +03:00
Avi Kivity	88ade3110f	treewide: replace calls to engine().some_api() with some_api() This removes the need to include reactor.hh, a source of compile time bloat. In some places, the call is qualified with seastar:: in order to resolve ambiguities with a local name. Includes are adjusted to make everything compile. We end up having 14 translation units including reactor.hh, primarily for deprecated things like reactor::at_exit(). Ref #1	2020-04-05 12:46:04 +03:00
Avi Kivity	1799cfa88a	logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope This allows us to drop a #include <reactor.hh>, reducing compile time. Several translation units that lost access to required declarations are updated with the required includes (this can be an include of reactor.hh itself, in case the translation unit that lost it got it indirectly via logalloc.hh) Ref #1.	2020-04-05 12:45:08 +03:00
Piotr Sarna	1a9083b342	db,view: guard view builder startup with a semaphore The startup routine performs some bookkeeping operations on views, and so do these events: - on_create_view; - on_drop_view; - on_update_view. Since the above events are guarded with a semaphore, the startup routine should also take the same semaphore - in order to ensure that all bookkeeping operations are serialized. Refs #6094	2020-04-05 11:41:26 +02:00
Piotr Sarna	8da4a5b78c	db,view: nitpick: change & operator to && for booleans Although it's technically correct to use the bitwise and operator on booleans as well, it's slightly confusing for the reader.	2020-04-05 11:41:25 +02:00
Piotr Sarna	e49805b7b8	db,view: remove unneeded implicit capture-by-reference The lambda does not use any other captures, so it does not need to implicitly capture anything by reference.	2020-04-05 11:41:25 +02:00
Piotr Sarna	3f19865493	db,view: fix waiting for a view building future The future was marked with a `FIXME: discarded future`, but there's really no reason not to wait for it, and it was probably meant to be waited for since its implementation.	2020-04-05 11:41:25 +02:00
Botond Dénes	240b5e0594	frozen_schema: key() remove unused schema parameter Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200402092249.680210-1-bdenes@scylladb.com>	2020-04-02 14:43:35 +02:00
Konstantin Osipov	9948f548a5	lwt: remove Paxos from experimental list Always enable lightweight transactions. Remove the check for the command line switch from the feature service, assuming LWT is always enabled. Remove the check for LWT from Alternator. Note that in order for the cluster to work with LWT, all nodes need to support it. Rename LWT to UNUSED in db/config.hh, to keep accepting lwt keyword in --experimental-features command line option, but do nothing with it. Changes in v2: * remove enable_lwt feature flag, it's always there Closes #6102 test: unit (dev, debug) Message-Id: <20200401071149.41921-1-kostja@scylladb.com>	2020-04-01 09:12:21 +02:00
Avi Kivity	dee0b68347	Merge 'Separate sharding and partitioning logic' from Piotr J " Currently, both sharding and partitioning logic is encapsulated into partitioners. This is not desirable because these two concepts are totally independent and shouldn't be coupled together in such a way. This PR separates sharding and partitioning. Partitioning will still live in i_partitioner class and its subclasses. Sharding is extracted to a new class called sharding_info. Both partitioners and sharding_info are still managed by schema class. Partitioner can be accessed with schema::get_partitioner while sharding_info can be accessed with schema::get_sharding_info. The transition is done in steps: 1. sharding_info class is defined and all the sharding logic is extracted from partitioner to the new class. Temporarily, sharding_info is still embedded into i_partitioner, and all sharding-related functions in i_partitioner delegate to the embedded sharding_info object. 2. All calls to i_partitioner functions that are related to sharding are gradually switched to their sharding_info equivalents. 3. Once everything uses sharding_info, all sharding logic is dropped from i_partitioner. Tests: unit(dev, release) " * haaawk-sharding_info: (32 commits) dummy_sharder: rename dummy_sharding_info.* to dummy_sharder.* sharding_info: rename the class to sharder i_partitioner:remove embeded sharding_info i_partitioner: remove unused get_sharding_info schema: remove incorrect comment schema: make it possible to set sharding_info per schema i_partitioner: remove unused shard_count multishard_writer: stop calling i_partitioner::shard_count i_partitioner: remove sharding_ignore_msb partitioner_test: test ranges and sharding_infos i_partitioner: remove unused split_ranges_to_shards i_partitioner: remove unused shard_of function sstable-utils: use sharding_info::shard_of create_token_range_from_keys: use sharding info for shard_of multishard_mutation_query_test: use sharding info for shard_of distribute_reader_and_consume_on_shards: use sharding_info::shard_of multishard_mutation_query: use sharding_info::shard_of dht::shard_of: use schema::get_sharding_info i_partitioner: remove unused token_for_next_shard split_range_to_single_shard: use sharding info instead of partitioner ...	2020-03-31 13:40:51 +03:00
Gleb Natapov	8a408ac5a8	lwt: remove entries from system.paxos table after successful learn stage The learning stage of the PAXOS protocol leaves behind an entry in the system.paxos table with the last learned value (which can be large). In case not all participants learned it successfully, the next round on the same key may complete the learning using this info. But if all nodes learned the value, the entry no longer serves any useful purpose. The patch adds another round, "prune", which is executed in the background (limited to 1000 simultaneous instances) and removes the entry in case all nodes replied successfully to the "learn" round. It uses the ballot's timestamp to do the deletion, so as not to interfere with the next round. Since deletion happens very close to previous writes, it will likely happen in the memtable and will never reach sstables, which reduces memtable flush and compaction overhead. Fixes #5779 Message-Id: <20200330154853.GA31074@scylladb.com>	2020-03-30 21:02:14 +03:00
Piotr Jastrzebski	e72696a8e6	sharding_info: rename the class to sharder Also rename all variables that were named si or sinfo to sharder. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	2e850421a0	i_partitioner:remove embeded sharding_info The sharding_info embedded into the partitioner is no longer used anywhere and can be removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	7bd2b8d73f	schema: make it possible to set sharding_info per schema Previously schema::get_sharding_info was obtaining sharding_info from the partitioner but we want to remove sharding_info from the partitioner so we need a place in schema to store it there instead. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Gleb Natapov	b3db6f5b04	lwt: rename "in_progress_ballot" cell to "promise" in system.paxos table The value that is stored in "in_progress_ballot" cell is the value of promised ballot, so call the cell accordingly to avoid confusion, especially as we have a notion of "in progress" proposal in the code which is not the same as in_progress_ballot here. We can still do this without worrying about backward compatibility since LWT is still marked as experimental. Fixes #6087. Message-Id: <20200326095758.GA10219@scylladb.com>	2020-03-30 12:01:55 +03:00
Asias He	743b529c2b	gossip: Add an option to force gossip generation Consider 3 nodes in the cluster, n1, n2, n3, with gossip generation numbers g1, g2, g3. n1, n2, n3 are running a Scylla version with commit `0a52ecb6df` (gossip: Fix max generation drift measure). One year later, the user wants to upgrade n1, n2, n3 to a new version. When n3 does a rolling restart with the new version, n3 will use a generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's gossip update and mark n3 as down. Such unnecessary marking of nodes as down can cause availability issues. For example: DC1: n1, n2 DC2: n3, n4 When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which causes the whole DC2 to be unavailable. To fix this, we can start the restarting node with a forced gossip generation that is within MAX_GENERATION_DIFFERENCE of the other nodes' generations. Once all the nodes run the version with commit `0a52ecb6df`, the option is no longer needed. Fixes #5164	2020-03-27 12:15:21 +01:00
Rafael Ávila de Espíndola	c5795e8199	everywhere: Replace engine().cpu_id() with this_shard_id() This is a bit simpler and might allow removing a few includes of reactor.hh. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200326194656.74041-1-espindola@scylladb.com>	2020-03-27 11:40:03 +03:00
Rafael Ávila de Espíndola	eca0ac5772	everywhere: Update for deprecated apply functions Now apply is only for tuples, for varargs use invoke. This depends on the seastar changes adding invoke. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200324163809.93648-1-espindola@scylladb.com>	2020-03-25 08:49:53 +02:00
Avi Kivity	a314283469	Merge "Minor cleanups to cql3 code regarding shared_ptr's" from Pavel S " This small series consists of several changes that aim to reduce the number of shared_ptr's in cql3 code. Also it contains a patch that makes CqlParser::query to return std::unique_ptr<> instead of seastar::shared_ptr<>, which leads to more understandable code and lays foundation for further optimizations (e.g. possibly eliminating shared_ptr's in `prepared_statement` and just moving raw statements in `prepare` without copying them). Tests: unit(dev, debug) " * 'feature/cql_cleanups_9' of https://github.com/ManManson/scylla: cql3: return raw::parsed_statement as unique_ptr cql3: de-pointerize arguments to some of CQL grammar rules and definitions. cql3: make abstract_marker::make_in_receiver accept cref to column_specification	2020-03-24 14:51:49 +02:00
Calle Wilund	9fee712d62	db::commitlog: Don't write trailing zero block unless needed Fixes #5899 When terminating (closing) a segment, we write a trailing block of zeros so the reader can have an empty region after the last used chunk as an end marker. This is due to using recycled, pre-allocated segments with potentially non-zero data extending over the point where we are ending the segment (i.e. we are not fully filling the segment due to a huge mutation or similar). However, if we reach the end of the segment writing the final block (typically many small mutations), the file will end naturally after the data written, and any trailing zero block would in fact just extend the file further. While this will only happen once per segment recycled (independent of how many times it is recycled), it is still both slightly breaking the disk usage contract and also potentially causing some disk stalls due to metadata changes (though of course very infrequent). We should only write the trailing zero block if we are below the max_size file size when terminating. Adds a small size check to the commitlog test to verify size bounds (which breaks without the patch). v2: - Fix test to take into account that files might be deleted behind our backs. v3: - Fix test better, by doing verification _before_ segments are queued for delete. Message-Id: <20200226121601.15347-2-calle@scylladb.com> Message-Id: <20200324100235.23982-1-calle@scylladb.com>	2020-03-24 11:31:55 +01:00
Pavel Solodovnikov	adc6a98b59	cql3: return raw::parsed_statement as unique_ptr Change CQL parsing routine to return std::unique_ptr instead of seastar::shared_ptr. This can help reduce redundant shared_ptr copies even further. Make some supplementary changes necessary for this transition: * Remove enable_shared_from_this base class from the following classes: truncate_statement, authorization_statement, authentication_statement: these were previously constructing prepared_statement instance in `prepare` method using `shared_from_this`. Make `prepare` methods implementation of inheriting classes mirror implementation from other statements (i.e. create a shallow copy of the object when preparing into `prepared_statement`; this could be further refactored to avoid copies as much as possible). * Remove unused fields in create_role_statement which led to error while using compiler-generated copy ctor (copying uninitialized bool values via ctor). Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-03-23 23:19:21 +03:00
Botond Dénes	e0284bb9ee	treewide: add missing headers and/or forward declarations	2020-03-23 09:29:45 +02:00
Pekka Enberg	6b2cd1bd7d	Revert "db::commitlog: Don't write trailing zero block unless needed" This reverts commit `0b34d88957`. According to Rafael Avila de Espindola: "I have bisected the recent failures [in commitlog_test] on next to this patch."	2020-03-20 22:30:58 +02:00
Nadav Har'El	7922b9eb8f	materialized views: reduce recompilation when db/view/view.hh changes. Before this patch, when db/view/view.hh was modified, 89 source files had to be recompiled. After this patch, this number is down to 5. Most of the irrelevant source files got view.hh by including database.hh, which included view.hh just for the definition of statistics. So in this patch we split the view statistics to a separate header file, view_stats.hh, and database.hh only includes that. A few source files which included only database.hh and also needed view.hh (for materialized-view related functions) now need to include view.hh explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200319121031.540-1-nyh@scylladb.com>	2020-03-19 15:46:14 +02:00
Piotr Sarna	0c11e07faf	view,table: fix waiting for view updates during building View updates sent as part of the view building process should never be ignored, but `fd49fd7` introduced a bug which may cause exactly that: the updates are mistakenly sent to background, so the view builder will not receive negative feedback if an update failed, which will in turn not cause a retry. Consequently, view building may report that it "finished" building a view, while some of the updates were lost. A simple fix is to restore previous behaviour - all updates triggered by view building are now waited for. Fixes #6038 Tests: unit(dev), dtest: interrupt_build_process_with_resharding_low_to_half_test	2020-03-19 10:50:54 +02:00
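
The two hinted handoff fixes above (`85d5c3d5ee` and `0c5ac0da98`) rest on one invariant: rps_set should only hold positions of hints that are still in flight or have failed, because its earliest entry becomes _last_not_complete_rp. The following C++ sketch models that invariant under simplifying assumptions; replay_position, send_result and send_one_file_model are hypothetical names, not Scylla's actual types, and a std::set is used so the earliest position is simply begin().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical stand-in for a commitlog replay position: (segment id, offset).
struct replay_position {
    std::uint64_t segment_id;
    std::uint32_t offset;

    bool operator<(const replay_position& o) const {
        if (segment_id != o.segment_id) {
            return segment_id < o.segment_id;
        }
        return offset < o.offset;
    }
};

// Hypothetical outcomes of trying to send a single hint.
enum class send_result { sent, failed, discarded, expired };

// Hypothetical model of the rps_set bookkeeping: a position stays in the set
// while its hint is in flight, and afterwards only if the hint failed and
// must be retried.
class send_one_file_model {
    std::set<replay_position> _rps_set;

public:
    void start_sending(replay_position rp) { _rps_set.insert(rp); }

    void finish(replay_position rp, send_result r) {
        assert(_rps_set.count(rp) == 1);  // the hint must have been started
        // Sent, discarded (e.g. table dropped) and expired hints are finished
        // for good, so their positions must not linger in the set.
        if (r != send_result::failed) {
            _rps_set.erase(rp);
        }
    }

    // Position to resume from on the next retry: the earliest position still
    // tracked, i.e. the earliest failed (or still in-flight) hint.
    std::optional<replay_position> last_not_complete_rp() const {
        if (_rps_set.empty()) {
            return std::nullopt;
        }
        return *_rps_set.begin();
    }

    std::size_t size() const { return _rps_set.size(); }
};
```

If a discarded hint at offset 10, an expired hint at offset 20 and a failed hint at offset 30 are processed, only offset 30 remains, so the retry starts there. Leaving the first two positions in the set would make last_not_complete_rp() return offset 10, and hints that no longer need sending would be resent.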

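The commitlog replay defense from `6083ed668b` amounts to checking each entry's key before applying it, and skipping (with a warning) entries that fail. A minimal C++ sketch, assuming a key is modeled as a vector of string components; partition_key_model, mutation_entry, replay_result, is_key_invalid and replay are hypothetical names, whereas the real code validates a partition_key_view through validation::validate_cql_key().

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical model of a partition key: its serialized components.
using partition_key_model = std::vector<std::string>;

// The rule described in the commit messages: a single-component key whose
// only component is empty is invalid.
bool is_key_invalid(const partition_key_model& key) {
    return key.size() == 1 && key.front().empty();
}

// Hypothetical commitlog entry: a key plus an opaque mutation payload.
struct mutation_entry {
    partition_key_model key;
    std::string payload;
};

struct replay_result {
    std::vector<mutation_entry> applied;  // entries that would reach the memtable
    std::size_t skipped = 0;              // entries dropped because of invalid keys
};

// Replays entries, skipping and warning about those with invalid keys instead
// of letting them reach the memtable, where they would make every flush fail.
replay_result replay(const std::vector<mutation_entry>& log) {
    replay_result result;
    for (const auto& entry : log) {
        if (is_key_invalid(entry.key)) {
            ++result.skipped;
            std::cerr << "warning: skipping commitlog entry with invalid (empty) key\n";
            continue;
        }
        result.applied.push_back(entry);
    }
    assert(result.applied.size() + result.skipped == log.size());
    return result;
}
```

Note that a multi-component key with one empty component (for example `{"", "c2"}`) is not rejected by this rule; only the single-empty-component case described above is.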

1680 Commits