scylladb

Author	SHA1	Message	Date
Benny Halevy	e88871f4ec	replica: database: move shard_of implementation to mutation layer We don't need the database to determine the shard of the mutation, only its schema. So move the implementation to the respective definitions of mutation and frozen_mutation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #10430	2022-04-27 14:40:24 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Avi Kivity	1986a74cc4	db: commitlog_replayer: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:01 +03:00
Piotr Jastrzebski	01ea159fde	codebase wide: use try_emplace when appropriate C++17 introduced try_emplace for maps to replace a pattern: if(element not in a map) { map.emplace(...) } try_emplace is more efficient and results in more concise code. This commit introduces usage of try_emplace when it's appropriate. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <4970091ed770e233884633bf6d46111369e7d2dd.1597327358.git.piotr@scylladb.com>	2020-08-16 14:41:09 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously `count` function was often used in various ways. `contains` not only expresses the intent of the code better but also does it in a more unified way. This commit replaces all the occurrences of `count` with `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Tomasz Grabiec	3486eba1ce	commitlog: Fix use-after-free on mutation object during replay The mutation object may be freed prematurely during commitlog replay in the schema upgrading path. We will hit the problem if the memtable is full and apply_in_memory() needs to defer. This will typically manifest as a segfault. Fixes #6953 Introduced in `79935df` Tests: - manual using scylla binary. Reproduced the problem then verified the fix makes it go away Message-Id: <1596044010-27296-1-git-send-email-tgrabiec@scylladb.com>	2020-07-29 20:58:15 +03:00
Botond Dénes	6083ed668b	commitlog_replayer: ignore entries with invalid keys When replaying the commitlog, pass keys to `validation::validate_cql_key()`. Discard entries which fail validation and warn about it in the logs. This prevents invalid keys from getting into the system, possibly failing the commitlog replay and the successful boot of the node, preventing the node from recovering data.	2020-05-12 12:07:21 +03:00
Rafael Ávila de Espíndola	e4b8f52237	commitlog: Simplify the return of read_log_file This function really just wants to signal it is done, so return a future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200128172847.31513-1-espindola@scylladb.com>	2020-01-30 12:00:29 +02:00
Calle Wilund	56a5e0a251	commitlog_replayer: Ensure applied frozen_mutation is safe during apply Fixes #5211 In `79935df959` replay apply-call was changed from one with no continuation to one with. But the frozen mutation arg was still just lambda local. Change to use do_with for this case as well. Message-Id: <20191203162606.1664-1-calle@scylladb.com>	2019-12-03 18:28:01 +02:00
Avi Kivity	623071020e	commitlog: change variadic stream in read_log_file to future<struct> Since seastar::streams are based on future/promise, variadic streams suffer the same fate as variadic futures - deprecation and eventual removal. This patch therefore replaces a variadic stream in commitlog::read_log_file() with a non-variadic stream, via a helper struct. Tests: unit (dev)	2019-10-29 19:25:12 +01:00
Tomasz Grabiec	79935df959	commitlog: replay: Respect back-pressure from memtable space to prevent OOM Commit log replay was bypassing memtable space back-pressure, and if replay was faster than memtable flush, it could lead to OOM. The fix is to call database::apply_in_memory() instead of table::apply(). The former blocks when memtable space is full. Fixes #4982. Tests: - unit (release) - manual, replay with memtable flush failing and without failing Message-Id: <1568381952-26256-1-git-send-email-tgrabiec@scylladb.com>	2019-09-15 11:51:56 +03:00
Calle Wilund	9cadbaa96f	commitlog_replayer: Bugfix: finding truncation positions uses local var ref "uuid" was ref:ed in a continuation. Works 99.9% of the time because the continuation is not actually delayed (and assuming we begin the checks with non-truncated (system) cf:s it works). But if we do delay continuation, the resulting cf map will be borked. Fixes #4187. Message-Id: <20190204141831.3387-1-calle@scylladb.com>	2019-02-04 16:51:13 +02:00
Duarte Nunes	b7517183fa	db/commitlog: Use fragmented buffers to read entries Leverage fragmented_temporary_buffer when reading commit log entries, avoiding large allocations. Refs #4020 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Avi Kivity	f0a709cfc8	commitlog_replayer: don't use query_processor During normal writes, query processing happens before commitlog, so logically replaying the commitlog shouldn't need it. And in fact the dependency on query_processor can be eliminated, all it needs is the local node's database.	2018-12-29 11:00:29 +02:00
Avi Kivity	cc8312a8b9	commitlog: reduce dependencies on db/config.hh Instead of accessing extensions via config, access it via database::extensions(). This reduces recompilations when configuration is extended.	2018-12-21 20:15:43 +00:00
Tomasz Grabiec	538e041f22	Merge "Remove some dependencies on db::config" from Avi db::config is a global class; changes in any module can cause changes in db::config. Therefore, it is a cause of needless recompilation. Remove some of these dependencies by having consumers of db::config declare an intermediate config struct that contains only configuration of interest to them, and have their caller fill it out (in the case of auth, it already followed this scheme and the patchset only moves the translation function). In addition, some outright pointless inclusions of db/config.hh are removed. The result is somewhat shorter compile times, and fewer needless recompiles. * https://github.com/avikivity/scylla unconfig-1/v1: config: remove inclusions of db/config.hh from header files repair: remove unneeded config.hh inclusion batchlog_manager: remove dependency on db::config auth: remove permissions_cache dependency on db::config auth: remove auth::service dependency on db::config auth: remove unneeded db/config.hh includes	2018-12-10 14:53:14 +01:00
Calle Wilund	b35af84599	commitlog_replay: Enforce file name based id matching When reading the header chunk of a commitlog file, check the stored id value against the id derived from the file name, and ignore if mismatched. This is a prerequisite for re-using renamed commitlog files, as we can then fail-fast should one such be left on disk, instead of trying to replay it. We also check said id via the CRC check for each chunk parsed. If we find a chunk with mismatched id, we will get a CRC error for the chunk, and replay will terminate (albeit not gracefully).	2018-12-10 09:09:07 +00:00
Avi Kivity	864f55e745	config: remove inclusions of db/config.hh from header files Instead, distribute those inclusions to .cc files that require them. This reduces rebuilds when config.hh changes, and makes it easier to locate files that need config disaggregation.	2018-12-09 20:11:38 +02:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Avi Kivity	d77e044cde	db: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Vlad Zolotarov	a89188de07	commitlog::read_log_file(): set a read I/O priority class explicitly Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Calle Wilund	bb1a2c6c2e	db::commitlog: Add commitlog/hints file io extension To allow on-disk data to be augmented.	2018-03-26 11:58:27 +00:00
José Guilherme Vanz	380bc0aa0d	Swap arguments order of mutation constructor Swap arguments in the mutation constructor keeping the same standard from the constructor variants. Refs #3084 Signed-off-by: José Guilherme Vanz <guilherme.sft@gmail.com> Message-Id: <20180120000154.3823-1-guilherme.sft@gmail.com>	2018-01-21 12:58:42 +02:00
Vlad Zolotarov	878d58d23a	db/commitlog/commitlog::descriptor: add a filename_prefix parameter This parameter is used when creating a new segment. Its default value is descriptor::FILENAME_PREFIX. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:47 -05:00
Calle Wilund	d9b8c79eb9	commitlog_replayer: Ignore sstable replay positions With relaxed position ordering, we cannot use existing sstables as water mark for replay. We must replay everything above truncation marks.	2017-06-07 12:07:01 +00:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Calle Wilund	b12b65db92	commitlog/replayer: Bugfix: minimum rp broken, and cl reader offset too The previous fix removed the additional insertion of "min rp" per source shard based on whether we had processed existing CF:s or not (i.e. if a CF does not exist as sstable at all, we must tag it as zero-rp, and make whole shard for it start at same zero). This is bad in itself, because it can cause data loss. It does not cause crashing however. But it did uncover another, old lingering bug, namely the commitlog reader initiating its stream wrongly when reading from an actual offset (i.e. not processing the whole file). We opened the file stream from the file offset, then tried to read the file header and magic number from there -> boom, error. Also, rp-to-file mapping was potentially suboptimal due to using bucket iterator instead of actual range. I.e. three fixes: * Reinstate min position guarding for unencountered CF:s * Fix stream creating in CL reader * Fix segment map iterator use. v2: * Fix typo Message-Id: <1490611637-12220-1-git-send-email-calle@scylladb.com>	2017-03-28 10:32:28 +02:00
Calle Wilund	c3a510a08d	commitlog_replayer: Do proper const-lookup of min positions for shards Fixes #2173 Per-shard min positions can be unset if we never collected any sstable/truncation info for it, yet replay segments of that id. Wrap the lookups to handle "missing data -> default", which should have been there in the first place. Message-Id: <1490185101-12482-1-git-send-email-calle@scylladb.com>	2017-03-22 17:57:09 +02:00
Calle Wilund	078589c508	commitlog_replayer: Make replay parallel per shard Fixes #2098 Replay previously did all segments in parallel on shard 0, which caused heavy memory load. To reduce this and spread footprint across shards, instead do X segments per shard, sequential per shard. v2: * Fixed whitespace errors Message-Id: <1489503382-830-1-git-send-email-calle@scylladb.com>	2017-03-15 13:07:17 +02:00
Tomasz Grabiec	059a1a4f22	db: Fix commitlog replay to not drop cell mutations with older schema column_mapping is not safe to access across shards, because data_type is not safe to access. One of the manifestations of this is that abstract_type::is_value_compatible_with() always fails if the two types belong to different shards. During replay, column_mapping lives on the replaying shard, and is used by converting_mutation_partition_applier against the schema on the target shard. Since types in the mapping will be considered incompatible with types in the schema, all cells will be dropped. Fix by using column_mapping in a safe way, by copying it to the target shard if necessary. Each shard maintains its own cache of column mappings. Fixes #1924. Message-Id: <1481310463-13868-1-git-send-email-tgrabiec@scylladb.com>	2016-12-13 12:19:32 +02:00
Tomasz Grabiec	c1a7e2090e	Revert "database: change find_column_families signature so it returns a lw_shared_ptr" This reverts commit `f3528ede65`.	2016-11-04 10:48:21 +01:00
Glauber Costa	f3528ede65	database: change find_column_families signature so it returns a lw_shared_ptr There are places in which we need to use the column family object many times, with deferring points in between. Because the column family may have been destroyed in the deferring point, we need to go and find it again. If we use lw_shared_ptr, however, we'll be able to at least guarantee that the object will be alive. Some users will still need to check, if they want to guarantee that the column family wasn't removed. But others that only need to make sure we don't access an invalid object will be able to avoid the cost of re-finding it just fine. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Avi Kivity	2a46410f4a	Change sstable_list from a map to a set sstable_list is now a map<generation, sstable>; change it to a set in preparation for replacing it with sstable_set. The change simplifies a lot of code; the only casualty is the code that computes the highest generation number.	2016-07-03 10:26:57 +03:00
Calle Wilund	2b812a392a	commitlog_replayer: Fix calculation of global min pos per shard If a CF does not have any sstables at all, we should treat it as having a replay position of zero. However, since we also must deal with potential re-sharding, we cannot just set shard->uuid->zero initially, because we don't know what shards existed. Go through all CF:s post map-reduce, and for every shard where a CF does not have an RP-mapping (no sstables found), set the global min pos (for shard) to zero. Fixes #1372 Message-Id: <1465991864-4211-1-git-send-email-calle@scylladb.com>	2016-06-21 10:05:05 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Paweł Dziepak	bdc23ae5b5	remove db/serializer.hh includes Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 09:07:09 +00:00
Pekka Enberg	86173fb8cc	db/commitlog: Fix debug log format string in commitlog_replayer::recover() I saw the following Boost format string related warning during commitlog replay: INFO [shard 0] commitlog_replayer - Replaying node3/commitlog/CommitLog-1-72057594289748293.log, node3/commitlog/CommitLog-1-90071992799230277.log, node3/commitlog/CommitLog-1-108086391308712261.log, node3/commitlog/CommitLog-1-251820357.log, node3/commitlog/CommitLog-1-54043195780266309.log, node3/commitlog/CommitLog-1-36028797270784325.log, node3/commitlog/CommitLog-1-126100789818194245.log, node3/commitlog/CommitLog-1-18014398761302341.log, node3/commitlog/CommitLog-1-126100789818194246.log, node3/commitlog/CommitLog-1-251820358.log, node3/commitlog/CommitLog-1-18014398761302342.log, node3/commitlog/CommitLog-1-36028797270784326.log, node3/commitlog/CommitLog-1-54043195780266310.log, node3/commitlog/CommitLog-1-72057594289748294.log, node3/commitlog/CommitLog-1-90071992799230278.log, node3/commitlog/CommitLog-1-108086391308712262.log WARN [shard 0] commitlog_replayer - error replaying: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::io::too_many_args> > (boost::too_many_args: format-string referred to less arguments than were passed) While inspecting the code, I noticed that one of the error loggers is missing an argument. As I don't know how the original failure triggered, I wasn't able to verify that that was the only one, though. Message-Id: <1453893301-23128-1-git-send-email-penberg@scylladb.com>	2016-01-27 13:40:19 +02:00
Calle Wilund	59bf54d59a	commitlog_replayer: Modify logging to more closely match origin * Match origin log messages - Demote per-file printouts to "debug" level. * Print an all-files stat summary for whole replay (begin/summary) - At info level, like origin Prompted by dtest that expects origin log output. Message-Id: <1453216558-18359-1-git-send-email-calle@scylladb.com>	2016-01-19 17:19:52 +02:00
Paweł Dziepak	218898b297	commitlog: upgrade mutations during commitlog replay Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-01-13 10:50:26 +01:00
Paweł Dziepak	661849dbc3	commitlog: learn about schema versions during replay Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-01-13 10:50:23 +01:00
Paweł Dziepak	18d0a57bf4	commitlog: use commitlog entry writer and reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-01-13 10:20:06 +01:00
Tomasz Grabiec	036974e19b	Make mutation interfaces support multiple versions Schema is tracked in memtable and cache per-entry. Entries are upgraded lazily on access. Incoming mutations are upgraded to table's current schema on given shard. Mutating nodes need to keep schema_ptr alive in case schema version is requested by target node.	2016-01-11 10:34:51 +01:00
Tomasz Grabiec	c0ac7b3a73	commitlog: Wrap subscription in a unique_ptr<> to make it nothrow movable future<> will require nothrow move constructible types.	2015-12-07 09:50:28 +01:00
Tomasz Grabiec	657841922a	Mark move constructors noexcept when possible	2015-12-07 09:50:27 +01:00
Calle Wilund	76b43fbf74	commitlog_replayer: Handle replay data errors as non-fatal Discern fatal and non-fatal exceptions, and handle data corruption by adding to stats, reporting it, but continue processing. Note that "invalid_argument", i.e. attempting to replay origin/old segments, is still considered fatal, as it is probably better to signal this strongly to user/admin	2015-11-23 15:42:45 +01:00
Calle Wilund	43712a583d	commitlog_replayer: Special case exception from "old/origin file" And write some nice informative stuff.	2015-11-10 17:14:22 +01:00

60 Commits