scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 12:36:56 +00:00

Author	SHA1	Message	Date
Rafael Ávila de Espíndola	d9a71a7cff	service: Refactor code into an atomic_vector class This templates the code for listener_vector, renames it to atomic_vector and moves it to the utils directory. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Avi Kivity	1f46133273	Merge "data: make cell::make_collection() exception safe" from Botond " Most of the code in `cell` and the `imr` infrastructure it is built on is `noexcept`. This means that extra care must be taken to avoid rogue exceptions as they will bring down the node. The changes introduced by 0a453e5d3a did just that - introduced a rogue `std::bad_alloc` into this code path by violating an undocumented and unvalidated assumption -- that fragment ranges passed to `cell::make_collection()` are nothrow copyable and movable. This series refactors `cell::make_collection()` such that it does not have this assumption anymore and is safe to use with any range. Note that the unit test included in this series, which was used to find all the possible exception sources, will not currently be run in any of our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not being set. I plan to address this in a followup because setting this flag fails other tests using the failure injection mechanism. This is because these tests are normally run with the failure injection disabled so failures managed to lurk in without anyone noticing. Fixes: #5575 Refs: #5341 Tests: unit(dev, debug) " * 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla: test: mutation_test: add exception safety test for large collection serialization data/cell.hh: avoid accidental copies of non-nothrow copiable ranges utils/fragment_range.hh: introduce fragment_range_view	2020-01-14 10:01:06 +02:00
Botond Dénes	b52b4d36a2	utils/fragment_range.hh: introduce fragment_range_view A lightweight, trivially copyable and movable view for fragment ranges. Allows for uniform treatment of all kinds of ranges, i.e. treating all of them as a view. Currently `fragment_range.hh` provides lightweight, view-like adaptors for empty and single-fragment ranges (`bytes_view`). To allow code to treat owning multi-fragment ranges the same way as the former two, we need a view for the latter as well -- this is `fragment_range_view`.	2020-01-13 16:52:59 +02:00
Avi Kivity	454074f284	Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz " The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable either, because otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717 " * tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla: database: Avoid OOMing with flush continuations after failed memtable flush lsa: Introduce operator bool() to occupancy_stats lsa: Expose region_impl::evictable_occupancy in the region class	2020-01-08 16:58:54 +02:00
Rafael Ávila de Espíndola	3d641d4062	lua: Use existing cpp_int cast logic Different versions of boost have different rules for what conversions from cpp_int to smaller integers are allowed. We already had a function that worked with all supported versions, but it was not being used by lua. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200104041028.215153-1-espindola@scylladb.com>	2020-01-05 12:10:54 +02:00
Benny Halevy	4c884908bb	directories: Keep a unique set of directories to initialize If any two directories of data/commitlog/hints/view_hints are the same, we still end up running verify_owner_and_mode and disk_sanity(check_direct_io_support) in parallel on the same directories and hit #5510. This change uses std::set rather than std::vector to collect a unique set of directories that need initialization. Fixes #5510 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191225160645.2051184-1-bhalevy@scylladb.com>	2019-12-29 16:26:26 +02:00
Pavel Emelyanov	a5cdfea799	directories: Do not mess with per-shard base dir The hints and view_hints directories have per-shard sub-dirs, and the directories code tries to create, check and lock all of them, including the base one. The manipulations in question are excessive -- it's enough to check and lock either the base dir, or all the per-shard ones, but not everything. Let's take the latter approach for its simplicity. Fixes #5510 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Looks-good-to: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223142429.28448-1-xemul@scylladb.com>	2019-12-24 14:49:28 +02:00
Pavel Emelyanov	23a8d32920	directories: Make internals work on fs::path Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	373fcfdb3e	directories: Cleanup adding dirs to the vector to work on The unordered_set is turned into a vector since fs::path has no hash() method, which the set needs. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	14437da769	directories: Drop seastar::async usage Now the only future-able operation remaining is the call to parallel_for_each(); all the rest is non-blocking preparation, so we can drop the seastar::async and just return the future from parallel_for_each. The indentation is now correct, as the previous patch prepared it for just that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	06f4f3e6d8	directories: Do touch_and_lock and verify sequentially The goal is to drop the seastar::async() usage. Currently we have two places that return futures -- calls to parallel_for_each-s. We can either chain them together or, since both are working on the same set of directories, chain actions inside them. For code simplicity I propose to chain actions. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	8d0c820aa1	directories: Do touch_and_lock in parallel The list of paths that should be touch-and-locked is already at hand; this shortens the code and makes it slightly faster (in theory). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	71a528d404	directories: Move the whole stuff into own .cc file In order not to pollute the root dir, place the code in the utils/ directory, "utils" namespace. While doing this, move touch_and_lock out of the class declaration. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	82ef2a7730	file_lock: Work with fs::path, not sstring The main.cc code that converts sstring to fs::path will be patched soon, the file_desc::open belongs to seastar and works on sstrings. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 17:32:10 +03:00
Dejan Mircevski	a26bd9b847	utils: Add enum_option This allows us to accept command-line options with a predefined set of valid arguments. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-09 09:45:59 -05:00
Tomasz Grabiec	aa173898d6	Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz Selected semaphores' names are now included in exception messages in case of timeout or when admission queue overflows. Resolves #5281	2019-12-05 14:19:56 +01:00
Juliusz Stasiewicz	430b2ad19d	commitlog+region_group: timeout exceptions with names `segment_manager' now uses a decorated version of `timed_out_error' with hardcoded name. On the other hand `region_group' uses named `on_request_expiry' within its `expiring_fifo'.	2019-12-03 19:07:19 +01:00
Botond Dénes	690e9d2b44	utils: introduce linearizing_input_stream `linearizing_input_stream` allows transparently reading linearized values from a fragmented buffer. This is done by linearizing on-the-fly only those read values that happen to be split across multiple fragments. This reduces the size of the largest allocation from the size of the entire buffer (when the entire buffer is linearized) to the size of the largest read value. This is a huge gain when the buffer contains loads of small objects, and a modest gain when the buffer contains few large objects. But even in the worst case the size of the largest allocation will be less than or equal to that in the case where the entire buffer is linearized. This stream is planned to be used as glue code between the fragmented cell value and the collection deserialization code, which expects to be reading linearized values.	2019-12-02 10:10:31 +02:00
Botond Dénes	4054ba0c45	serialization: accept any CharOutputIterator Not just bytes::output_iterator. Allow writing into streams other than just `bytes`. In fact we should be very careful with writing into `bytes` as they require potentially large contiguous allocations. The `write()` method is now templatized also on the type of its first argument, which now accepts any CharOutputIterator. Due to our poor usage of namespaces this now collides with `write` defined inside `db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to be templatized on the data type it reads from, and de-templatizing it resolves the clash.	2019-12-02 10:10:31 +02:00
Dejan Mircevski	c43b286f35	utils: Add operator<< for big_decimal ... and remove an existing duplicate from lua.cc. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 15:32:09 -05:00
Pavel Solodovnikov	2f442f28af	treewide: add const qualifiers throughout the code base	2019-11-26 02:24:49 +03:00
Tomasz Grabiec	fb28543116	lsa: Introduce operator bool() to occupancy_stats	2019-11-22 12:08:28 +01:00
Tomasz Grabiec	a69fda819c	lsa: Expose region_impl::evictable_occupancy in the region class	2019-11-22 12:08:10 +01:00
Tomasz Grabiec	5e4abd75cc	main: Abort on EBADF and ENOTSOCK by default Those are typically symptoms of use-after-free or memory corruption in the program. It's better to catch such errors sooner rather than later. That situation is also dangerous because if the invalid access lands on a valid descriptor other than the one intended for the operation, the operation may be performed on the wrong file and result in corruption. Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>	2019-11-19 13:07:33 +02:00
Gleb Natapov	b3e01a45d7	lwt: storage_proxy: implement paxos protocol This patch adds all functionality needed for Paxos protocol. The implementation does not strictly adhere to Paxos paper since the original paper allows setting a value only once, while for LWT we need to be able to make another Paxos round after "learn" phase completes, which requires things like repair to be introduced.	2019-10-27 23:21:51 +03:00
Nadav Har'El	51fc6c7a8e	make static_row optional to reduce memory footprint Merged patch series from Avi Kivity: The static row can be rare: many tables don't have them, and tables that do will often have mutations without them (if the static row is rarely updated, it may be present in the cache and in readers, but absent in memtable mutations). However, it always consumes ~100 bytes of memory, even if it is not present, due to row's overhead. Change it to be optional by allocating it as an external object rather than inlined into mutation_partition. This adds overhead when the static row is present (17 bytes for the reference, back reference, and lsa allocator overhead). perf_simple_query appears to be marginally (2%) faster. Footprint is reduced by ~9% for a cache entry, 12% in memtables. More details are provided in the patch commitlog. Tests: unit (debug) Avi Kivity (4): managed_ref: add get() accessor managed_ref: add external_memory_usage() mutation_partition: introduce lazy_row mutation_partition: make static_row optional to reduce memory footprint cell_locking.hh | 2 +- converting_mutation_partition_applier.hh | 4 +- mutation_partition.hh | 284 ++++++++++++++++++++++- partition_builder.hh | 4 +- utils/managed_ref.hh | 12 + flat_mutation_reader.cc | 2 +- memtable.cc | 2 +- mutation_partition.cc | 45 +++- mutation_partition_serializer.cc | 2 +- partition_version.cc | 4 +- tests/multishard_mutation_query_test.cc | 2 +- tests/mutation_source_test.cc | 2 +- tests/mutation_test.cc | 12 +- tests/sstable_mutation_test.cc | 10 +- 14 files changed, 355 insertions(+), 32 deletions(-)	2019-10-22 12:25:15 +03:00
Avi Kivity	efe8fa6105	managed_ref: add external_memory_usage() Like other managed containers, add external_memory_usage() so we can account for a partition's memory footprint in memtable/cache.	2019-10-15 15:41:42 +03:00
Avi Kivity	90096da9f3	managed_ref: add get() accessor While a managed_ref emulates a reference more closely than it does a pointer, it is still nullable, so add a get() (similar to unique_ptr::get()) that can be nullptr if the reference is null. The immediate use will be mutation_partition::_static_row, which is often empty and takes up about 10% of a cache entry.	2019-09-30 20:55:36 +03:00
Gleb Natapov	f9209e27d4	lwt: Add missing functions to utils/UUID_gen.hh Some lwt related code is missing in our UUID implementation. Add it.	2019-09-26 11:44:00 +03:00
Tomasz Grabiec	eb08ab7ed9	lsa: Assert no cross-shard region locking We observed an abort on bad_alloc which was not caused by real OOM, but could be explained by cache region being locked from a different shard, which is not allowed, concurrently with memory reclamation. It's impossible now to prove this, or, if that was indeed the case, to determine which code path was attempting such lock. This patch adds an assert which would catch such incorrect locking at the attempt. Refs #4978	2019-09-23 12:51:29 +02:00
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they did the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places, which were lacking this, but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Dejan Mircevski	8be147d069	cql3: Handle empty LIKE pattern Match SQL's LIKE in allowing an empty pattern, which matches only an empty text field. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-12 19:48:31 +03:00
Rafael Ávila de Espíndola	99c7f8457d	logalloc: Add a migrators_base that is common to debug and release This simplifies the debug implementation and it now should work with scylla-gdb.py. It is not clear what, if anything, is lost by not using random ids. They were never being reused in the debug implementation anyway. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190618144755.31212-1-espindola@scylladb.com>	2019-08-12 19:44:55 +03:00
Nadav Har'El	f9d6eaf5ff	reconcilable_result: switch to chunked_vector Merged patch series from Avi Kivity: In rare but valid cases (reconciling many tombstones, paging disabled), a reconcilable_result can grow large. This triggers large allocation warnings. Switch to chunked_vector to avoid the large allocation. In passing, fix chunked_vector's begin()/end() const correctness, and add the reverse iterator function family which is needed by the conversion. Fixes #4780. Tests: unit (dev) Commit Summary utils: chunked_vector: make begin()/end() const correct utils::chunked_vector: add rbegin() and related iterators reconcilable_result: use chunked_vector to hold partitions	2019-08-11 16:03:13 +03:00
Pekka Enberg	73113c0ea4	utils/fb_utilities.hh: Kill obsolete FIXME and commented out Java code The FIXME was added in the very first commit ("utils: Convert utils/FBUtilities.java") that introduced the fb_utilities class as a stub. However, we have long implemented the parts that we actually use, so drop the FIXME as obsolete. In addition, drop the remaining commented-out Java code as unused and also obsolete. Message-Id: <20190808182758.1155-1-penberg@scylladb.com>	2019-08-11 10:26:36 +03:00
Tomasz Grabiec	bf70ee3986	config, exceptions: Add helper for handling internal errors The handler is intended to be called when internal invariants are violated and the operation cannot safely continue. The handler either throws (default) or aborts, depending on configuration option. Passing --abort-on-internal-error on the command line will switch to aborting. The reason we don't abort by default is that it may bring the whole cluster down and cause unavailability, while it may not be necessary to do so. It's safer to fail just the affected operation, e.g. repair. However, failing the operation with an exception leaves little information for debugging the root cause. So the idea is that the user would enable aborts on only one of the nodes in the cluster to get a core dump and not bring the whole cluster down.	2019-08-02 11:13:54 +02:00
Tomasz Grabiec	61a9cfbfa9	utils: config_file: Introduce named_value::observe()	2019-08-02 11:13:53 +02:00
Avi Kivity	eaa9a5b0d7	utils::chunked_vector: add rbegin() and related iterators Needed as an std::vector replacement.	2019-08-01 18:39:47 +03:00
Avi Kivity	df6faae980	utils: chunked_vector: make begin()/end() const correct begin() of a const vector should return a const_iterator, to avoid giving the caller the ability to mutate it. This slipped through since iterator's constructor does a const_cast. Noticed by code inspection.	2019-08-01 18:38:53 +03:00
Calle Wilund	1ed9a44396	utils::config_file: Propagate broadcast_to_all_shards to dependent files Fixes #4713 Modifying config files to use sharded storage misses the fact that extensions are allowed to add non-member config fields to the main configuration, typically from "extra" config_file objects. Unless those "extra" files are broadcast when the main file is broadcast, the values will not be readable from other shards. This patch propagates the broadcast to all other config files whose entries are in the top level object. This ensures we always keep data up to date on config reload. Message-Id: <20190715135851.19948-1-calle@scylladb.com>	2019-07-15 17:02:09 +03:00
Paweł Dziepak	eb7d17e5c5	lsa: make sure align_up_for_asan() doesn't cause reads past end of segment In debug mode the LSA needs objects to be 8-byte aligned in order to maximise coverage from the AddressSanitizer. Usually `close_active()` creates a dummy object that covers the end of the segment being closed. However, if the last real object ends in the last eight bytes of the segment then that dummy won't be created because of the alignment requirements. This broke exit conditions on loops trying to read all objects in the segment and caused them to attempt to dereference the address at the end of the segment. This patch fixes that. Fixes #4653.	2019-07-10 19:19:24 +02:00
Amnon Heiman	2fbc5ea852	config_file.hh: get_value return a pointer to the value The get_value method returns a pointer to the value that is used by the value_to_json method. The assumption is that the void pointer points to the actual value. Fixes #4678 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-10 10:40:35 +03:00
Piotr Sarna	eed2543bcc	utils: make string-based big decimal constructor explicit As a rule of thumb, single-parameter constructors should be explicit in order to avoid unexpected implicit conversions.	2019-07-04 11:33:00 +02:00
Piotr Sarna	a5e41408ec	utils: add operators to big_decimal For convenience, operators -=, + and - are implemented on top of +=.	2019-07-04 11:32:53 +02:00
Tomasz Grabiec	eb496b5eae	Merge "Allow changing configuration at runtime" from Avi This patchset allows changing the configuration at runtime. The user triggers this by editing the configuration file normally, then signalling the database with SIGHUP (as is traditional). The implementation is somewhat complicated due to the need to store non-atomic mutable state per-shard and to synchronize the values in all shards. This is somewhat similar to Seastar's sharded<>, but that cannot be used since the configuration is read before Seastar is initialized (due to the need to read command-line options). Tests: unit (dev, debug), manual test with extra prints (dev) Ref #2689 Fixes #2517.	2019-07-01 15:04:59 +02:00
Avi Kivity	6061a833a3	config: make values updateable Replace the per-shard value we store with an updateable_value_source, which allows updating it dynamically and allows users to track changes. The broadcast_to_all_shards() function is augmented to apply modifications when called on a live system.	2019-06-28 16:43:25 +03:00
Avi Kivity	f7de01d082	config: store copies of config items per shard Since some of our values are not atomic (strings) and the administrative information needed to track references to values is also not atomic, we will need to store them per-shard. To do that we add a vector of per-shard data to config_file, where each element is itself a vector of configuration items. Since we need to operate generically on items (copying them from shard to shard) we store them in a type-erased form. Only mutable state is stored per-shard.	2019-06-28 16:43:25 +03:00
Avi Kivity	fb23cd1ff6	Introduce updatable_value The updateable_value and updateable_value_source classes allow broadcasting configuration changes across the application. The updateable_value_source class represents a value that can be updated, and updateable_value tracks its source and reflects changes. A typical use replaces "uint64_t config_item" with "updateable_value<uint64_t> config_item", and from now on changes to the source will be reflected in config_item. For more complicated uses, which must run some callback when configuration changes, you can also call config_item.observe(callback) to be actively notified of changes.	2019-06-28 16:43:25 +03:00
Avi Kivity	da2a98cde6	config: don't allow assignment to config values Currently, we allow adjusting configuration via cfg.whatever() = 5; by returning a mutable reference from cfg.whatever(). Soon, however, this operation will have side effects (updating all references to the config item, and triggering notifiers). While this can be done with a proxy, it is too tricky. Switch to an ordinary setter interface: cfg.whatever.set(5); Because boost::program_options no longer gets a reference to the value to be written to, we have to move the update to a notifier, and the value_ex() function has to be adjusted to infer whether it was called with a vector type after it is called, not before.	2019-06-28 16:43:25 +03:00
Avi Kivity	b146fd1356	config: make noncopyable config_file and db::config are soon not going to be copyable. The reason is that in order to support live updating, we'll need per-shard copies of each value, and per-shard tracking of references to values. While these can be copied, it will be an asynchronous operation and thus cannot be done from a copy constructor. So to prepare for these changes, replace all copies of db::config by references and delete config_file's copy constructor. Some existing references had to be made const in order to adapt to the const-ness of db::config now being propagated (rather than being terminated by a non-const copy).	2019-06-28 16:43:25 +03:00
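Commit fb23cd1ff6 describes a source/tracker pair. An updateable_value_source holds a value that can be updated. An updateable_value reflects its changes and can register callbacks with observe().

The following is a minimal, single-threaded C++ sketch of that pattern, not ScyllaDB's implementation. The class bodies, the get()/set() member names, the operator() accessor and the std::function-based observer list are all assumptions made for illustration.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of the pattern described in fb23cd1ff6; not ScyllaDB code.
// Single-threaded only: no per-shard copies, no observer unregistration.
template <typename T>
class updateable_value_source {
public:
    explicit updateable_value_source(T v) : _value(std::move(v)) {}

    const T& get() const { return _value; }

    // Replace the value, then notify every registered observer with the new value.
    void set(T v) {
        _value = std::move(v);
        for (auto& cb : _observers) {
            cb(_value);
        }
    }

    void observe(std::function<void(const T&)> cb) {
        _observers.push_back(std::move(cb));
    }

private:
    T _value;
    std::vector<std::function<void(const T&)>> _observers;
};

// Tracks a source: reads always return the source's current value.
// The source must outlive every updateable_value that refers to it.
template <typename T>
class updateable_value {
public:
    explicit updateable_value(updateable_value_source<T>& src) : _src(&src) {}

    const T& operator()() const { return _src->get(); }

    // Callbacks are registered on the source, so they fire on every set().
    void observe(std::function<void(const T&)> cb) {
        _src->observe(std::move(cb));
    }

private:
    updateable_value_source<T>* _src;
};
```

In this sketch, replacing `uint64_t config_item` with `updateable_value<uint64_t> config_item` means reads go through `config_item()` and always see the latest `set()`. Because the tracker stores only a pointer, it cannot outlive its source. The per-shard storage described in f7de01d082 is deliberately left out.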

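Commit 690e9d2b44 describes linearizing on-the-fly only those values that happen to be split across fragments. The sketch below illustrates that idea with a hypothetical linearizing_reader over a std::vector<std::string> of fragments. It is not the linearizing_input_stream API: the read(n) interface, the scratch buffer and the linearized_reads() counter are assumptions.

A value that fits inside the current fragment is returned as a std::string_view into it, with no copy. Only a value that straddles a boundary is copied, so the scratch buffer grows to roughly the size of the largest value read rather than the whole buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical reader over a fragmented buffer, illustrating the idea in
// 690e9d2b44; not the linearizing_input_stream API.
class linearizing_reader {
public:
    explicit linearizing_reader(std::vector<std::string> fragments)
        : _frags(std::move(fragments)) {}

    // Returns the next n bytes. The view stays valid until the next read().
    std::string_view read(std::size_t n) {
        if (n == 0) {
            return {};
        }
        skip_exhausted();
        if (_frag < _frags.size() && _frags[_frag].size() - _pos >= n) {
            // Fast path: the value lies inside one fragment, so no copy is made.
            std::string_view v(_frags[_frag].data() + _pos, n);
            _pos += n;
            return v;
        }
        // Slow path: the value straddles fragments; copy just this value.
        _scratch.clear();
        while (_scratch.size() < n) {
            skip_exhausted();
            if (_frag == _frags.size()) {
                throw std::out_of_range("linearizing_reader: not enough data");
            }
            std::size_t take = std::min(n - _scratch.size(), _frags[_frag].size() - _pos);
            _scratch.append(_frags[_frag], _pos, take);
            _pos += take;
        }
        ++_linearized;
        return _scratch;
    }

    // Number of reads that needed a copy into the scratch buffer.
    std::size_t linearized_reads() const { return _linearized; }

private:
    // Advance past fragments that have been fully consumed (or are empty).
    void skip_exhausted() {
        while (_frag < _frags.size() && _pos == _frags[_frag].size()) {
            ++_frag;
            _pos = 0;
        }
    }

    std::vector<std::string> _frags;
    std::size_t _frag = 0;        // index of the current fragment
    std::size_t _pos = 0;         // offset inside the current fragment
    std::string _scratch;         // holds one linearized value at a time
    std::size_t _linearized = 0;
};
```

For example, with fragments "ab", "cd" and "ef", reading 1, 2, 1 and 2 bytes yields "a", "bc", "d" and "ef". Only "bc" crosses a boundary and is copied.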

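Commits a5e41408ec and eed2543bcc describe two changes to big_decimal: deriving -=, + and - from +=, and making its string-based constructor explicit. The hypothetical toy_decimal below is only a sketch of that layering, not the big_decimal code. It stores an int64 unscaled value with a decimal scale, and it has no overflow handling and no input validation.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical fixed-point decimal: value = unscaled * 10^(-scale).
// Illustrates the operator layering from a5e41408ec; not big_decimal itself.
// No overflow checks and no validation of the input string.
class toy_decimal {
public:
    toy_decimal(std::int64_t unscaled, int scale) : _unscaled(unscaled), _scale(scale) {}

    // Single-argument constructor marked explicit, as in eed2543bcc, so a string
    // never converts into a decimal implicitly. Parses e.g. "-1.25".
    explicit toy_decimal(const std::string& s) {
        std::string digits = s;
        auto dot = s.find('.');
        if (dot != std::string::npos) {
            digits.erase(dot, 1);
            _scale = static_cast<int>(s.size() - dot - 1);
        }
        _unscaled = std::stoll(digits);
    }

    std::int64_t unscaled() const { return _unscaled; }
    int scale() const { return _scale; }

    // The only operator with real arithmetic: align scales, then add.
    toy_decimal& operator+=(const toy_decimal& o) {
        int target = std::max(_scale, o._scale);
        _unscaled = rescale(_unscaled, _scale, target) + rescale(o._unscaled, o._scale, target);
        _scale = target;
        return *this;
    }

    toy_decimal operator-() const { return toy_decimal(-_unscaled, _scale); }

    // Everything else is expressed via +=.
    toy_decimal& operator-=(const toy_decimal& o) { return *this += -o; }
    friend toy_decimal operator+(toy_decimal a, const toy_decimal& b) { return a += b; }
    friend toy_decimal operator-(toy_decimal a, const toy_decimal& b) { return a -= b; }

private:
    // Multiply by 10 until the value is expressed at scale `to` (to >= from).
    static std::int64_t rescale(std::int64_t v, int from, int to) {
        for (; from < to; ++from) {
            v *= 10;
        }
        return v;
    }

    std::int64_t _unscaled = 0;
    int _scale = 0;
};
```

Because the string constructor is explicit, `toy_decimal d("1.5");` compiles but `toy_decimal d = std::string("1.5");` does not. That is the class of accidental conversion eed2543bcc aims to rule out.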
679 Commits
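Commit 4c884908bb fixes #5510 by collecting the directories to initialize in a std::set rather than a std::vector. Any of data/commitlog/hints/view_hints that coincide are therefore verified only once rather than twice in parallel.

A hypothetical helper in that spirit, using std::filesystem::path (which std::set can order via its comparison operators), is sketched below. The function name is illustrative. The lexically_normal() call is an extra assumption not mentioned in the commit: it only folds textual differences such as "./" components and does not resolve symlinks.

```cpp
#include <filesystem>
#include <set>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper in the spirit of 4c884908bb: gather the directories that
// need initialization into a std::set so each one is processed only once.
// lexically_normal() is an extra assumption: it folds textual differences such
// as "./" components, but does not resolve symlinks or touch the filesystem.
std::set<fs::path> unique_dirs(const std::vector<fs::path>& dirs) {
    std::set<fs::path> out;
    for (const auto& d : dirs) {
        out.insert(d.lexically_normal());
    }
    return out;
}
```

Iterating the returned set to run the per-directory checks visits each distinct path exactly once. With a plain vector, a directory configured for both data and hints would be visited twice.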