scylladb

Author	SHA1	Message	Date
Avi Kivity	48b694df55	cql3: like_matcher: pimplify to reduce inclusions of boost/regex boost/regex has huge header dependencies amounting to tens of thousands of lines. These are now replicated in 167 translation units. This patch converts like_matcher to use the pointer-to-implementation idiom, which reduces the number of translation units including boost/regex to 28. Since regular expressions are relatively expensive, and like_matcher is relatively rare, the extra memory usage and run time will be negligible. Message-Id: <20200211170152.809554-1-avi@scylladb.com>	2020-02-12 17:04:12 +02:00
Pavel Emelyanov	d1775dd701	utils: Move disk-error-handler into it The disk-error-handler is purely auxiliary thing that helps propagating IO errors to the rest of the code. It well deserves not sitting in the root namespace. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200207112443.18475-1-xemul@scylladb.com>	2020-02-09 17:26:52 +02:00
Avi Kivity	b01f0cab60	utils: add missing include for ssize_t gcc 10 tightened its C++ includes to no longer provide ssize_t, so we must get it from a C header instead. Message-Id: <20200129205912.21139-1-avi@scylladb.com>	2020-01-30 14:10:18 +02:00
Rafael Ávila de Espíndola	090164791c	logalloc: Store unused ids in a std::vector There doesn't seem to be any requirement for how unused ids are reused, so we may as well use the simpler type. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200129211154.47907-1-espindola@scylladb.com>	2020-01-30 10:31:16 +02:00
Avi Kivity	3343baf159	Merge "cql3: time_uuid_fcts: validate time UUID" from Benny " Throw an error in case we hit an invalid time UUID rather than hitting an assert. Fixes #5552 (Ref #5588 that was dequeued and fixed here) Test: UUID_test, cql_query_test(debug) " * 'validate-time-uuid' of https://github.com/bhalevy/scylla: cql3: abstract_function_selector: provide assignment_testable_source_context test: cql_query_test: add time uuid validation tests cql3: time_uuid_fcts: validate timestamp arg cql3: make_max_timeuuid_fct: delete outdated FIXME comment cql3: time_uuid_fcts: validate time UUID test: UUID_test: add tests for time uuid utils: UUID: create_time assert nanos_since validity utils/UUID_gen: make_nanos_since utils: UUID: assert UUID.is_timestamp	2020-01-29 00:11:17 +02:00
Avi Kivity	ec1687e4fe	Merge "Remove deprecated partitioners #5636 " from Piotr " This PR makes named_value respect allowed_values and then use it to transition away from old deprecated RandomPartitioner and ByteOrderedPartitioner. Then it removes the code that's no longer used. We want to remove deprecated partitioners because, on one hand, they lead to performance problems and hot nodes. Moreover, we're planning to unify the token representation which would allow per table partitioner support. That, in turn, is a feature helpful in multiple efforts like CDC, materialized views, secondary indexes and multi-tenancy. tests: unit(dev) " * 'remove_deprecated_partitioners' of https://github.com/haaawk/scylla: partitioners: remove random_partitioner partitioners: Make it impossible to use RandomPartitioner partitioners: remove byte_ordered_partitioner partitioners: Make it impossible to use ByteOrderedPartitioner partitioners: Remove leftovers of OrderPreservingPartitioner i_partitioner.cc: stop including byte_ordered_partitioner.hh i_partitioner.cc: stop including random_partitioner.hh config: use allowed_values to verify named_value input config: add operator<< for seed_provider_type	2020-01-29 00:11:17 +02:00
Benny Halevy	72e2ea47c1	cql3: time_uuid_fcts: validate time UUID Throw an error in case we hit an invalid time UUID rather than hitting an assert. Ref #5552 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	f8b079b599	utils: UUID: create_time assert nanos_since validity Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:09:01 +02:00
Benny Halevy	cd3460cc88	utils/UUID_gen: make_nanos_since Safely convert millis to "nanos_since" (number of 100 nanoseconds since START_EPOCH) while type casting to uint64_t to avoid possible int overflow. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-27 11:08:16 +02:00
Benny Halevy	22bac26023	utils: UUID: assert UUID.is_timestamp Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-01-26 18:54:36 +02:00
Piotr Jastrzebski	6a2cd64b5c	config: use allowed_values to verify named_value input Even though we configure the set of accepted values for some config flags, named_value ignores them. This patch implements the checks that verify that a flag is not set to a value that's not on the list. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-01-24 09:08:59 +01:00
Rafael Ávila de Espíndola	d9a71a7cff	service: Refactor code into an atomic_vector class This templates the code for listener_vector, renames it to atomic_vector and moves it to the utils directory. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-22 08:16:03 -08:00
Avi Kivity	1f46133273	Merge "data: make cell::make_collection() exception safe" from Botond " Most of the code in `cell` and the `imr` infrastructure it is built on is `noexcept`. This means that extra care must be taken to avoid rogue exceptions as they will bring down the node. The changes introduced by 0a453e5d3a did just that - introduced rogue `std::bad_alloc` into this code path by violating an undocumented and unvalidated assumption -- that fragment ranges passed to `cell::make_collection()` are nothrow copyable and movable. This series refactors `cell::make_collection()` such that it does not have this assumption anymore and is safe to use with any range. Note that the unit test included in this series, that was used to find all the possible exception sources will not be currently run in any of our build modes, due to `SEASTAR_ENABLE_ALLOC_FAILURE_INJECTION` not being set. I plan to address this in a followup because setting this flag fails other tests using the failure injection mechanism. This is because these tests are normally run with the failure injection disabled so failures managed to lurk in without anyone noticing. Fixes: #5575 Refs: #5341 Tests: unit(dev, debug) " * 'data-cell-make-collection-exception-safety/v2' of https://github.com/denesb/scylla: test: mutation_test: add exception safety test for large collection serialization data/cell.hh: avoid accidental copies of non-nothrow copiable ranges utils/fragment_range.hh: introduce fragment_range_view	2020-01-14 10:01:06 +02:00
Botond Dénes	b52b4d36a2	utils/fragment_range.hh: introduce fragment_range_view A lightweight, trivially copyable and movable view for fragment ranges. Allows for uniform treatment of all kinds of ranges, i.e. treating all of them as a view. Currently `fragment_range.hh` provides lightweight, view-like adaptors for empty and single-fragment ranges (`bytes_view`). To allow code to treat owning multi-fragment ranges the same way as the former two, we need a view for the latter as well -- this is `fragment_range_view`.	2020-01-13 16:52:59 +02:00
Avi Kivity	454074f284	Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz " The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, cause otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717 " * tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla: database: Avoid OOMing with flush continuations after failed memtable flush lsa: Introduce operator bool() to occupancy_stats lsa: Expose region_impl::evictable_occupancy in the region class	2020-01-08 16:58:54 +02:00
Rafael Ávila de Espíndola	3d641d4062	lua: Use existing cpp_int cast logic Different versions of boost have different rules for what conversions from cpp_int to smaller integers are allowed. We already had a function that worked with all supported versions, but it was not being used by lua. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200104041028.215153-1-espindola@scylladb.com>	2020-01-05 12:10:54 +02:00
Benny Halevy	4c884908bb	directories: Keep a unique set of directories to initialize If any two directories of data/commitlog/hints/view_hints are the same we still end up running verify_owner_and_mode and disk_sanity(check_direct_io_support) in parallel on the same directories and hit #5510. This change uses std::set rather than std::vector to collect a unique set of directories that need initialization. Fixes #5510 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191225160645.2051184-1-bhalevy@scylladb.com>	2019-12-29 16:26:26 +02:00
Pavel Emelyanov	a5cdfea799	directories: Do not mess with per-shard base dir The hints and view_hints directory has per-shard sub-dirs, and the directories code tries to create, check and lock all of them, including the base one. The manipulations in question are excessive -- it's enough to check and lock either the base dir, or all the per-shard ones, but not everything. Let's take the latter approach for its simplicity. Fixes #5510 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Looks-good-to: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191223142429.28448-1-xemul@scylladb.com>	2019-12-24 14:49:28 +02:00
Pavel Emelyanov	23a8d32920	directories: Make internals work on fs::path Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	373fcfdb3e	directories: Cleanup adding dirs to the vector to work on The unordered_set is turned into vector since for fs::path there's no hash() method that's needed for set. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	14437da769	directories: Drop seastar::async usage Now the only future-able operation remaining is the call to parallel_for_each(), all the rest is non-blocking preparation, so we can drop the seastar::async and just return the future from parallel_for_each. The indentation is now good, as in the previous patch it was prepared just for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	06f4f3e6d8	directories: Do touch_and_lock and verify sequentially The goal is to drop the seastar::async() usage. Currently we have two places that return futures -- calls to parallel_for_each-s. We can either chain them together or, since both are working on the same set of directories, chain actions inside them. For code simplicity I propose to chain actions. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	8d0c820aa1	directories: Do touch_and_lock in parallel The list of paths that should be touch-and-locked is already at hands, this shortens the code and makes it slightly faster (in theory). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	71a528d404	directories: Move the whole stuff into own .cc file In order not to pollute the root dir place the code in utils/ directory, "utils" namespace. While doing this -- move the touch_and_lock from the class declaration. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 19:52:01 +03:00
Pavel Emelyanov	82ef2a7730	file_lock: Work with fs::path, not sstring The main.cc code that converts sstring to fs::path will be patched soon, the file_desc::open belongs to seastar and works on sstrings. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-12 17:32:10 +03:00
Dejan Mircevski	a26bd9b847	utils: Add enum_option This allows us to accept command-line options with a predefined set of valid arguments. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-12-09 09:45:59 -05:00
Tomasz Grabiec	aa173898d6	Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz Selected semaphores' names are now included in exception messages in case of timeout or when admission queue overflows. Resolves #5281	2019-12-05 14:19:56 +01:00
Juliusz Stasiewicz	430b2ad19d	commitlog+region_group: timeout exceptions with names `segment_manager' now uses a decorated version of `timed_out_error' with hardcoded name. On the other hand `region_group' uses named `on_request_expiry' within its `expiring_fifo'.	2019-12-03 19:07:19 +01:00
Botond Dénes	690e9d2b44	utils: introduce linearizing_input_stream `linearizing_input_stream` allows transparently reading linearized values from a fragmented buffer. This is done by linearizing on-the-fly only those read values that happen to be split across multiple fragments. This reduces the size of the largest allocation from the size of the entire buffer (when the entire buffer is linearized) to the size of the largest read value. This is a huge gain when the buffer contains loads of small objects, and a modest gain when the buffer contains few large objects. But even in the worst case the size of the largest allocation will be less than or equal to that in the case where the entire buffer is linearized. This stream is planned to be used as glue code between the fragmented cell value and the collection deserialization code which expects to be reading linearized values.	2019-12-02 10:10:31 +02:00
Botond Dénes	4054ba0c45	serialization: accept any CharOutputIterator Not just bytes::output_iterator. Allow writing into streams other than just `bytes`. In fact we should be very careful with writing into `bytes` as they require potentially large contiguous allocations. The `write()` method is now templatized also on the type of its first argument, which now accepts any CharOutputIterator. Due to our poor usage of namespace this now collides with `write` defined inside `db/commitlog/commitlog.cc`. Luckily, the latter doesn't really have to be templatized on the data type it reads from, and de-templatizing it resolves the clash.	2019-12-02 10:10:31 +02:00
Dejan Mircevski	c43b286f35	utils: Add operator<< for big_decimal ... and remove an existing duplicate from lua.cc. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-11-29 15:32:09 -05:00
Pavel Solodovnikov	2f442f28af	treewide: add const qualifiers throughout the code base	2019-11-26 02:24:49 +03:00
Tomasz Grabiec	fb28543116	lsa: Introduce operator bool() to occupancy_stats	2019-11-22 12:08:28 +01:00
Tomasz Grabiec	a69fda819c	lsa: Expose region_impl::evictable_occupancy in the region class	2019-11-22 12:08:10 +01:00
Tomasz Grabiec	5e4abd75cc	main: Abort on EBADF and ENOTSOCK by default Those are typically symptoms of use-after-free or memory corruption in the program. It's better to catch such error sooner than later. That situation is also dangerous since if a valid descriptor would land under the invalid access, not the one which was intended for the operation, then the operation may be performed on the wrong file and result in corruption. Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>	2019-11-19 13:07:33 +02:00
Gleb Natapov	b3e01a45d7	lwt: storage_proxy: implement paxos protocol This patch adds all functionality needed for Paxos protocol. The implementation does not strictly adhere to Paxos paper since the original paper allows setting a value only once, while for LWT we need to be able to make another Paxos round after "learn" phase completes, which requires things like repair to be introduced.	2019-10-27 23:21:51 +03:00
Nadav Har'El	51fc6c7a8e	make static_row optional to reduce memory footprint Merged patch series from Avi Kivity: The static row can be rare: many tables don't have them, and tables that do will often have mutations without them (if the static row is rarely updated, it may be present in the cache and in readers, but absent in memtable mutations). However, it always consumes ~100 bytes of memory, even if it is not present, due to row's overhead. Change it to be optional by allocating it as an external object rather than inlined into mutation_partition. This adds overhead when the static row is present (17 bytes for the reference, back reference, and lsa allocator overhead). perf_simple_query appears to be marginally (2%) faster. Footprint is reduced by ~9% for a cache entry, 12% in memtables. More details are provided in the patch commitlog. Tests: unit (debug) Avi Kivity (4): managed_ref: add get() accessor managed_ref: add external_memory_usage() mutation_partition: introduce lazy_row mutation_partition: make static_row optional to reduce memory footprint cell_locking.hh \| 2 +- converting_mutation_partition_applier.hh \| 4 +- mutation_partition.hh \| 284 ++++++++++++++++++++++- partition_builder.hh \| 4 +- utils/managed_ref.hh \| 12 + flat_mutation_reader.cc \| 2 +- memtable.cc \| 2 +- mutation_partition.cc \| 45 +++- mutation_partition_serializer.cc \| 2 +- partition_version.cc \| 4 +- tests/multishard_mutation_query_test.cc \| 2 +- tests/mutation_source_test.cc \| 2 +- tests/mutation_test.cc \| 12 +- tests/sstable_mutation_test.cc \| 10 +- 14 files changed, 355 insertions(+), 32 deletions(-)	2019-10-22 12:25:15 +03:00
Avi Kivity	efe8fa6105	managed_ref: add external_memory_usage() Like other managed containers, add external_memory_usage() so we can account for a partition's memory footprint in memtable/cache.	2019-10-15 15:41:42 +03:00
Avi Kivity	90096da9f3	managed_ref: add get() accessor While a managed_ref emulates a reference more closely than it does a pointer, it is still nullable, so add a get() (similar to unique_ptr::get()) that can be nullptr if the reference is null. The immediate use will be mutation_partition::_static_row, which is often empty and takes up about 10% of a cache entry.	2019-09-30 20:55:36 +03:00
Gleb Natapov	f9209e27d4	lwt: Add missing functions to utils/UUID_gen.hh Some lwt related code is missing in our UUID implementation. Add it.	2019-09-26 11:44:00 +03:00
Tomasz Grabiec	eb08ab7ed9	lsa: Assert no cross-shard region locking We observed an abort on bad_alloc which was not caused by real OOM, but could be explained by cache region being locked from a different shard, which is not allowed, concurrently with memory reclamation. It's impossible now to prove this, or, if that was indeed the case, to determine which code path was attempting such lock. This patch adds an assert which would catch such incorrect locking at the attempt. Refs #4978	2019-09-23 12:51:29 +02:00
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they did the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places, which were lacking this, but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Dejan Mircevski	8be147d069	cql3: Handle empty LIKE pattern Match SQL's LIKE in allowing an empty pattern, which matches only an empty text field. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-08-12 19:48:31 +03:00
Rafael Ávila de Espíndola	99c7f8457d	logalloc: Add a migrators_base that is common to debug and release This simplifies the debug implementation and it now should work with scylla-gdb.py. It is not clear what, if anything, is lost by not using random ids. They were never being reused in the debug implementation anyway. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190618144755.31212-1-espindola@scylladb.com>	2019-08-12 19:44:55 +03:00
Nadav Har'El	f9d6eaf5ff	reconcilable_result: switch to chunked_vector Merged patch series from Avi Kivity: In rare but valid cases (reconciling many tombstones, paging disabled), a reconciled_result can grow large. This triggers large allocation warnings. Switch to chunked_vector to avoid the large allocation. In passing, fix chunked_vector's begin()/end() const correctness, and add the reverse iterator function family which is needed by the conversion. Fixes #4780. Tests: unit (dev) Commit Summary utils: chunked_vector: make begin()/end() const correct utils::chunked_vector: add rbegin() and related iterators reconcilable_result: use chunked_vector to hold partitions	2019-08-11 16:03:13 +03:00
Pekka Enberg	73113c0ea4	utils/fb_utilities.hh: Kill obsolete FIXME and commented out Java code The FIXME was added in the very first commit ("utils: Convert utils/FBUtilities.java") that introduced the fb_utilities class as a stub. However, we have long implemented the parts that we actually use, so drop the FIXME as obsolete. In addition, drop the remaining uncommented Java code as unused and also obsolete. Message-Id: <20190808182758.1155-1-penberg@scylladb.com>	2019-08-11 10:26:36 +03:00
Tomasz Grabiec	bf70ee3986	config, exceptions: Add helper for handling internal errors The handler is intended to be called when internal invariants are violated and the operation cannot safely continue. The handler either throws (default) or aborts, depending on configuration option. Passing --abort-on-internal-error on the command line will switch to aborting. The reason we don't abort by default is that it may bring the whole cluster down and cause unavailability, while it may not be necessary to do so. It's safer to fail just the affected operation, e.g. repair. However, failing the operation with an exception leaves little information for debugging the root cause. So the idea is that the user would enable aborts on only one of the nodes in the cluster to get a core dump and not bring the whole cluster down.	2019-08-02 11:13:54 +02:00
Tomasz Grabiec	61a9cfbfa9	utils: config_file: Introduce named_value::observe()	2019-08-02 11:13:53 +02:00
Avi Kivity	eaa9a5b0d7	utils::chunked_vector: add rbegin() and related iterators Needed as an std::vector replacement.	2019-08-01 18:39:47 +03:00
Avi Kivity	df6faae980	utils: chunked_vector: make begin()/end() const correct begin() of a const vector should return a const_iterator, to avoid giving the caller the ability to mutate it. This slipped through since iterator's constructor does a const_cast. Noticed by code inspection.	2019-08-01 18:38:53 +03:00

690 Commits