scylladb

Author	SHA1	Message	Date
Piotr Jastrzebski	924ed7bb1c	make_multishard_combining_reader: stop taking partitioner The function already takes schema so there's no need for it to take partitioner. It can be obtained using schema::get_partitioner Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Avi Kivity	6b747f4673	database: avoid creating thread in make_directory_for_column_family() make_directory_for_column_family() is used in a parallel_for_each() in parse_system_tables(). Because parallel_for_each does not preempt in the initial execution of its input function, and because each thread allocates 128k for the stack, we end up allocating many hundreds of megabytes if there are many tables. This happens early during boot and will only cause problems if there are 5,000 tables per gigabyte of shard memory, an unlikely combination that will probably fail later, but still it is better to avoid unnecessary large allocations. This was developed in order to fix #6003, until it was discovered that `c020b4e5e2` ("logalloc: increase capacity of _regions vector outside reclaim lock") is the real fix. Message-Id: <20200313093603.1366502-1-avi@scylladb.com>	2020-03-13 13:46:45 +02:00
Piotr Jastrzebski	54d24553bb	schema: get_partitioner return const& Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 13:33:53 +01:00
Avi Kivity	906784639d	Merge "Clean sstables from using global objects" from Pavel E " This set cleans sstable_writer_config and surrounding sstables code from using global storage_ and feature_ service-s and database by moving the configuration logic onto sstables_manager (that was supposed to do it since `eebc3701a5`). Most of the complexity is hidden around sstable_writer_config creation, this set makes the sstables_manager create this object with an explicit call. All the rest are consequences of this change. Tests: unit(debug), manual start-stop " * 'br-clean-sstables-manager-2' of https://github.com/xemul/scylla: sstables: Move get_highest_supported_format sstables: Remove global get_config() helper sstables: Use manager's config() in .new_sstable_component_file() sstable_writer_config: Extend with more db::config stuff sstables_manager: Don't use global helper to generate writer config sstable_writer_config: Sanitize out some features fields initialization sstable_writer_config: Factor out some field initialization sstables: Generate writer config via manager only sstables: Keep reference on manager test: Re-use existing global sstables_manager table: Pass sstable_writer_config into write_memtable_to_sstable	2020-03-03 18:33:01 +02:00
Avi Kivity	157fe4bd19	Merge "Remove default timeouts" from Botond " Timeouts defaulted to `db::no_timeout` are dangerous. They allow any modifications to the code to drop timeouts and introduce a source of unbounded request queue to the system. This series removes the last such default timeouts from the code. No problems were found, only test code had to be updated. tests: unit(dev) " * 'no-default-timeouts/v1' of https://github.com/denesb/scylla: database: database::query(), database::apply(): remove default timeouts database: table::query(): remove default timeout mutation_query: data_query(): remove default timeout mutation_query: mutation_query(): remove default timeout multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout reader_concurrency_semaphore: wait_admission(): remove default timeout utils/logallog: run_when_memory_available(): remove default timeout	2020-03-01 17:29:17 +02:00
Rafael Ávila de Espíndola	ba453d832b	Pass string_view to keyspace_metadata::new_keyspace This avoids a few sstring copies. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	94d07fba07	Pass string_view to the keyspace_metadata constructor This avoids a few sstring copies when constructing keyspace_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola	2b96abcece	Pass string_view to no_such_column_family's constructor With this we don't have to construct a sstring to construct a no_such_column_family. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-28 08:36:27 -08:00
Botond Dénes	fdb45d16de	mutation_query: mutation_query(): remove default timeout	2020-02-27 18:56:30 +02:00
Botond Dénes	93039a085d	utils/logallog: run_when_memory_available(): remove default timeout	2020-02-27 18:36:32 +02:00
Botond Dénes	7bdeec4b00	flat_mutation_reader: make_reversing_reader(): add memory limit If the reversing requires more memory than the limit, the read is aborted. All users are updated to get a meaningful limit, from the respective table object, with the exception of tests of course.	2020-02-27 18:11:54 +02:00
Botond Dénes	75efa707ce	db/config: add config memory limit of otherwise unlimited queries We have a few kinds of queries whose memory consumption is not limited at all. One of these is reverse queries, which read entire partitions into memory before reversing them. These partitions can be larger than memory and thus such a query can single-handedly cause OOM. This patch introduces a configuration for a memory limit for such queries. This will serve as a hard limit, and queries which attempt to use more memory than this will be aborted. The limit is propagated to table objects, with the intention of keeping system tables unlimited. These tables are usually small and initiators of system queries are not prepared for failures.	2020-02-27 18:11:54 +02:00
Pavel Emelyanov	7363d56946	sstables: Move get_highest_supported_format The global get_highest_supported_format helper and its declaration are scattered all over the code, so clean this up and prepare the ground for moving _sstables_format from the storage_service onto the sstables_manager (not this set). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:45 +03:00
Pavel Emelyanov	85d9326d70	sstables_manager: Don't use global helper to generate writer config The main goal of this patch is to stop using get_config() global when creating the sstable_writer_config instance. Other than being global the existing get_config() is also confusing as it effectively generates 3 (three) sorts of configs -- one for scylla, when db config and features are ready, the other one for tests, when no storage service is at hand, and the third one for tests as well, when the storage service is created by test env (likely intentionally, but maybe by coincidence the resulting config is the same as for no-storage-service case). With this patch it's now 100% clear which one is used when. Also this makes half the work of removing get_config() helper. The db::config and feature_service used to initialize the managers are referenced by database that creates and keeps managers on, so the references are safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-25 14:31:04 +03:00
Piotr Sarna	5e07c00eeb	Merge 'Delete table snapshot' from Amnon This series adds an option to the API that supports deleting a specific table from a snapshot. The implementation works in a similar way to the option to specify specific keyspaces when deleting a snapshot. The motivation is to allow reducing disk-space when using the snapshot for backup. A dtest PR is sent to the dtest repository. Fixes #5658 Original PR #5805 Tests: (database_test) (dtest snapshot_test.py:TestSnapshot.test_cleaning_snapshot_by_cf) * amnonh/delete_table_snapshot: test/boost/database_test: adopt new clear_snapshot signature api/storage_service: Support specifying a table when deleting a snapshot storage_service: Add optional table name to clear snapshot	2020-02-24 09:38:57 +01:00
Pavel Emelyanov	8435e93549	db: Move unbounded_range_tombstones listening from storage_service Now the database keeps reference on feature service, so we can listen on the feature in it directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-19 14:08:24 +03:00
Amnon Heiman	c3260bad25	storage_service: Add optional table name to clear snapshot There are cases when it is useful to delete a specific table from a snapshot. An example is when a snapshot is used for backup. Backup can take a long period of time; during that time, each of the tables can be deleted once it has been backed up, without waiting for the entire backup process to complete. This patch adds such an option to the database and to the storage_service wrapping method that calls it. If a table is specified, a filter function is created that filters only the column family with that given name. This is similar to the filtering at the keyspace level. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-02-18 16:34:10 +02:00
Avi Kivity	6c7aa18238	Merge "Introduce schema::get_partitioner" from Piotr " Introduce schema::get_partitioner and use it instead of dht::global_partitioner. Fixes #5493 Tests: unit(dev, release, debug) " * 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits) cdc: stop using partitioners partitioner_test: stop calling set_global_partitioner storage_service: stop calling global_partitioner() mutation_writer_test: stop calling global_partitioner() schema: reduce number of global_partitioner() calls test_services: stop calling global_partitioner() sstable_utils: stop calling global_partitioner() sstable_resharding_test: stop depending on global partitioner sstable_mutation_test: stop calling global_partitioner() sstable_data_file_test: stop calling global_partitioner() random_schema: stop taking partitioner in constructor mutation_reader_test: stop calling global_partitioner() multishard_mutation_query_test: stop calling global_partitioner() row_level repair: stop calling global_partitioner() distribute_reader_and_consume_on_shards: don't take partitioner thrift: reduce global_partitioner() calls binary_search: stop calling global_partitioner() index_entry: stop calling global_partitioner() mc writer: stop calling global_partitioner() sstable: stop calling global_partitioner() ...	2020-02-17 18:12:53 +02:00
Piotr Dulikowski	01084a79b8	hh: send orphaned hints on HINT_MUTATION verb When replaying a hint with a destination node that is no longer in the cluster, it will be sent with cl=ALL to all its new replicas. Before this patch, the MUTATION verb was used, which causes such hints to be handled on the same connection and with the same priority as regular writes. This can cause problems when a large number of hints is orphaned and they are scheduled to be sent at once. Such a situation may happen when replacing a dead node - all nodes that accumulated hints for the dead node will now send them with cl=ALL to their new replicas. This patch changes the verb used to send such hints to HINT_MUTATION. This verb is handled on a separate connection and with streaming scheduling group, which gives them similar priority to non-orphaned hints. Refs: #4712 Tests: unit(dev)	2020-02-17 14:45:22 +01:00
Piotr Jastrzebski	2d7532f87f	dht: add dht::get_token and replace all calls to dht::global_partitioner().get_token dht::get_token is better because it takes schema and uses it to obtain partitioner instead of using a global partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	abd76e566f	dht::shard_of: stop calling global_partitioner() Take const schema& as a parameter of shard_of and use it to obtain partitioner instead of calling global_partitioner(). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:23:16 +01:00
Pavel Emelyanov	1a3f78a57d	database: Use own token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	de1dc59548	migration_manager: Refactor validation of new/updating ksm The goal is to have token_metadata reference inside the keyspace_metadata.validate method. This can be achieved by doing the validation through the database reference which is "at hand" in migration_manager. While at it, merge the validation with exists/not-exists checks done in the same places. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 18:10:38 +03:00
Avi Kivity	bed61b96a2	Merge "Move features from storage- into feature-service" from Pavel " There's a lot of code around that needs storage service purely to get the specific feature value (cluster_supports_<something> calls). This creates several circular dependencies, e.g. storage_service <-> migration_manager one and database <-> storage_service. Also features sit on storage_service, but register themselves on the feature_service and the former subscribes on them back which also looks strange. I propose to keep all the features on feature_service, this keeps the latter independent from other components, makes it possible to break one of the mentioned circular dependencies and heavily relax the other. Also the set helps us fighting the globals and, after it, the feature_service can be safely stopped at the very last moment. Tests: unit(dev), manual debug build start-stop " * 'br-features-to-service-5' of https://github.com/xemul/scylla: gossiper: Avoid string merge-split for nothing features: Stop on shutdown storage_service: Remove helpers storage_service: Prepare to switch from on-board feature helpers cql3: Check feature in .validate database: Use feature service storage_proxy: Use feature service migration_manager: Use feature service start: Pass needed feature as argument into migrate_truncation_records features: Unfriend storage_service features: Simplify feature registration features: Introduce known_feature_set features: Move disabled features set from storage_service features: Move schema_features helper features: Move all features from storage_service to feature_service storage_service: Use feature_config from _feature_service features: Add feature_config storage_service: Kill set_disabled_features gms: Move features stuff into own .cc file migration_manager: Move some fns into class	2020-02-09 19:22:07 +02:00
Calle Wilund	af963e76c7	keyspace/distributed_loader: Add wait for (user) keyspace population to finish Allows caller to check/wait for a given user keyspace to finish populating on boot. Can be called at any time, though if called before population starts, it will wait until it either starts and we can determine that the keyspace does not need populating, or population finishes. tests: unit Message-Id: <20200203151712.10003-1-calle@scylladb.com>	2020-02-09 18:56:22 +02:00
Pavel Emelyanov	d1775dd701	utils: Move disk-error-handler into it The disk-error-handler is purely auxiliary thing that helps propagating IO errors to the rest of the code. It well deserves not sitting in the root namespace. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200207112443.18475-1-xemul@scylladb.com>	2020-02-09 17:26:52 +02:00
Gleb Natapov	20bf3800f3	database: cache truncation time in table objects Truncation time is used on each LWT request now, so reading it from the table is too heavy an operation to be on a fast path. It also requires jumping to a shard that contains corresponding data. This patch caches the data on the table object of each shard for easy access. The cache is initialized during boot from system.truncated table and updated on each truncation operation. Message-Id: <20200206163838.5220-2-gleb@scylladb.com>	2020-02-06 18:15:48 +01:00
Rafael Ávila de Espíndola	5d4671526c	db: Replace large_data_handler::_stopped with _running This is not just a direct flip to a variable with the negated Boolean value. When created, a large_data_handler is not considered to be running, the user has to call start() before it can be used. The advantage of doing this is that if initialization fails and a database is destructed before the large_data_handler is started, the assert database::stop() { assert(!_large_data_handler->running()); is not triggered. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-04 21:15:44 -08:00
Pavel Emelyanov	abe588888d	database: Use feature service Keep local feature_service reference on database. This relaxes the circular storage_service <-> database reference, but not removes it completely. This needs some args tossing in apply_to_builder, but it's rather straightforward, so comes in the same patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Botond Dénes	69f606baa0	database: check timeout before applying writes Attempting to apply timed-out writes is a wasted effort. The coordinator has already given up on the write and reported it as failed to the client. Any cycles spent on this write are a waste at this point. We currently only check the timeout if the write is blocked on memory, otherwise, if the system is not under pressure, we will happily apply timed out writes. If the system is under pressure we will make it worse by wasting cycles on processing a timed out write. Prevent this by checking the timeout as early as possible in `database::apply()` and `database::apply_counter_update()`. This patch doesn't solve all our problems related to timed out writes. They can still sit and accumulate in various queues without expiring, a prominent example being the smp queues. It is however a good first step towards reducing wasted effort spent on them. Refs: #5055 Ref #5251 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200129093007.550250-1-bdenes@scylladb.com>	2020-01-29 13:08:43 +02:00
Botond Dénes	dfc8b2fc45	treewide: replace reader_resource_tracker with reader_permit The former was never really more than a reader_permit with one additional method. Currently using it doesn't even save one from any includes. Now that readers will be using reader_permit we would have to pass down both to mutation_source. Instead get rid of reader_resource_tracker and just use reader_permit. Instead of making it a last and optional parameter that is easy to ignore, make it a first class parameter, right after schema, to signify that permits are now a prominent part of the reader API. This -- mostly mechanical -- patch essentially refactors mutation_source to ask for the reader_permit instead of reader_resource_tracker and updates all usage sites.	2020-01-28 08:13:16 +02:00
Tomasz Grabiec	36d90e637e	Merge "Relax migration manager dependencies" from Pavel Emelyanov The set makes dependencies between mm and other services cleaner, in particular, after the set: - the query processor no longer needs migration manager (which doesn't need query processor either) - the database no longer needs migration manager, thus the mutual dependency between these two is dropped, only migration manager -> database is left - the migration manager -> storage_service dependency is relaxed, one more patchset will be needed to remove it, thus dropping one more mutual dependency between them, only the storage_service -> migration manager will be left - the migration manager is stopped on drain, but several more services need it on stop, thus causing use after free problems, in particular there's a caught bug when view builder crashes when unregistering from notifier list on stop. Fixed. Tests: unit(dev) Fixes: #5404	2020-01-16 12:12:25 +01:00
Pavel Emelyanov	5cf365d7e7	database: Explicitly pass migration_manager through init_non_system_keyspace This is the last place where database code needs the migration_manager instance to be alive, so now the mutual dependency between these two is gone, only the migration_manager needs the database, but not the vice-versa. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	ebebf9f8a8	database: Do not request migration_manager instance for passive_announce The helper in question is static, so no need to play with the migration_manager instances. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	7cfab1de77	database: Switch on mnotifier from migration_manager Do not call for local migration manager instance to send notifications, call for the local migration notifier, it will always be alive. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	e327feb77f	database: Prepare to use on-database migration_notifier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	f240d5760c	migration_manager: Split notifier from main class The _listeners list on migration_manager class and the corresponding notify_xxx helpers have nothing to do with its instances, they are just transport for notification delivery. At the same time some services need the migration manager to be alive at their stop time to unregister from it, while the manager itself may need them for its needs. The proposal is to move the migration notifier into a completely separate sharded "service". This service doesn't need anything, so it's started first and stopped last. While it's not effectively a "migration" notifier, we inherited the name from Cassandra and renaming it will "scramble neurons in the old-timers' brains but will make it easier for newcomers" as Avi says. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:19 +03:00
Gleb Natapov	16e0fc4742	schema: allow schema to be marked as 'always sync to commitlog' All writes that use this schema will be immediately persisted to storage.	2020-01-15 12:15:42 +02:00
Gleb Natapov	29574c1271	database: pass sync flag from db::apply function to the commitlog Allow upper layers to request a mutation to be persisted on a disk before making future ready independent of which mode commitlog is running in.	2020-01-15 12:15:42 +02:00
Gleb Natapov	e0bc4aa098	commitlog: add sync method to entry_writer If the method returns true commitlog should sync to file immediately after writing the entry and wait for flush to complete before returning.	2020-01-15 12:15:42 +02:00
Nadav Har'El	4aa323154e	merge: Pretty print canonical_mutation objects Merged pull request https://github.com/scylladb/scylla/pull/5533 from Avi Kivity: canonical_mutation objects are used for schema reconciliation, which is a fragile area and thus deserves some debugging help. This series makes canonical_mutation objects printable.	2020-01-14 10:01:06 +02:00
Avi Kivity	454074f284	Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz " The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, cause otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717 " * tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla: database: Avoid OOMing with flush continuations after failed memtable flush lsa: Introduce operator bool() to occupancy_stats lsa: Expose region_impl::evictable_occupancy in the region class	2020-01-08 16:58:54 +02:00
Avi Kivity	19f68412ad	atomic_cell: move pretty printers from database.cc to atomic_cell.cc atomic_cell.cc is the logical home for atomic_cell pretty printers, and since we plan to add more pretty printers, start by tidying up.	2019-12-30 18:20:30 +02:00
Rafael Ávila de Espíndola	3b61cf3f0b	db: Don't use lw_shared_ptr for user_types_metadata The user_types_metadata can simply be owned by the keyspace. This simplifies the code since we never have to worry about nulls and the ownership is now explicit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	a55838323b	user_types_metadata: don't implement enable_lw_shared_from_this It looks like this was done just to avoid including user_types_metadata.hh, which seems a bit much considering that it requires adding specialization to the seastar namespace. A followup patch will also stop using lw_shared_ptr for user_types_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Tomasz Grabiec	aa173898d6	Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz Selected semaphores' names are now included in exception messages in case of timeout or when admission queue overflows. Resolves #5281	2019-12-05 14:19:56 +01:00
Avi Kivity	85822c7786	database: fix schema use-after-move in make_multishard_streaming_reader On aarch64, asan detected a use-after-move. It doesn't happen on x86_64, likely due to different argument evaluation order. Fix by evaluating full_slice before moving the schema. Note: I used "auto&&" and "std::move()" even though full_slice() returns a reference. I think this is safer in case full_slice() changes, and works just as well with a reference. Fixes #5419.	2019-12-05 11:58:34 +02:00
Juliusz Stasiewicz	d043393f52	db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore Exception messages contain semaphore's name (provided in ctor). This affects the queue overflow exception as well as timeout exception. Also, custom throwing function in ctor was changed to `prethrow_action', i.e. metrics can still be updated there but now callers have no control over the type of the exception being thrown. This affected `restricted_reader_max_queue_length' test. `reader_concurrency_semaphore'-s docs are updated accordingly.	2019-12-03 15:41:34 +01:00
Juliusz Stasiewicz	fa12394dfe	reader_concurrency_semaphore: cosmetic changes Added line breaks, replaced unused include, included seastarx.hh instead of `using namespace seastar`.	2019-11-28 13:39:08 +01:00
Tomasz Grabiec	9d7f8f18ab	database: Avoid OOMing with flush continuations after failed memtable flush The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, cause otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717	2019-11-22 12:08:36 +01:00


1271 Commits