scylladb

Author	SHA1	Message	Date
Pavel Emelyanov	8435e93549	db: Move unbounded_range_tombstones listening from storage_service Now the database keeps reference on feature service, so we can listen on the feature in it directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-19 14:08:24 +03:00
Avi Kivity	6c7aa18238	Merge "Introduce schema::get_partitioner" from Piotr " Introduce schema::get_partitioner and use it instead of dht::global_partitioner. Fixes #5493 Tests: unit(dev, release, debug) " * 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits) cdc: stop using partitioners partitioner_test: stop calling set_global_partitioner storage_service: stop calling global_partitioner() mutation_writer_test: stop calling global_partitioner() schema: reduce number of global_partitioner() calls test_services: stop calling global_partitioner() sstable_utils: stop calling global_partitioner() sstable_resharding_test: stop depending on global partitioner sstable_mutation_test: stop calling global_partitioner() sstable_data_file_test: stop calling global_partitioner() random_schema: stop taking partitioner in constructor mutation_reader_test: stop calling global_partitioner() multishard_mutation_query_test: stop calling global_partitioner() row_level repair: stop calling global_partitioner() distribute_reader_and_consume_on_shards: don't take partitioner thrift: reduce global_partitioner() calls binary_search: stop calling global_partitioner() index_entry: stop calling global_partitioner() mc writer: stop calling global_partitioner() sstable: stop calling global_partitioner() ...	2020-02-17 18:12:53 +02:00
Piotr Dulikowski	01084a79b8	hh: send orphaned hints on HINT_MUTATION verb When replaying a hint with a destination node that is no longer in the cluster, it will be sent with cl=ALL to all its new replicas. Before this patch, the MUTATION verb was used, which causes such hints to be handled on the same connection and with the same priority as regular writes. This can cause problems when a large number of hints is orphaned and they are scheduled to be sent at once. Such situation may happen when replacing a dead node - all nodes that accumulated hints for the dead node will now send them with cl=ALL to their new replicas. This patch changes the verb used to send such hints to HINT_MUTATION. This verb is handled on a separate connection and with streaming scheduling group, which gives them similar priority to non-orphaned hints. Refs: #4712 Tests: unit(dev)	2020-02-17 14:45:22 +01:00
Piotr Jastrzebski	2d7532f87f	dht: add dht::get_token and replace all calls to dht::global_partitioner().get_token dht::get_token is better because it takes schema and uses it to obtain partitioner instead of using a global partitioner. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:59:15 +01:00
Piotr Jastrzebski	abd76e566f	dht::shard_of: stop calling global_partitioner() Take const schema& as a parameter of shard_of and use it to obtain partitioner instead of calling global_partitioner(). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-02-17 10:23:16 +01:00
Pavel Emelyanov	1a3f78a57d	database: Use own token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 20:54:32 +03:00
Pavel Emelyanov	de1dc59548	migration_manager: Refactor validation of new/updating ksm The goal is to have token_metadata reference inside the keyspace_metadata.validate method. This can be achieved by doing the validation through the database reference which is "at hand" in migration_manager. While at it, merge the validation with exists/not-exists checks done in the same places. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-10 18:10:38 +03:00
Avi Kivity	bed61b96a2	Merge "Move features from storage- into feature-service" from Pavel " There's a lot of code around that needs storage service purely to get the specific feature value (cluster_supports_<something> calls). This creates several circular dependencies, e.g. storage_service <-> migration_manager one and database <-> storage_service. Also features sit on storage_service, but register themselves on the feature_service and the former subscribes on them back which also looks strange. I propose to keep all the features on feature_service, this keeps the latter independent from other components, makes it possible to break one of the mentioned circular dependencies and heavily relax the other. Also the set helps us fighting the globals and, after it, the feature_service can be safely stopped at the very last moment. Tests: unit(dev), manual debug build start-stop " * 'br-features-to-service-5' of https://github.com/xemul/scylla: gossiper: Avoid string merge-split for nothing features: Stop on shutdown storage_service: Remove helpers storage_service: Prepare to switch from on-board feature helpers cql3: Check feature in .validate database: Use feature service storage_proxy: Use feature service migration_manager: Use feature service start: Pass needed feature as argument into migrate_truncation_records features: Unfriend storage_service features: Simplify feature registration features: Introduce known_feature_set features: Move disabled features set from storage_service features: Move schema_features helper features: Move all features from storage_service to feature_service storage_service: Use feature_config from _feature_service features: Add feature_config storage_service: Kill set_disabled_features gms: Move features stuff into own .cc file migration_manager: Move some fns into class	2020-02-09 19:22:07 +02:00
Calle Wilund	af963e76c7	keyspace/distributed_loader: Add wait for (user) keyspace population to finish Allows caller to check/wait for a given user keyspace to finish populating on boot. Can be called at any time, though if called before population starts, it will wait until it either starts and we can determine that the keyspace does not need populating, or population finishes. tests: unit Message-Id: <20200203151712.10003-1-calle@scylladb.com>	2020-02-09 18:56:22 +02:00
Pavel Emelyanov	d1775dd701	utils: Move disk-error-handler into it The disk-error-handler is purely auxiliary thing that helps propagating IO errors to the rest of the code. It well deserves not sitting in the root namespace. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200207112443.18475-1-xemul@scylladb.com>	2020-02-09 17:26:52 +02:00
Gleb Natapov	20bf3800f3	database: cache truncation time in table objects Truncation time is used on each LWT request now, so reading it from the table is too heavy an operation to be on a fast path. It also requires jumping to a shard that contains corresponding data. This patch caches the data on the table object of each shard for easy access. The cache is initialized during boot from system.truncated table and updated on each truncation operation. Message-Id: <20200206163838.5220-2-gleb@scylladb.com>	2020-02-06 18:15:48 +01:00
Rafael Ávila de Espíndola	5d4671526c	db: Replace large_data_handler::_stopped with _running This is not just a direct flip to a variable with the negated Boolean value. When created, a large_data_handler is not considered to be running, the user has to call start() before it can be used. The advantage of doing this is that if initialization fails and a database is destructed before the large_data_handler is started, the assert database::stop() { assert(!_large_data_handler->running()); is not triggered. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-04 21:15:44 -08:00
Pavel Emelyanov	abe588888d	database: Use feature service Keep local feature_service reference on database. This relaxes the circular storage_service <-> database reference, but not removes it completely. This needs some args tossing in apply_to_builder, but it's rather straightforward, so comes in the same patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-02-03 15:16:23 +03:00
Botond Dénes	69f606baa0	database: check timeout before applying writes Attempting to apply timed-out writes is a wasted effort. The coordinator has already given up on the write and reported it as failed to the client. Any cycles spent on this write are a waste at this point. We currently only check the timeout if the write is blocked on memory, otherwise, if the system is not under pressure, we will happily apply timed out writes. If the system is under pressure we will make it worse by wasting cycles on processing a timed out write. Prevent this by checking the timeout as early as possible in `database::apply()` and `database::apply_counter_update()`. This patch doesn't solve all our problems related to timed out writes. They can still sit and accumulate in various queues without expiring, a prominent example being the smp queues. It is however a good first step towards reducing wasted effort spent on them. Refs: #5055 Ref #5251 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200129093007.550250-1-bdenes@scylladb.com>	2020-01-29 13:08:43 +02:00
Botond Dénes	dfc8b2fc45	treewide: replace reader_resource_tracker with reader_permit The former was never really more than a reader_permit with one additional method. Currently using it doesn't even save one from any includes. Now that readers will be using reader_permit we would have to pass down both to mutation_source. Instead get rid of reader_resource_tracker and just use reader_permit. Instead of making it a last and optional parameter that is easy to ignore, make it a first class parameter, right after schema, to signify that permits are now a prominent part of the reader API. This -- mostly mechanical -- patch essentially refactors mutation_source to ask for the reader_permit instead of reader_resource_tracker and updates all usage sites.	2020-01-28 08:13:16 +02:00
Tomasz Grabiec	36d90e637e	Merge "Relax migration manager dependencies" from Pavel Emelyanov The set makes dependencies between mm and other services cleaner, in particular, after the set: - the query processor no longer needs migration manager (which doesn't need query processor either) - the database no longer needs migration manager, thus the mutual dependency between these two is dropped, only migration manager -> database is left - the migration manager -> storage_service dependency is relaxed, one more patchset will be needed to remove it, thus dropping one more mutual dependency between them, only the storage_service -> migration manager will be left - the migration manager is stopped on drain, but several more services need it on stop, thus causing use after free problems, in particular there's a caught bug when view builder crashes when unregistering from notifier list on stop. Fixed. Tests: unit(dev) Fixes: #5404	2020-01-16 12:12:25 +01:00
Pavel Emelyanov	5cf365d7e7	database: Explicitly pass migration_manager through init_non_system_keyspace This is the last place where database code needs the migration_manager instance to be alive, so now the mutual dependency between these two is gone, only the migration_manager needs the database, but not the vice-versa. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	ebebf9f8a8	database: Do not request migration_manager instance for passive_announce The helper in question is static, so no need to play with the migration_manager instances. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	7cfab1de77	database: Switch on mnotifier from migration_manager Do not call for local migration manager instance to send notifications, call for the local migration notifier, it will always be alive. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	e327feb77f	database: Prepare to use on-database migration_notifier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	f240d5760c	migration_manager: Split notifier from main class The _listeners list on migration_manager class and the corresponding notify_xxx helpers have nothing to do with its instances, they are just transport for notification delivery. At the same time some services need the migration manager to be alive at their stop time to unregister from it, while the manager itself may need them for its needs. The proposal is to move the migration notifier into a completely separate sharded "service". This service doesn't need anything, so it's started first and stopped last. While it's not effectively a "migration" notifier, we inherited the name from Cassandra and renaming it will "scramble neurons in the old-timers' brains but will make it easier for newcomers" as Avi says. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:19 +03:00
Gleb Natapov	16e0fc4742	schema: allow schema to be marked as 'always sync to commitlog' All writes that use this schema will be immediately persisted to storage.	2020-01-15 12:15:42 +02:00
Gleb Natapov	29574c1271	database: pass sync flag from db::apply function to the commitlog Allow upper layers to request a mutation to be persisted on a disk before making future ready independent of which mode commitlog is running in.	2020-01-15 12:15:42 +02:00
Gleb Natapov	e0bc4aa098	commitlog: add sync method to entry_writer If the method returns true commitlog should sync to file immediately after writing the entry and wait for flush to complete before returning.	2020-01-15 12:15:42 +02:00
Nadav Har'El	4aa323154e	merge: Pretty print canonical_mutation objects Merged pull request https://github.com/scylladb/scylla/pull/5533 from Avi Kivity: canonical_mutation objects are used for schema reconciliation, which is a fragile area and thus deserves some debugging help. This series makes canonical_mutation objects printable.	2020-01-14 10:01:06 +02:00
Avi Kivity	454074f284	Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz " The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, cause otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717 " * tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla: database: Avoid OOMing with flush continuations after failed memtable flush lsa: Introduce operator bool() to occupancy_stats lsa: Expose region_impl::evictable_occupancy in the region class	2020-01-08 16:58:54 +02:00
Avi Kivity	19f68412ad	atomic_cell: move pretty printers from database.cc to atomic_cell.cc atomic_cell.cc is the logical home for atomic_cell pretty printers, and since we plan to add more pretty printers, start by tidying up.	2019-12-30 18:20:30 +02:00
Rafael Ávila de Espíndola	3b61cf3f0b	db: Don't use lw_shared_ptr for user_types_metadata The user_types_metadata can simply be owned by the keyspace. This simplifies the code since we never have to worry about nulls and the ownership is now explicit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	a55838323b	user_types_metadata: don't implement enable_lw_shared_from_this It looks like this was done just to avoid including user_types_metadata.hh, which seems a bit much considering that it requires adding specialization to the seastar namespace. A followup patch will also stop using lw_shared_ptr for user_types_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Tomasz Grabiec	aa173898d6	Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz Selected semaphores' names are now included in exception messages in case of timeout or when admission queue overflows. Resolves #5281	2019-12-05 14:19:56 +01:00
Avi Kivity	85822c7786	database: fix schema use-after-move in make_multishard_streaming_reader On aarch64, asan detected a use-after-move. It doesn't happen on x86_64, likely due to different argument evaluation order. Fix by evaluating full_slice before moving the schema. Note: I used "auto&&" and "std::move()" even though full_slice() returns a reference. I think this is safer in case full_slice() changes, and works just as well with a reference. Fixes #5419.	2019-12-05 11:58:34 +02:00
Juliusz Stasiewicz	d043393f52	db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore Exception messages contain semaphore's name (provided in ctor). This affects the queue overflow exception as well as timeout exception. Also, custom throwing function in ctor was changed to `prethrow_action', i.e. metrics can still be updated there but now callers have no control over the type of the exception being thrown. This affected `restricted_reader_max_queue_length' test. `reader_concurrency_semaphore'-s docs are updated accordingly.	2019-12-03 15:41:34 +01:00
Juliusz Stasiewicz	fa12394dfe	reader_concurrency_semaphore: cosmetic changes Added line breaks, replaced unused include, included seastarx.hh instead of `using namespace seastar`.	2019-11-28 13:39:08 +01:00
Tomasz Grabiec	9d7f8f18ab	database: Avoid OOMing with flush continuations after failed memtable flush The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, cause otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717	2019-11-22 12:08:36 +01:00
Avi Kivity	1fe062aed4	Merge "Add basic UDF support" from Rafael " This patch series adds only UDF support, UDA will be in the next patch series. With this all CQL types are mapped to Lua. Right now we setup a new lua state and copy the values for each argument and return. This will be optimized once profiled. We require --experimental to enable UDF in case there is some change to the table format. " * 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits) Lua: Document the conversions between Lua and CQL Lua: Implement decimal subtraction Lua: Implement decimal addition Lua: Implement support for returning decimal Lua: Implement decimal to string conversion Lua: Implement decimal to floating point conversion Lua: Implement support for decimal arguments Lua: Implement support for returning varint Lua: Implement support for returning duration Lua: Implement support for duration arguments Lua: Implement support for returning inet Lua: Implement support for inet arguments Lua: Implement support for returning time Lua: Implement support for time arguments Lua: Implement support for returning timeuuid Lua: Implement support for returning uuid Lua: Implement support for uuid and timeuuid arguments Lua: Implement support for returning date Lua: Implement support for date arguments Lua: Implement support for returning timestamp ...	2019-11-17 16:38:19 +02:00
Piotr Dulikowski	59fbbb993f	memtables: add partition/row hit/miss counters Adds per-table metrics for counting partition and row reuse in memtables. New metrics are as follows: - memtable_partition_writes - number of write operations performed on partitions in memtables, - memtable_partition_hits - number of write operations performed on partitions that previously existed in a memtable, - memtable_row_writes - number of row write operations performed in memtables, - memtable_row_hits - number of row write operations that overwrote rows previously present in a memtable. Tests: unit(release)	2019-11-12 13:35:41 +01:00
Rafael Ávila de Espíndola	fc72a64c67	Add schema propagation and storage for UDF With this it is possible to create user defined functions and aggregates and they are saved to disk and the schema change is propagated. It is just not possible to call them yet. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Kamil Braun	c90ea1056b	Remove mutation_partition_applier. It had been replaced by partition_builder in commit `dc290f0af7`.	2019-10-25 10:19:45 +02:00
Amnon Heiman	64c2d28a7f	database: Add counter for the number of schema changes Schema changes can have big effects on performance; typically they should be rare events. It is useful to monitor how frequently the schema changes. This patch adds a counter that increases each time a schema changes. After this patch the metrics would look like: scylla_database_schema_changed{shard="0",type="derive"} 2 Fixes #4785 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-10-08 17:54:49 +02:00
Glauber Costa	c9f2d1d105	do not crash in user-defined operations if the controller is disabled Scylla currently crashes if we run manual operations like nodetool compact with the controller disabled. While we neither like nor recommend running with the controller disabled, due to some corner cases in the controller algorithm we are not yet at the point in which we can deprecate this and are sometimes forced to disable it. The reason for the crash is that manual operations will invoke _backlog_of_shares, which returns the backlog needed to create a certain number of shares. That scans the existing control points, but when we run without the controller there are no control points and we crash. Backlog doesn't matter if the controller is disabled, and the return value of this function will be immaterial in this case. So to avoid the crash, we return something right away if the controller is disabled. Fixes #5016 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-09-16 18:26:57 +02:00
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they did the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places, which were lacking this, but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Piotr Sarna	17c323c096	database: add fixing previous secondary index schemas If a schema was created before computed columns were implemented, its token column may not have been marked as computed. To remedy this, if no computed column is found, the schema will be recreated. The code will work correctly even without this patch in order to support upgrading from legacy versions, but it's still important: it transforms token columns from the legacy format to new computed format, which will eventually (after a few release cycles) allow dropping the support for legacy format altogether.	2019-07-19 11:58:42 +02:00
Tomasz Grabiec	7604980d63	database: Add missing partition slicing on streaming reader recreation streaming_reader_lifecycle_policy::create_reader() was ignoring the partition_slice passed to it and always creating the reader for the full slice. That's wrong because create_reader() is called when recreating a reader after it's evicted. If the reader stopped in the middle of partition we need to start from that point. Otherwise, fragments in the mutation stream will appear duplicated or out of order, violating assumptions of the consumers. This was observed to result in repair writing incorrect sstables with duplicated clustering rows, which results in malformed_sstable_exception on read from those sstables. Fixes #4659. In v2: - Added an overload without partition_slice to avoid changing existing users which never slice Tests: - unit (dev) - manual (3 node ccm + repair) Backport: 3.1 Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>	2019-07-18 18:35:28 +03:00
Kamil Braun	d6736a304a	Add metric for failed memtable flushes Resolves #3316. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-10 11:30:10 +03:00
Avi Kivity	fca1ae69ff	database: convert _cfg from a pointer to a reference _cfg cannot be null, so it can be converted to a reference to indicate this. Follow-up to `fe59997efe`.	2019-07-02 17:57:50 +02:00
Avi Kivity	2abe015150	database: allow live update of the compaction_enforce_min_threshold config item Change the type from bool to updateable_value<bool> throughout the dependency chain and mark it as live updateable. In theory we should also observe the value and trigger compaction if it changes, but I don't think it is worthwhile.	2019-06-28 16:43:25 +03:00
Avi Kivity	fe59997efe	database: don't copy config object Copying the config object breaks the link between the original and the copied object, so updates to config items will not be visible. To allow updates, don't copy any more, and instead keep a pointer. The pointer won't work well once config is updateable, since the same object is shared across multiple shards, but that can be addressed later.	2019-06-28 15:20:39 +03:00
Avi Kivity	339699b627	database: remove default constructor Currently, database::_cfg is a copy of the global configuration. But this means that we have multiple master copies of the configuration, which makes updating the configuration harder. In order to eliminate the copy we have to eliminate the database default constructor, which creates a config object, so that all remaining constructors can receive config by reference and retain that reference.	2019-06-28 15:20:39 +03:00
Juliana Oliveira	fd83f61556	Add a warning for partitions with too many rows This patch adds a warning option to the user for situations where rows count may get bigger than initially designed. Through the warning, users can be aware of possible data modeling problems. The threshold is initially set to '100,000'. Tests: unit (dev) Message-Id: <20190528075612.GA24671@shenzou.localdomain>	2019-06-06 19:48:57 +03:00
Avi Kivity	96a0073929	database: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:26:58 +03:00

1255 Commits