scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	36d90e637e	Merge "Relax migration manager dependencies" from Pavel Emelyanov The set makes dependencies between mm and other services cleaner, in particular, after the set: - the query processor no longer needs migration manager (which doesn't need query processor either) - the database no longer needs migration manager, thus the mutual dependency between these two is dropped, only migration manager -> database is left - the migration manager -> storage_service dependency is relaxed, one more patchset will be needed to remove it, thus dropping one more mutual dependency between them, only the storage_service -> migration manager will be left - the migration manager is stopped on drain, but several more services need it on stop, thus causing use after free problems, in particular there's a caught bug when view builder crashes when unregistering from notifier list on stop. Fixed. Tests: unit(dev) Fixes: #5404	2020-01-16 12:12:25 +01:00
Pavel Emelyanov	5cf365d7e7	database: Explicitly pass migration_manager through init_non_system_keyspace This is the last place where database code needs the migration_manager instance to be alive, so now the mutual dependency between these two is gone, only the migration_manager needs the database, but not the vice-versa. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	ebebf9f8a8	database: Do not request migration_manager instance for passive_announce The helper in question is static, so no need to play with the migration_manager instances. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:29:21 +03:00
Pavel Emelyanov	7cfab1de77	database: Switch on mnotifier from migration_manager Do not call for local migration manager instance to send notifications, call for the local migration notifier, it will always be alive. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	e327feb77f	database: Prepare to use on-database migration_notifier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:21 +03:00
Pavel Emelyanov	f240d5760c	migration_manager: Split notifier from main class The _listeners list on migration_manager class and the corresponding notify_xxx helpers have nothing to do with its instances, they are just transport for notification delivery. At the same time some services need the migration manager to be alive at their stop time to unregister from it, while the manager itself may need them for its needs. The proposal is to move the migration notifier into a completely separate sharded "service". This service doesn't need anything, so it's started first and stopped last. While it's not effectively a "migration" notifier, we inherited the name from Cassandra and renaming it will "scramble neurons in the old-timers' brains but will make it easier for newcomers" as Avi says. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-01-15 14:28:19 +03:00
Gleb Natapov	16e0fc4742	schema: allow schema to be marked as 'always sync to commitlog' All writes that use this schema will be immediately persisted to storage.	2020-01-15 12:15:42 +02:00
Gleb Natapov	29574c1271	database: pass sync flag from db::apply function to the commitlog Allow upper layers to request a mutation to be persisted on a disk before making future ready independent of which mode commitlog is running in.	2020-01-15 12:15:42 +02:00
Gleb Natapov	e0bc4aa098	commitlog: add sync method to entry_writer If the method returns true commitlog should sync to file immediately after writing the entry and wait for flush to complete before returning.	2020-01-15 12:15:42 +02:00
Nadav Har'El	4aa323154e	merge: Pretty print canonical_mutation objects Merged pull request https://github.com/scylladb/scylla/pull/5533 from Avi Kivity: canonical_mutation objects are used for schema reconciliation, which is a fragile area and thus deserves some debugging help. This series makes canonical_mutation objects printable.	2020-01-14 10:01:06 +02:00
Avi Kivity	454074f284	Merge "database: Avoid OOMing with flush continuations after failed memtable flush" from Tomasz " The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, cause otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717 " * tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla: database: Avoid OOMing with flush continuations after failed memtable flush lsa: Introduce operator bool() to occupancy_stats lsa: Expose region_impl::evictable_occupancy in the region class	2020-01-08 16:58:54 +02:00
Avi Kivity	19f68412ad	atomic_cell: move pretty printers from database.cc to atomic_cell.cc atomic_cell.cc is the logical home for atomic_cell pretty printers, and since we plan to add more pretty printers, start by tidying up.	2019-12-30 18:20:30 +02:00
Rafael Ávila de Espíndola	3b61cf3f0b	db: Don't use lw_shared_ptr for user_types_metadata The user_types_metadata can simply be owned by the keyspace. This simplifies the code since we never have to worry about nulls and the ownership is now explicit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	a55838323b	user_types_metadata: don't implement enable_lw_shared_from_this It looks like this was done just to avoid including user_types_metadata.hh, which seems a bit much considering that it requires adding specialization to the seastar namespace. A followup patch will also stop using lw_shared_ptr for user_types_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Tomasz Grabiec	aa173898d6	Merge "Named semaphores in concurrency reader, segment_manager and region_group" from Juliusz Selected semaphores' names are now included in exception messages in case of timeout or when admission queue overflows. Resolves #5281	2019-12-05 14:19:56 +01:00
Avi Kivity	85822c7786	database: fix schema use-after-move in make_multishard_streaming_reader On aarch64, asan detected a use-after-move. It doesn't happen on x86_64, likely due to different argument evaluation order. Fix by evaluating full_slice before moving the schema. Note: I used "auto&&" and "std::move()" even though full_slice() returns a reference. I think this is safer in case full_slice() changes, and works just as well with a reference. Fixes #5419.	2019-12-05 11:58:34 +02:00
Juliusz Stasiewicz	d043393f52	db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore Exception messages contain semaphore's name (provided in ctor). This affects the queue overflow exception as well as timeout exception. Also, custom throwing function in ctor was changed to `prethrow_action', i.e. metrics can still be updated there but now callers have no control over the type of the exception being thrown. This affected `restricted_reader_max_queue_length' test. `reader_concurrency_semaphore'-s docs are updated accordingly.	2019-12-03 15:41:34 +01:00
Juliusz Stasiewicz	fa12394dfe	reader_concurrency_semaphore: cosmetic changes Added line breaks, replaced unused include, included seastarx.hh instead of `using namespace seastar`.	2019-11-28 13:39:08 +01:00
Tomasz Grabiec	9d7f8f18ab	database: Avoid OOMing with flush continuations after failed memtable flush The original fix (`10f6b125c8`) didn't take into account that if there was a failed memtable flush (Refs flush) but is not a flushable memtable because it's not the latest in the memtable list. If that happens, it means no other memtable is flushable as well, cause otherwise it would be picked due to evictable_occupancy(). Therefore the right action is to not flush anything in this case. Suspected to be observed in #4982. I didn't manage to reproduce after triggering a failed memtable flush. Fixes #3717	2019-11-22 12:08:36 +01:00
Avi Kivity	1fe062aed4	Merge "Add basic UDF support" from Rafael " This patch series adds only UDF support, UDA will be in the next patch series. With this all CQL types are mapped to Lua. Right now we setup a new lua state and copy the values for each argument and return. This will be optimized once profiled. We require --experimental to enable UDF in case there is some change to the table format. " * 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits) Lua: Document the conversions between Lua and CQL Lua: Implement decimal subtraction Lua: Implement decimal addition Lua: Implement support for returning decimal Lua: Implement decimal to string conversion Lua: Implement decimal to floating point conversion Lua: Implement support for decimal arguments Lua: Implement support for returning varint Lua: Implement support for returning duration Lua: Implement support for duration arguments Lua: Implement support for returning inet Lua: Implement support for inet arguments Lua: Implement support for returning time Lua: Implement support for time arguments Lua: Implement support for returning timeuuid Lua: Implement support for returning uuid Lua: Implement support for uuid and timeuuid arguments Lua: Implement support for returning date Lua: Implement support for date arguments Lua: Implement support for returning timestamp ...	2019-11-17 16:38:19 +02:00
Piotr Dulikowski	59fbbb993f	memtables: add partition/row hit/miss counters Adds per-table metrics for counting partition and row reuse in memtables. New metrics are as follows: - memtable_partition_writes - number of write operations performed on partitions in memtables, - memtable_partition_hits - number of write operations performed on partitions that previously existed in a memtable, - memtable_row_writes - number of row write operations performed in memtables, - memtable_row_hits - number of row write operations that overwrote rows previously present in a memtable. Tests: unit(release)	2019-11-12 13:35:41 +01:00
Rafael Ávila de Espíndola	fc72a64c67	Add schema propagation and storage for UDF With this it is possible to create user defined functions and aggregates and they are saved to disk and the schema change is propagated. It is just not possible to call them yet. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-11-07 08:41:08 -08:00
Kamil Braun	c90ea1056b	Remove mutation_partition_applier. It had been replaced by partition_builder in commit `dc290f0af7`.	2019-10-25 10:19:45 +02:00
Amnon Heiman	64c2d28a7f	database: Add counter for the number of schema changes Schema changes can have big effects on performance, typically it should be a rare event. It is useful to monitor how frequently the schema changed. This patch adds a counter that increases each time a schema changed. After this patch the metrics would look like: scylla_database_schema_changed{shard="0",type="derive"} 2 Fixes #4785 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-10-08 17:54:49 +02:00
Glauber Costa	c9f2d1d105	do not crash in user-defined operations if the controller is disabled Scylla currently crashes if we run manual operations like nodetool compact with the controller disabled. While we neither like nor recommend running with the controller disabled, due to some corner cases in the controller algorithm we are not yet at the point in which we can deprecate this and are sometimes forced to disable it. The reason for the crash is that manual operations will invoke _backlog_of_shares, which returns what is the backlog needed to create a certain number of shares. That scans the existing control points, but when we run without the controller there are no control points and we crash. Backlog doesn't matter if the controller is disabled, and the return value of this function will be immaterial in this case. So to avoid the crash, we return something right away if the controller is disabled. Fixes #5016 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-09-16 18:26:57 +02:00
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they did the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places, which were lacking this, but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Piotr Sarna	17c323c096	database: add fixing previous secondary index schemas If a schema was created before computed columns were implemented, its token column may not have been marked as computed. To remedy this, if no computed column is found, the schema will be recreated. The code will work correctly even without this patch in order to support upgrading from legacy versions, but it's still important: it transforms token columns from the legacy format to new computed format, which will eventually (after a few release cycles) allow dropping the support for legacy format altogether.	2019-07-19 11:58:42 +02:00
Tomasz Grabiec	7604980d63	database: Add missing partition slicing on streaming reader recreation streaming_reader_lifecycle_policy::create_reader() was ignoring the partition_slice passed to it and always creating the reader for the full slice. That's wrong because create_reader() is called when recreating a reader after it's evicted. If the reader stopped in the middle of partition we need to start from that point. Otherwise, fragments in the mutation stream will appear duplicated or out of order, violating assumptions of the consumers. This was observed to result in repair writing incorrect sstables with duplicated clustering rows, which results in malformed_sstable_exception on read from those sstables. Fixes #4659. In v2: - Added an overload without partition_slice to avoid changing existing users which never slice Tests: - unit (dev) - manual (3 node ccm + repair) Backport: 3.1 Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>	2019-07-18 18:35:28 +03:00
Kamil Braun	d6736a304a	Add metric for failed memtable flushes Resolves #3316. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-10 11:30:10 +03:00
Avi Kivity	fca1ae69ff	database: convert _cfg from a pointer to a reference _cfg cannot be null, so it can be converted to a reference to indicate this. Follow-up to `fe59997efe`.	2019-07-02 17:57:50 +02:00
Avi Kivity	2abe015150	database: allow live update of the compaction_enforce_min_threshold config item Change the type from bool to updateable_value<bool> throughout the dependency chain and mark it as live updateable. In theory we should also observe the value and trigger compaction if it changes, but I don't think it is worthwhile.	2019-06-28 16:43:25 +03:00
Avi Kivity	fe59997efe	database: don't copy config object Copying the config object breaks the link between the original and the copied object, so updates to config items will not be visible. To allow updates, don't copy any more, and instead keep a pointer. The pointer won't work well once config is updateable, since the same object is shared across multiple shards, but that can be addressed later.	2019-06-28 15:20:39 +03:00
Avi Kivity	339699b627	database: remove default constructor Currently, database::_cfg is a copy of the global configuration. But this means that we have multiple master copies of the configuration, which makes updating the configuration harder. In order to eliminate the copy we have to eliminate the database default constructor, which creates a config object, so that all remaining constructors can receive config by reference and retain that reference.	2019-06-28 15:20:39 +03:00
Juliana Oliveira	fd83f61556	Add a warning for partitions with too many rows This patch adds a warning option to the user for situations where rows count may get bigger than initially designed. Through the warning, users can be aware of possible data modeling problems. The threshold is initially set to '100,000'. Tests: unit (dev) Message-Id: <20190528075612.GA24671@shenzou.localdomain>	2019-06-06 19:48:57 +03:00
Avi Kivity	96a0073929	database: fix pessimizing moves Remove pessimizing moves, as reported by gcc 9.	2019-05-07 09:26:58 +03:00
Tomasz Grabiec	3cb7b2d72e	treewide: Propagate schema_features to db::schema::all_tables()	2019-04-28 15:50:13 +02:00
Benny Halevy	5a99023d4a	treewide: use lambda for io_check of *touch_directory To prepare for a seastar change that adds an optional file_permissions parameter to touch_directory and recursive_touch_directory. This change messes up the call to io_check since the compiler can't derive the Func&& argument. Therefore, use a lambda function instead to wrap the call to {recursive_,}touch_directory. Ref #4395 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>	2019-04-21 12:04:39 +03:00
Tomasz Grabiec	5dc3f5ea33	Merge "Properly enable MC format on the cluster" from Piotr 1. All nodes in the cluster have to support MC_SSTABLE_FEATURE 2. When a node observes that whole cluster supports MC_SSTABLE_FEATURE then it should start using MC format. 3. Once all shards start to use MC then a node should broadcast that unbounded range tombstones are now supported by the cluster. 4. Once whole cluster supports unbounded range tombstones we can start accepting them on CQL level. tests: unit(release) Fixes #4205 Fixes #4113 * seastar-dev.git dev/haaawk/enable_mc/v11: system_keyspace: Add scylla_local system_keyspace: add accessors for SCYLLA_LOCAL storage_service: add _sstables_format field feature: add when_enabled callbacks system_keyspace: add storage_service param to setup Add sstable format helper methods Register feature listeners in storage_service Add service::read_sstables_format Use read_sstables_format in main.cc Use _sstables_format to determine current format Add _unbounded_range_tombstones_feature Update supported features on format change	2019-04-16 14:07:05 +02:00
Piotr Jastrzebski	96ad8f7df9	Use _sstables_format to determine current format Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 10:37:12 +02:00
Rafael Ávila de Espíndola	4f1260f3e3	cql_type_parser::raw_builder: Allow building types incrementally Before this patch raw_builder would always start with an empty list of user types. This means that every time a type is added to a keyspace, every type in that keyspace needs to be recreated. With this patch we pass a keyspace_metadata instead of just the keyspace name and can construct new user types on top of previous ones. This will be used in the followup patch, where only new types are created. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-08 14:06:51 -07:00
Duarte Nunes	b2dd8ce065	database: Make exception message more accurate It's the sstable read queue that's overloaded, not the inactive one (which can be considered empty when we can't admit newer reads). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190328003533.6162-1-duarte@scylladb.com>	2019-04-01 13:53:50 +03:00
Benny Halevy	223e1af521	sstables: provide large_data_handler to constructor And use it for writing the sstable and/or when deleting it. Refs #4198 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:24:19 +02:00
Benny Halevy	eebc3701a5	sstables: introduce sstables_manager The goal of the sstables manager is to track and manage sstables life-cycle. There is a sstable manager instance per database and it is passed to each column-family (and test environment) on construction. All sstables created, loaded, and deleted pass through the sstables manager. The manager will make sure consumers of sstables are in sync so that sstables will not be deleted while in use. Refs #4149 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Piotr Sarna	a7602bd2f1	database: add global view update stats Currently view update metrics are only per-table, but per-table metrics are not always enabled. In order to be able to see the number of generated view updates in all cases, global stats are added. Fixes #4221 Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>	2019-03-14 12:04:18 +00:00
Rafael Ávila de Espíndola	63251b66c1	db: Record large cells Fixes #4234. Large cells are now recorded in system.large_cells. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola	54b856e5e4	large_data_handler: propagate a future out of stop() stop() will close a semaphore in a followup patch, so it needs to return a future. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Duarte Nunes	a29ec4be76	Merge 'Update system.large_partitions during shutdown' from Rafael " Currently any large partitions found during shutdown are not recorded. The reason is that the database commit log is already off, so there is nowhere to record it to. One possible solution is to have an independent system database. With that the regular db is shutdown first and writes can continue to the system db. That is a pretty big change. It would also not allow us to record large partitions in any system tables. This patch series instead tries to stop the commit log later. With that any large partitions are recorded to the log and moved to a sstable on the next startup. " * 'espindola/shutdown-order-patches-v7' of https://github.com/espindola/scylla: db: stop the commit log after the tables during shutdown db: stop the compaction manager earlier db: Add a stop_database helper db: Don't record large partitions in system tables	2019-03-06 10:36:38 -03:00
Tomasz Grabiec	889f31fabe	Merge "fix slow truncation under flush pressure" from Glauber Truncating a table is very slow if the system is under pressure. Because in that case we mostly just want to get rid of the existing data, it shouldn't take this long. The problem happens because truncate has to wait for memtable flushes to end, twice. This is regardless of whether or not the table being truncated has any data. 1. The first time is when we call truncate itself: if auto_snapshot is enabled, we will flush the contents of this table first and we are expected to be slow. However, even if auto_snapshot is disabled we will still do it -- which is a bug -- if the table is marked as durable. We should just not flush in this case and it is a silly bug. 2. The second time is when we call cf->stop(). Stopping a table will wait for a flush to finish. At this point, regardless of which path (Durable or non-durable) we took in the previous step we will have no more data in the table. However, calling `flush()` still needs to acquire a flush_permit, which means we will wait for whichever memtable is flushing at that very moment to end. If the system is under pressure and a memtable flush will take many seconds, so will truncate. Even if auto_snapshots are enabled, we shouldn't have to flush twice. The first flush should already put us in a state in which the next one is immediate (maybe holding on to the permit, maybe destroying the memtable_list already at that point -> since no other memtables should be created). If auto_snapshots are not enabled, the whole thing should just be instantaneous. This patchset fixes that by removing the flush need when !auto_snapshot, and special casing the flush of an empty table. Fixes #4294 * git@github.com:glommer/scylla.git slowtruncate-v2: database: immediately flush tables with no memtables. truncate: do not flush memtables if auto_snapshot is false.	2019-03-06 13:54:58 +01:00
Rafael Ávila de Espíndola	16ed9a2574	db: stop the commit log after the tables during shutdown This allows for system.large_partitions to be updated if a large partition is found while writing the last sstables. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola	a3e1f14134	db: stop the compaction manager earlier We want to finish all large data logging in stop_system, so stopping the compaction manager should be the first thing stop_system does. The make_ready_future<>() will be removed in a followup patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:51 -08:00

1240 Commits