scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 17:10:35 +00:00

Author	SHA1	Message	Date
Nadav Har'El	aa1de5a171	merge: Synchronize snapshot and staging sstable deletion using sem Merged pull request https://github.com/scylladb/scylla/pull/5343 from Benny Halevy. Fixes #5340 Hold the sstable_deletion_sem in table::move_sstables_from_subdirs to serialize access to the staging directory. It now synchronizes snapshot, compaction deletion of sstables, and view_update_generator moving of sstables from staging. Tests: unit (dev) [except test_user_function_timestamp_return that fails for me locally, but also on master] snapshot_test.py (dev)	2019-12-17 14:06:02 +02:00
Benny Halevy	4b3243f5b9	table: move_sstables_from_staging_in_thread with _sstable_deletion_sem Hold the _sstable_deletion_sem while moving sstables from the staging directory so as not to move them under the feet of table::snapshot. Fixes #5340 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-12-17 12:20:20 +02:00
Rafael Ávila de Espíndola	3b61cf3f0b	db: Don't use lw_shared_ptr for user_types_metadata The user_types_metadata can simply be owned by the keyspace. This simplifies the code since we never have to worry about nulls and the ownership is now explicit. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola	a55838323b	user_types_metadata: don't implement enable_lw_shared_from_this It looks like this was done just to avoid including user_types_metadata.hh, which seems a bit much considering that it requires adding specialization to the seastar namespace. A followup patch will also stop using lw_shared_ptr for user_types_metadata. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-12-11 10:44:40 -08:00
Tomasz Grabiec	87b72dad3e	Merge "treewide: add missing const qualifiers" from Pavel Solodovnikov This patchset adds missing "const" function qualifiers throughout the Scylla code base, which would make code less error-prone. The changeset incorporates Kostja's work regarding const qualifiers in the cql code hierarchy along with a follow-up patch addressing the review comment of the corresponding patch set (the patch subject is "cql: propagate const property through prepared statement tree.").	2019-11-27 10:56:20 +01:00
Piotr Sarna	9c5a5a5ac2	treewide: add names to semaphores By default, semaphore exceptions bring along very little context: either that a semaphore was broken or that it timed out. In order to make debugging easier without introducing significant runtime costs, a notion of named semaphore is added. A named semaphore is simply a semaphore with statically defined name, which is present in its errors, bringing valuable context. A semaphore defined as: auto sem = semaphore(0); will present the following message when it breaks: "Semaphore broken" However, a named semaphore: auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"}); will present a message with at least some debugging context: "Semaphore broken: io_concurrency_sem" It's not much, but it would really help in pinpointing bugs without having to inspect core dumps. At the same time, it does not incur any costs for normal semaphore operations (except for its creation), but instead only uses more CPU in case an error is actually thrown, which is considered rare and not to be on the hot path. Refs #4999 Tests: unit(dev), manual: hardcoding a failure in view building code	2019-11-26 15:14:21 +02:00
Pavel Solodovnikov	2f442f28af	treewide: add const qualifiers throughout the code base	2019-11-26 02:24:49 +03:00
Benny Halevy	f9e93bba38	sstables: compaction: move cleanup parameter to compaction_descriptor Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>	2019-11-18 10:52:20 +01:00
Piotr Dulikowski	59fbbb993f	memtables: add partition/row hit/miss counters Adds per-table metrics for counting partition and row reuse in memtables. New metrics are as follows: - memtable_partition_writes - number of write operations performed on partitions in memtables, - memtable_partition_hits - number of write operations performed on partitions that previously existed in a memtable, - memtable_row_writes - number of row write operations performed in memtables, - memtable_row_hits - number of row write operations that overwrote rows previously present in a memtable. Tests: unit(release)	2019-11-12 13:35:41 +01:00
Piotr Dulikowski	48f7b2e4fb	table: move out table::stats to table_stats This change was done in order to be able to forward-declare the table::stats structure.	2019-11-12 13:35:41 +01:00
Vladimir Davydov	b75862610e	paxos_state: account paxos round latency This patch adds the following per table stats: cas_prepare_latency cas_propose_latency cas_commit_latency They are equivalent to CasPropose, CasPrepare, CasCommit metrics exposed by Cassandra.	2019-10-29 19:26:18 +03:00
Raphael S. Carvalho	7f1a2156c7	table: Don't account for shared SSTables in compaction backlog tracker We don't want to add shared sstables to table's backlog tracker because: 1) table's backlog tracker has only an influence on regular compaction 2) shared sstables are never regular compacted, they're worked by resharding which has its own backlog tracker. Such sstables belong to more than one shard, meaning that currently they're added to backlog tracker of all shards that own them. But the thing is that such sstables end up being resharded in a shard that may be completely random. So increasing backlog of all shards such sstables belong to won't lead to faster resharding. Also, table's backlog tracker is supposed to deal only with regular compaction. Accounting for shared sstables in table's tracker may lead to incorrect speed up of regular compactions because the controller is not aware that some relevant part of the backlog is due to pending resharding. The fix is about ignoring sstables that will be resharded and letting table's backlog tracker account only for sstables that can be worked on by regular compaction, and relying on resharding controlling itself with its own tracker. NOTE: this doesn't fix the resharding controlling issue completely, as described in #4952. We'll still need to throttle regular compaction on behalf of resharding. So subsequent work may be about: - move resharding to its own priority class, perhaps streaming. - make resharding's backlog tracker account for sstables in all of its pending jobs, not only the ongoing ones (currently limited to 1 by shard). - limit compaction shares when resharding is in progress. THIS only fixes the issue in which controller for regular compaction shouldn't account sstables completely exclusive to resharding. Fixes #5077. Refs #4952. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190924022109.17400-1-raphaelsc@scylladb.com>	2019-10-13 10:14:13 +03:00
Amnon Heiman	64c2d28a7f	database: Add counter for the number of schema changes Schema changes can have big effects on performance; typically they should be rare events. It is useful to monitor how frequently the schema changes. This patch adds a counter that increases each time the schema changes. After this patch the metrics would look like: scylla_database_schema_changed{shard="0",type="derive"} 2 Fixes #4785 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-10-08 17:54:49 +02:00
Tomasz Grabiec	79935df959	commitlog: replay: Respect back-pressure from memtable space to prevent OOM Commit log replay was bypassing memtable space back-pressure, and if replay was faster than memtable flush, it could lead to OOM. The fix is to call database::apply_in_memory() instead of table::apply(). The former blocks when memtable space is full. Fixes #4982. Tests: - unit (release) - manual, replay with memtable flush failing and without failing Message-Id: <1568381952-26256-1-git-send-email-tgrabiec@scylladb.com>	2019-09-15 11:51:56 +03:00
Piotr Sarna	1ab07b80b4	database: assign proper io priority for streaming view updates Streamed view updates parasitized on writing io priority, which is reserved for user writes - it's now properly bound to streaming write priority.	2019-08-20 00:24:50 +02:00
Tomasz Grabiec	7604980d63	database: Add missing partition slicing on streaming reader recreation streaming_reader_lifecycle_policy::create_reader() was ignoring the partition_slice passed to it and always creating the reader for the full slice. That's wrong because create_reader() is called when recreating a reader after it's evicted. If the reader stopped in the middle of partition we need to start from that point. Otherwise, fragments in the mutation stream will appear duplicated or out of order, violating assumptions of the consumers. This was observed to result in repair writing incorrect sstables with duplicated clustering rows, which results in malformed_sstable_exception on read from those sstables. Fixes #4659. In v2: - Added an overload without partition_slice to avoid changing existing users which never slice Tests: - unit (dev) - manual (3 node ccm + repair) Backport: 3.1 Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>	2019-07-18 18:35:28 +03:00
Benny Halevy	0e4567c881	table: document _sstables_lock/_sstable_deletion_sem locking order Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-15 19:20:35 +03:00
Benny Halevy	bbbd749f70	table: uninline enable_sstable_write Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Kamil Braun	d6736a304a	Add metric for failed memtable flushes Resolves #3316. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-10 11:30:10 +03:00
Avi Kivity	fca1ae69ff	database: convert _cfg from a pointer to a reference _cfg cannot be null, so it can be converted to a reference to indicate this. Follow-up to `fe59997efe`.	2019-07-02 17:57:50 +02:00
Avi Kivity	2abe015150	database: allow live update of the compaction_enforce_min_threshold config item Change the type from bool to updateable_value<bool> throughout the dependency chain and mark it as live updateable. In theory we should also observe the value and trigger compaction if it changes, but I don't think it is worthwhile.	2019-06-28 16:43:25 +03:00
Avi Kivity	fe59997efe	database: don't copy config object Copying the config object breaks the link between the original and the copied object, so updates to config items will not be visible. To allow updates, don't copy any more, and instead keep a pointer. The pointer won't work well once config is updateable, since the same object is shared across multiple shards, but that can be addressed later.	2019-06-28 15:20:39 +03:00
Avi Kivity	339699b627	database: remove default constructor Currently, database::_cfg is a copy of the global configuration. But this means that we have multiple master copies of the configuration, which makes updating the configuration harder. In order to eliminate the copy we have to eliminate the database default constructor, which creates a config object, so that all remaining constructors can receive config by reference and retain that reference.	2019-06-28 15:20:39 +03:00
Piotr Sarna	e77ef849af	database: add flag for infinite bound range deletions Database can only support infinite bound range deletions if sstable mc format is supported. As a first step to implement these checks, an appropriate flag is added to database.	2019-06-24 15:57:47 +03:00
Dejan Mircevski	8dcb35913a	table: Avoid needless allocation of cell lockers All `table` instances currently unconditionally allocate a cell locker for counter cells, though not all need one. Since the lockers occupy quite a bit of memory (as reported in #4441), it's wasteful to allocate them when unneeded. Fixes #4441. Tests: unit (dev, debug) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Message-Id: <20190515190910.87931-1-dejan@scylladb.com>	2019-05-16 11:10:38 +03:00
Tomasz Grabiec	3cb7b2d72e	treewide: Propagate schema_features to db::schema::all_tables()	2019-04-28 15:50:13 +02:00
Benny Halevy	223e1af521	sstables: provide large_data_handler to constructor And use it for writing the sstable and/or when deleting it. Refs #4198 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:24:19 +02:00
Benny Halevy	eebc3701a5	sstables: introduce sstables_manager The goal of the sstables manager is to track and manage sstables life-cycle. There is a sstable manager instance per database and it is passed to each column-family (and test environment) on construction. All sstables created, loaded, and deleted pass through the sstables manager. The manager will make sure consumers of sstables are in sync so that sstables will not be deleted while in use. Refs #4149 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Benny Halevy	3a17053cb8	database: add table::make_sstable helper In most cases we make a sstable based on the table schema and soon - large_data_handler. Encapsulate that in a make_sstable method. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-26 16:05:08 +02:00
Piotr Sarna	a7602bd2f1	database: add global view update stats Currently view update metrics are only per-table, but per-table metrics are not always enabled. In order to be able to see the number of generated view updates in all cases, global stats are added. Fixes #4221 Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>	2019-03-14 12:04:18 +00:00
Rafael Ávila de Espíndola	54b856e5e4	large_data_handler: propagate a future out of stop() stop() will close a semaphore in a followup patch, so it needs to return a future. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-12 13:19:04 -07:00
Avi Kivity	0beeb2f721	Merge "implement upgradesstables + scrub" from Calle " Fixes #4245 Breaks up "perform_cleanup" in parameterized "rewrite_sstables" and implements upgrade + scrub in terms of this. Both run as a "regular" compaction, but ignore the normal criteria for compaction and select obsolete/all tables. We also ensure all previous compactions are done so we can guarantee all tables are rewritten post invocation of command. " * 'calle/upgrade_sstables' of github.com:scylladb/seastar-dev: api::storage_service: Implement "scrub" api/storage_service: Implement "upgradesstables" api::storage_service: Add keyspace + tables helper compaction_manager: Add perform_sstable_scrub compaction_manager: Add perform_sstable_upgrade compaction_manager: break out rewrite_sstables from cleanup table: parameterize cleanup_sstables	2019-03-06 15:47:26 +02:00
Duarte Nunes	a29ec4be76	Merge 'Update system.large_partitions during shutdown' from Rafael " Currently any large partitions found during shutdown are not recorded. The reason is that the database commit log is already off, so there is nowhere to record it to. One possible solution is to have an independent system database. With that the regular db is shutdown first and writes can continue to the system db. That is a pretty big change. It would also not allow us to record large partitions in any system tables. This patch series instead tries to stop the commit log later. With that any large partitions are recorded to the log and moved to a sstable on the next startup. " * 'espindola/shutdown-order-patches-v7' of https://github.com/espindola/scylla: db: stop the commit log after the tables during shutdown db: stop the compaction manager earlier db: Add a stop_database helper db: Don't record large partitions in system tables	2019-03-06 10:36:38 -03:00
Tomasz Grabiec	889f31fabe	Merge "fix slow truncation under flush pressure" from Glauber Truncating a table is very slow if the system is under pressure. Because in that case we mostly just want to get rid of the existing data, it shouldn't take this long. The problem happens because truncate has to wait for memtable flushes to end, twice. This is regardless of whether or not the table being truncated has any data. 1. The first time is when we call truncate itself: if auto_snapshot is enabled, we will flush the contents of this table first and we are expected to be slow. However, even if auto_snapshot is disabled we will still do it -- which is a bug -- if the table is marked as durable. We should just not flush in this case and it is a silly bug. 2. The second time is when we call cf->stop(). Stopping a table will wait for a flush to finish. At this point, regardless of which path (Durable or non-durable) we took in the previous step we will have no more data in the table. However, calling `flush()` still needs to acquire a flush_permit, which means we will wait for whichever memtable is flushing at that very moment to end. If the system is under pressure and a memtable flush takes many seconds, so will truncate. Even if auto_snapshots are enabled, we shouldn't have to flush twice. The first flush should already put us in a state in which the next one is immediate (maybe holding on to the permit, maybe destroying the memtable_list already at that point -> since no other memtables should be created). If auto_snapshots are not enabled, the whole thing should just be instantaneous. This patchset fixes that by removing the flush need when !auto_snapshot, and special casing the flush of an empty table. Fixes #4294 * git@github.com:glommer/scylla.git slowtruncate-v2: database: immediately flush tables with no memtables. truncate: do not flush memtables if auto_snapshot is false.	2019-03-06 13:54:58 +01:00
Rafael Ávila de Espíndola	16ed9a2574	db: stop the commit log after the tables during shutdown This allows for system.large_partitions to be updated if a large partition is found while writing the last sstables. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola	765d8535f1	db: Add a stop_database helper This reduces code duplication. A followup patch will add more code to stop_database. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-03-05 18:04:45 -08:00
Glauber Costa	ed8261a0fe	database: immediately flush tables with no memtables. If a table has no data, it may still take a long time to flush. This is because before we even try to flush, we need to acquire a permit and that can take a while if there is a long running flush already queued. We can special case the situation in which there is no data in any of the memtables owned by table and return immediately. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2019-03-05 11:22:48 -05:00
Piotr Sarna	67e63d4dd7	database: add view_stats getter It will be used for testing purposes	2019-02-28 10:47:20 +01:00
Calle Wilund	7fb6bbe68c	table: parameterize cleanup_sstables To allow using the logic for one-sstable-at-a-time compaction (i.e. rewrite) of sstables without the "normal" cleanup logic and partition selection.	2019-02-27 14:25:31 +00:00
Asias He	75edbe939d	database: Add update_schema_version and announce_schema_version Split the update_schema_version_and_announce() into update_schema_version() and announce_schema_version(). This is going to be used in storage_service::prepare_to_join() where we want to first update the schema version, start gossip, announce the schema version.	2019-02-26 19:10:02 +08:00
Rafael Ávila de Espíndola	9cd14f2602	Don't write to system.large_partition during shutdown The included testcase used to crash because during database::stop() we would try to update system.large_partition. There doesn't seem to be an order we can stop the existing services in cql_test_env that makes this possible. This patch then adds another step when shutting down a database: first stop updating system.large_partition. This means that during shutdown any memtable flush, compaction or sstable deletion will not be reflected in system.large_partition. This is hopefully not too bad since the data in the table is TTLed. This seems to impact only tests, since main.cc calls _exit directly. Tests: unit (release,debug) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190213194851.117692-1-espindola@scylladb.com>	2019-02-15 10:49:10 +01:00
Glauber Costa	e0bfd1c40a	allow Cassandra SSTables with counters to be imported if they are new enough Right now Cassandra SSTables with counters cannot be imported into Scylla. The reason for that is that Cassandra changed their counter representation in their 2.1 version and kept transparently supporting both representations. We do not support their old representation, nor is there a sane way to figure out by looking at the data which one is in use. For safety, we had made the decision long ago to not import any tables with counters: if a counter was generated in older Cassandra, we would misrepresent it. In this patch, I propose we offer a non-default way to import SSTables with counters: we can gate it with a flag, and trust that the user knows what they are doing when flipping it (at their own peril). Cassandra 2.1 is by now pretty old. Many users can safely say they've never used anything older. While there are tools like sstableloader that can be used to import those counters, there are often situations in which directly importing SSTables is either better, faster, or worse: the only option left. I argue that having a flag that allows us to import them when we are sure it is safe is better than having no option at all. With this patch I was able to successfully import Cassandra tables with counters that were generated in Cassandra 2.1, reshard and compact their SSTables, and read the data back to get the same values in Scylla as in Cassandra. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190210154028.12472-1-glauber@scylladb.com>	2019-02-10 17:50:48 +02:00
Rafael Ávila de Espíndola	625080b414	Rename large_partition_handler Now that it also handles large rows, rename it to large_data_handler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-28 15:03:14 -08:00
Duarte Nunes	ea34e242de	Merge 'Do not use hints for view building' from Piotr " This series prevents view building from falling back to storing hints. Instead, it will try to send hints to an endpoint as if it has consistency level ONE, and in case of failure retry the whole building step. Then, view building will never be marked as finished prematurely (because of pending hints), which will help avoid creating inconsistencies when decommissioning a node from the cluster. Tests: unit (release) dtest (materialized_views_test.py.) Fixes #3857 Fixes #4039 " 'do_not_mark_view_as_built_with_hints_7' of https://github.com/psarna/scylla: db,view: add updating view_building_paused statistics database: add view_building_paused metrics table: make populate_views not allow hints db,view: add allow_hints parameter to mutate_MV storage_proxy: add allow_hints parameter to send_to_endpoint	2019-01-28 10:31:14 +00:00
Piotr Sarna	e30b0663d6	database: add view_building_paused metrics The metric exposes how many times the view building process was paused, e.g. because the target node was down or overloaded.	2019-01-28 09:38:42 +01:00
Piotr Jastrzebski	7666e81b51	Decouple database.hh from types/user.hh This commit declares shared_ptr<user_types_metadata> in database.hh where user_types_metadata is an incomplete type so it requires "Allow to use shared_ptr with incomplete type other than sstable" to compile correctly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:55:04 +01:00
Piotr Jastrzebski	e92b4c3dbc	Move user_type_impl out of types.hh to types/user.hh Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:04:04 +01:00
Rafael Ávila de Espíndola	f7d1dc16d4	database: Use nop_large_partition_handler to avoid self-reporting Currently nop_large_partition_handler is only used in tests, but it can also be used to avoid self-reporting. Tests: unit(Release) I also tested starting scylla with --compaction-large-partition-warning-threshold-mb=0. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190123205059.39573-1-espindola@scylladb.com>	2019-01-23 21:11:21 +00:00
Botond Dénes	4e89dea9ea	database: don't allow access to global semaphores Recently we had a bug (#4096) due to a component (`multishard_mutation_query()`) assuming that all reads used the semaphore obtainable via `database::user_read_concurrency_sem()`. This problem revealed that it is plain wrong to allow access to the shard-global semaphores residing in the database object. Instead all code wishing to access the relevant semaphore for some read, should do so via the relevant `table` object, thus guaranteeing that it will get the correct semaphore, configured for that table. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4f3a6780eb3240822db34aba7c1ba0a675a96592.1547734212.git.bdenes@scylladb.com>	2019-01-21 16:29:02 +02:00
Avi Kivity	fae4c6c0b6	database: merge for_all_partitions and for_all_partitions_slow for_all_partitions is only used in the implementation of for_all_partitions_slow, so merge them and get rid of a template.	2019-01-20 15:55:20 +02:00

733 Commits
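
Note on commit 9c5a5a5ac2 ("treewide: add names to semaphores"): the idea described there, a semaphore whose error messages carry a statically defined name, can be illustrated with a small stand-alone sketch. This is not Seastar's implementation and uses none of its API; toy_named_semaphore, toy_exception_factory and broken_message are hypothetical names made up for this illustration, and the toy semaphore is single-threaded and non-blocking. Only the message format "Semaphore broken: <name>" is taken from the commit message.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an exception factory: it only stores a static
// name, so creating the semaphore costs one extra pointer and nothing more.
struct toy_exception_factory {
    const char* name;

    // The string is built only when an error is actually raised.
    std::runtime_error broken() const {
        return std::runtime_error(std::string("Semaphore broken: ") + name);
    }
};

// Toy counting semaphore (single-threaded, non-blocking); NOT Seastar's.
class toy_named_semaphore {
    long _count;
    toy_exception_factory _factory;
    bool _broken = false;

public:
    toy_named_semaphore(long count, toy_exception_factory factory)
        : _count(count), _factory(factory) {}

    // Mark the semaphore as broken; later acquisitions will throw.
    void break_semaphore() { _broken = true; }

    // Try to take `units`; throws the named error if the semaphore is broken,
    // returns false if not enough units are available.
    bool try_wait(long units = 1) {
        if (_broken) {
            throw _factory.broken();
        }
        if (_count < units) {
            return false;
        }
        _count -= units;
        return true;
    }

    // Return `units` to the semaphore.
    void signal(long units = 1) { _count += units; }
};

// Helper for the illustration: breaks a semaphore with the given name and
// returns the text of the resulting exception.
inline std::string broken_message(const char* name) {
    toy_named_semaphore sem(0, toy_exception_factory{name});
    sem.break_semaphore();
    try {
        sem.try_wait();
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "";
}
```

Calling broken_message("io_concurrency_sem") yields the named form of the error quoted in the commit message. The normal try_wait/signal path never touches the name, which mirrors the commit's point that the extra cost is paid only when an error is thrown.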