scylladb

Author	SHA1	Message	Date
Piotr Jastrzebski	7666e81b51	Decouple database.hh from types/user.hh This commit declares shared_ptr<user_types_metadata> in database.hh were user_types_metadata is an incomplete type so it requires "Allow to use shared_ptr with incomplete type other than sstable" to compile correctly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-01-24 09:55:04 +01:00
Rafael Ávila de Espíndola	f7d1dc16d4	database: Use nop_large_partition_handler to avoid self-reporting Currently nop_large_partition_handler is only used in tests, but it can also be used avoid self-reporting. Tests: unit(Release) I also tested starting scylla with --compaction-large-partition-warning-threshold-mb=0. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190123205059.39573-1-espindola@scylladb.com>	2019-01-23 21:11:21 +00:00
Benny Halevy	93270dd8e0	gc_clock: make 64 bit Fixes: #3353 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-01-22 15:34:32 +02:00
Botond Dénes	4e89dea9ea	database: don't allow access to global semaphores Recently we had a bug (#4096) due to a component (`multishard_mutation_query()`) assuming that all reads used the semaphore obtainable via `database::user_read_concurrency_sem()`. This problem revealed that it is plain wrong to allow access to the shard-global semaphores residing in the database object. Instead all code wishing to access the relevant semaphore for some read, should do so via the relevant `table` object, thus guaranteeing that it will get the correct semaphore, configured for that table. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4f3a6780eb3240822db34aba7c1ba0a675a96592.1547734212.git.bdenes@scylladb.com>	2019-01-21 16:29:02 +02:00
Avi Kivity	9858395c3e	database: de-template do_parse_schema_tables This long slow-path function is called four times, so de-templating it is an easy win. We use std::function instead of noncopyable_function because the function is copied within the parallel_for_each callback. The original code uses a move, which is incorrect, but did not fail because moving the lambdas that were used as the actual arguments is equivalent to a copy.	2019-01-20 15:55:18 +02:00
Avi Kivity	6e6372e8d2	Revert "Merge "Type-eaese gratuitous templates with functions" from Avi" This reverts commit `31c6a794e9`, reversing changes made to `4537ec7426`. It causes bad_function_calls in some situations: INFO 2019-01-20 01:41:12,164 [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=d69820df-9d03-3cd0-91b0-c078c030b708 INFO 2019-01-20 01:41:13,952 [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema) INFO 2019-01-20 01:41:13,958 [shard 0] legacy_schema_migrator - Dropping legacy schema tables INFO 2019-01-20 01:41:14,702 [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables ERROR 2019-01-20 01:41:14,999 [shard 0] seastar - Exiting on unhandled exception: std::bad_function_call (bad_function_call)	2019-01-20 11:32:14 +02:00
Avi Kivity	4568a4e4b0	database: de-template do_parse_schema_tables This long slow-path function is called four times, so de-templating it is an easy win.	2019-01-17 18:48:57 +02:00
Asias He	1cc7e45f44	database: Make log max_vector_size and internal_count debug level It is useful for developers but not useful for users. Make it debug level. Message-Id: <775ce22d6f8088a44d35601509622a7e73ddeb9b.1547524976.git.asias@scylladb.com>	2019-01-15 11:02:30 +02:00
Avi Kivity	df090a15ff	Merge "Add counters for inactive reads" from Botond " This mini-series adds counters for the inactive reads registered in the reader concurrency semaphore. " * 'reader-concurrency-semaphore-counters/v6' of https://github.com/denesb/scylla: tests/querier_cache: use stats to get the no. of inactive reads reader_concurrency_semaphore: add counters for inactive reads	2019-01-14 11:56:43 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Piotr Sarna	c5346cdf9b	database, table: split table-related code to table.cc All table:: related code is moved to table.cc source file, which splits database.cc size in half and thus allows faster compilation on multiple cores. Refs #1 Message-Id: <28e67f7793ff2147ffce18df5e0b077e14d3b8bd.1546940360.git.sarna@scylladb.com>	2019-01-08 12:02:42 +02:00
Botond Dénes	e56c26205f	reader_concurrency_semaphore: add counters for inactive reads Add counters that give insight into inactive read related events. Two counters are added: * permit_based_evictions * population	2019-01-07 16:45:49 +02:00
Avi Kivity	c180a18dbb	Distribute distributed_loader into its own header and source files distributed_loader is a sizeable fraction of database.cc, so moving it out reduces compile time and improves readability. Message-Id: <20181230200926.15074-1-avi@scylladb.com>	2018-12-31 14:27:27 +02:00
Tomasz Grabiec	7747f2dde3	Merge "nodetool toppartitions" from Rafi & Avi Implementation of nodetool toppartiotion query, which samples most frequest PKs in read/write operation over a period of time. Content: - data_listener classes: mechanism that interfaces with mutation readers in database and table classes, - toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this interfaces with data_listeners and the REST api), - REST api for toppartitions query. Uses Top-k structure for handling stream summary statistics (based on implementation in C, see #2811). What's still missing: - JMX interface to nodetool (interface customization may be required), - Querying #rows and #bytes (currently, only #partitions is supported). Fixes #2811 https://github.com/avikivity/scylla rafie_toppartitions_v7.1: top_k: whitespace and minor fixes top_k: map template arguments top_k: std::list -> chunked_vector top_k: support for appending top_k results nodetool toppartitions: refactor table::config constructor nodetool toppartitions: data listeners nodetool toppartitions: add data_listeners to database/table nodetool toppartitions: fully_qualified_cf_name nodetool toppartitions: Toppartitions query implementation nodetool toppartitions: Toppartitions query REST API nodetool toppartitions: nodetool-toppartitions script	2018-12-28 16:31:24 +01:00
Rafi Einstein	0bffe5f83e	nodetool toppartitions: add data_listeners to database/table Add data_listeners member to database. Adds data_listeners* to table::config, to be used by table methods to invoke listeners. Install on_read() listener in table::make_reader(). Install on_write() listener in database::apply_in_memory(). Tests: Unit (release) Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	038f8c7988	nodetool toppartitions: refactor table::config constructor Eliminae extra parameters to ctor and deduce them instead from db param. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Avi Kivity	74c1afad29	database: provide accessor to db::extensions Rather than forcing callers to go through get_config(), provide a direct accessor. This reduces dependencies on config.hh, and will allow separation of extensions from configuration.	2018-12-21 20:15:43 +00:00
Duarte Nunes	8d6718b6e4	database: Add counter for current view backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	fc9176e784	database: Wait on view update semaphore for view building View building sends view updates synchronously, which has natural backpressure. However, they 1) Contribute to the load on the view replicas, and; 2) Add memory pressure to the base replica. They should thus count towards the current view update backlog, and consume units from the view update concurrency semaphore. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Duarte Nunes	86198060e5	database: generate_and_propagate_view_updates no longer needs a timeout We no longer wait on the semaphore and instead over-subscribe it, so there's not reason to pass a timeout. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	39eda68094	database: Don't generate view updates when node is overloaded We arrive at an overloaded state when we fail to acquire semaphore units in the base replica. This can mean clients are working in interactive mode, we fail to throttle them and consequently should start shedding load. We want to avoid impacting base table availability by running out of memory, so we could offload the memory queue to disk by writing the view updates as hints without attempting to send them. However, the disk is also a limited resource and in extreme cases we won’t be able to write hints. A tension exists between forgetting the view updates, thereby opening up a window for inconsistencies between base and view, or failing the base replica write. The latter can fail the whole user write, or if the coordinator was able to achieve CL, can instead cause inconsistencies between base tables (we wouldn't want to store a hint, because if the base replica is still overloaded, we would redo the whole dance). Between the devil and the deep blue sea, we chose to forget view updates. As a further simplification, we don't even write hints, assuming that if clients can’t be throttled (as we'll attempt to do in future patches), it will only be a matter of time before view updates can’t be offloaded. We also start acquiring the semaphore units using consume(), which is non-blocking, but allows for underflow of the available semaphore units. This is okay, and we expect not to underflow by much, as we stop generating new view updates. Refs #2538 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	a3d30ea99a	db/view: Propagate acquired semaphore units to mutate_MV() Propagate acquired semaphore units to mutate_MV() to allow the semaphore to be incrementally signalled as view updates are processed by view replicas. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	11c02c51fe	database: Wait for pending view updates to drain before stopping Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	185a4594af	database: Restore formatting of table::stop() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	f286d2ec34	database: Wait for pending operations in table::stop() Stopping a table with in-flight reads and writes can be happening concurrently, which rely on table state and we must therefore prevent its destruction before those operations complete. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	1f1fc36b72	database: Make view update concurrency semaphore memory-based The semaphore currently limiting the amount of view updates a given base replica emits aims to control the load that is imposed on the cluster, to protect view replicas from being overloaded when there are bursts of traffic (especially for degenerate cases like an index with low selectivity). 100 is, however, an arbitrary number. It might allow too much load on the view replicas, and it might also allow too much memory from the base shard to be consumed. Conversely, it might allow for too few updates to be queued in case of a burst, or to absorb updates while a view replica becomes partitioned. To deal with the load that is inflicted on the cluster, future patches will ensure that the rate of base writes obeys the rate at which the slowest view replica can consume the corresponding view updates. To protect the current shard from using too much memory for this queue, we will limit it to 10% of the shard's memory. The goal is to both protect the shard from being overloaded, but also to allow it to absorb bursts of writes resulting in large view mutations. Refs #2538 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	2753cfee88	db/view: Generate view updates as frozen_mutations Working in terms of frozen_mutations allows us to account more precisely the memory pending view updates consume at the storage_proxy layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Duarte Nunes	7cfcd21bbb	database: Make lambda in table::populate_views mutable This allows an std::move() in its body to work as intended. Also, make the lambda's argument type explicit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:29 +00:00
Avi Kivity	475b151c97	Merge "Use utils::small_vector more in read path" from Paweł " This series optimises the read path by replacing some usages of std::vector by utils::small_vector. The motivation for this change was an observation that memory allocation functions are pointed out by the profiler as the ones where we spent most time and while they have a large number of callers storage allocation for some vectors was close to the top. The gains are not huge, since the problem is a lot of things adding up and not a single slow thing, but we need to start with something. Unfortunately, the performance of boost::container::small_vector is quite disappointing so a new implementation of a small_vector was introduced. perf_simple_query -c4 --duration 60, medians: ./perf_before ./perf_after diff read 343086.80 360720.53 5.1% Tests: unit(release, small_vector in debug) " * tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla: partition_slice: use small_vector for column_ids mutation_fragment_merger: use small_vector auth: use small_vector in resource auth: avoid list-initialisation of vectors idl: serialiser: add serialiser for utils::small_vector idl: serialiser: deduplicate vector serialisers utils: introduce small_vector intrusive_set_external_comparator: make iterator nothrow move constructible mutation_fragment_merger: value-initialise iterator	2018-12-10 13:50:59 +02:00
Paweł Dziepak	9024187222	partition_slice: use small_vector for column_ids	2018-12-06 14:21:04 +00:00
Glauber Costa	fee4d2eb9b	compaction_manager: delay initialization of the compaction manager. If the compaction manager is started, compactions may start (this is regardless of whether or not we trigger them). The problem with that is that they start at a time in which we are flushing the commitlog and the initialization procedure waits for the commitlog to be fully flushed and the resulting memtables flushed before we move on. Because there are no incoming writes, the amount of shares in memtable flushes decrease as memory used decreases and that can cause the startup procedure to take a long time. We have recently started to bump the shares manually for manual flushes. While that guarantees that we will not drive the shares to zero, I will make the argument that we can do better by making sure that those things are, at this point, running alone: user experience is affected by startup times and the bump we give to user-triggered operations will only do so much. Even if we increase the shares a lot flushes will still be fighting for resources with compactions and startup will take longer than it could. By making sure that flushes are this point running alone we improve the user experience by making sure the startup is as fast as it can be. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-12-04 13:48:42 -05:00
Avi Kivity	414b14a6bd	Merge "Make inactive shard readers evictable" from Botond " This series attempts to solve the regressions recently discovered in performance of multi-partition range-scans. Namely that they: * Flood the reader concurrency semaphore's queues, trampling other reads. * Behave very badly when too many of them is running concurrently (trashing). * May deadlock if enough of them is running without a timeout. The solution for these problems is to make inactive shard readers evictable. This should address all three issues listed above, to varying degrees: * Shard readers will now not cling onto their permits for the entire duration of the scan, which might be a lot of time. * Will be less affected by infinite concurrency (more than the node can handle) as each scan now can make progress by evicting inactive shard readers belonging to other scans. * Will not deadlock at all. In addition to the above fix, this series also bundles two further improvements: * Add a mechanism to `reader_concurrecy_semaphore` to be notified of newly inserted evictables. * General cleanups and fixes for `multishard_combining_reader` and `foreign_reader`. I can unbundle these mini series and send them separately, if the maintainers so prefer, altough considering that this series will have to be backported to 3.0, I think this present form is better. Fixes: #3835 " * 'evictable-inactive-shard-readers/v7' of https://github.com/denesb/scylla: (27 commits) tests/multishard_mutation_query_test: test stateless query too tests/querier_cache: fail resource-based eviction test gracefully tests/querier_cache: simplify resource-based eviction test tests/mutation_reader_test: add test_multishard_combining_reader_next_partition tests/mutation_reader_test: restore indentation tests/mutation_reader_test: enrich pause-related multishard reader test multishard_combining_reader: use pause-resume API query::partition_slice: add clear_ranges() method position_in_partition: add region() accessor foreign_reader: add pause-resume API tests/mutation_reader_test: implement the pause-resume API query_mutations_on_all_shards(): implement pause-resume API make_multishard_streaming_reader(): implement the pause-resume API database: add accessors for user and streaming concurrency semaphores reader_lifecycle_policy: extend with a pause-resume API query_mutations_on_all_shards(): restore indentation query_mutations_on_all_shards(): simplify the state-machine multishard_combining_reader: use the reader lifecycle policy multishard_combining_reader: add reader lifecycle policy multishard_combining_reader: drop unnecessary `reader_promise` member ...	2018-12-04 10:22:35 +02:00
Botond Dénes	72ed655ef0	make_multishard_streaming_reader(): implement the pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	5f67a065c6	reader_lifecycle_policy: extend with a pause-resume API This API provides a way for the mulishard reader to pause inactive shard readers and later resume them when they are needed again. This allows for these paused shard readers to be evicted when the node is under pressure. How the readers are made evictable while paused is up to the clients. Using this API in the `multishard_combining_reader` and implementing it in the clients will be done in the next patches. Provide default implementation for the new virtual methods to facilitate gradual adoption.	2018-12-04 08:51:05 +02:00
Botond Dénes	007619de4c	multishard_combining_reader: use the reader lifecycle policy Refactor the multishard combining reader and its clients to use the reader lifecycle policy introduced in the previous patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	5a4fd1abab	multishard_combining_reader: drop support for streamed_mutation fast-forwarding It doesn't make sense for the multishard reader anyway, as it's only used by the row-cache. We are about to introduce the pausing of inactive shard readers, and it would require complex data structures and code to maintain support for this feature that is not even used. So drop it.	2018-12-04 08:51:05 +02:00
Botond Dénes	37f0117747	reader_concurrency_semaphore: refactor eviction mechanism As we are about to add multiple sources of evictable readers, we need a more scalable solution than a single functor being passed that opaquely evicts a reader when called. Add a generic way to register and unregister evictable (inactive) readers to the semaphore. The readers are expected to be registered when they become evictable and are expected to be unregistered when they cease to become evictable. The semaphore might evict any reader that is registered to it, when it sees fit. This also solves the problem of notifying the semaphore when new readers become evictable. Previously there was no such mechanism, and the semaphore would only evict any such new readers when a new permit was requested from it.	2018-12-04 08:51:00 +02:00
Benny Halevy	9e7125a9de	distributed_loader::populate_column_family: lookup and remove temp sstable directories These may be left over in case we crash while writing sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	857ff4f59a	database: directly use std::experimental::filesystem::path for lister::path Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	585ac6e641	database: use std::experimental::filesystem::path for lister::path We would like to get rid of boost::filesystem and gradually replace it with std::experimental::filesystem. TODO: using namespace fs = std::experimental::filesystem, use fs::path directly, rather than lister::path Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	90118fa9ef	sstable: create sstable component files in a subdirectory When writing the sstable, create a temporary directory for creating all components so that each sstable files' will be assigned a different allocaton groups on xfs. Files are immediately renamed to their default location after creation. Temp directory is removed when the sstable is sealed. Additional work to be introduced in the following patches: - When populating tables, temp directories need to be looked up and removed. Fixes #3167 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Vladimir Krivopalov	68458148e7	database: Capture io_priority_class by reference to avoid dangling ref. The original reference points to a thread-local storage object that guaranteed to outlive the continuation, but copying it make the subsequent calls point to a local object and introduces a use-after-free bug. Fixes #3948 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-30 10:43:36 -08:00
Piotr Sarna	1336b9ee31	database: add is_internal_keyspace Similarly to is_system_keyspace, it will allow checking if a keyspace is created for internal use.	2018-11-28 09:21:56 +01:00
Duarte Nunes	098dd90bd2	Merge 'Reduce dependencies around consistency_level.hh' from Avi " consistency_level.hh is rather heavyweighy in both its contents and what it includes. Reduce the number of inclusion sites and split the file to reduce dependencies. " * tag 'cl-header/v2' of https://github.com/avikivity/scylla: consistency_level: simplify validation API Split consistency_level.hh header database: remove unneeded consistency_level.hh include cql: remove unneeded includes of consistency_level.hh	2018-11-27 11:59:34 +00:00
Avi Kivity	b015f41344	database: remove unneeded consistency_level.hh include	2018-11-27 13:30:56 +02:00
Raphael S. Carvalho	626afa6973	database: conditionally release sstable references from compaction manager Not all compaction operations submitted through compaction manager sets a callback for releasing references of exhausted sstables in compaction manager itself. That callback lives in compaction descriptor which is passed to table::compaction(). Let's make the call conditional to avoid bad function call exceptions. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181126235616.10452-1-raphaelsc@scylladb.com>	2018-11-27 12:10:43 +01:00
Duarte Nunes	2a371c2689	Merge 'Allow bypassing cache on a per-query basis' from Avi " Some queries are very unlikely to hit cache. Usually this includes range queries on large tables, but other patterns are possible. While the database should adapt to the query pattern, sometimes the user has information the database does not have. By passing this information along, the user helps the database manage its resources more optimally. To do this, this patch introduces a BYPASS CACHE clause to the SELECT statement. A query thus marked will not attempt to read from the cache, and instead will read from sstables and memtables only. This reduces CPU time spent to query and populate the cache, and will prevent the cache from being flooded with data that is not likely to be read again soon. The existing cache disabled path is engaged when the option is selected. Tests: unit (release), manual metrics verification with ccm with and without the BYPASS CACHE clause. Ref #3770. " * tag 'cache-bypass/v2' of https://github.com/avikivity/scylla: doc: document SELECT ... BYPASS CACHE tests: add test for SELECT ... BYPASS CACHE cql: add SELECT ... BYPASS CACHE clause db: add query option to bypass cache	2018-11-26 09:59:40 +00:00
Avi Kivity	b835b93ee6	db: add query option to bypass cache With the option enabled, we bypass the cache unconditionally and only read from memtables+sstables. This is useful for analytics queries.	2018-11-25 16:26:08 +02:00
Raphael S. Carvalho	2058001f94	sstables/compaction: propagate sstable replacement to all compaction of a CF This is needed for parallel compaction to work with sstable run based approach. That's because regular compaction clones a set containing all sstables of its column family. So compaction A can potentially hold a reference to a compacting sstable of compaction B, so preventing compacting B from releasing its exhausted sstable. So all replacements are propagated to all compactions of a given column family, and compactions in turn, including the one which initiated the propagation, will do the replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:30 -02:00
Raphael S. Carvalho	953fdcc867	sstables: store cf pointer in compaction_info motivation is that we need a more efficient way to find compactions that belong to a given column family in compaction list. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:28 -02:00

1 2 3 4 5 ...

1174 Commits