scylladb

Author	SHA1	Message	Date
Avi Kivity	475b151c97	Merge "Use utils::small_vector more in read path" from Paweł " This series optimises the read path by replacing some usages of std::vector by utils::small_vector. The motivation for this change was an observation that memory allocation functions are pointed out by the profiler as the ones where we spend most time, and while they have a large number of callers, storage allocation for some vectors was close to the top. The gains are not huge, since the problem is a lot of things adding up and not a single slow thing, but we need to start with something. Unfortunately, the performance of boost::container::small_vector is quite disappointing, so a new implementation of small_vector was introduced. perf_simple_query -c4 --duration 60, medians: read ./perf_before 343086.80, ./perf_after 360720.53, diff 5.1%. Tests: unit(release, small_vector in debug) " * tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla: partition_slice: use small_vector for column_ids mutation_fragment_merger: use small_vector auth: use small_vector in resource auth: avoid list-initialisation of vectors idl: serialiser: add serialiser for utils::small_vector idl: serialiser: deduplicate vector serialisers utils: introduce small_vector intrusive_set_external_comparator: make iterator nothrow move constructible mutation_fragment_merger: value-initialise iterator	2018-12-10 13:50:59 +02:00
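Here is a toy version of the small-buffer idea behind a small vector. The first N elements live in an inline buffer inside the object, so short vectors never touch the allocator. Only a vector that overflows moves to the heap. This is a minimal sketch that assumes trivially copyable, default-constructible elements. It is not utils::small_vector's actual implementation, and the name `toy_small_vector` is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

template <typename T, std::size_t N>
class toy_small_vector {
    static_assert(N > 0, "inline capacity must be positive");
    static_assert(std::is_trivially_copyable_v<T> && std::is_default_constructible_v<T>,
                  "this sketch only handles trivially copyable, default-constructible T");

    T inline_[N];                 // first N elements live here: no allocation
    T* data_ = inline_;           // points at inline_ or at heap_
    std::size_t size_ = 0;
    std::size_t capacity_ = N;
    std::unique_ptr<T[]> heap_;   // used only after the inline buffer overflows

public:
    toy_small_vector() = default;
    // A member-wise copy would leave data_ pointing into the source's buffer.
    toy_small_vector(const toy_small_vector&) = delete;
    toy_small_vector& operator=(const toy_small_vector&) = delete;

    void push_back(const T& value) {
        if (size_ == capacity_) {
            grow();
        }
        data_[size_++] = value;
    }

    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }
    bool is_inline() const { return data_ == inline_; }

private:
    void grow() {
        std::size_t new_capacity = capacity_ * 2;
        auto fresh = std::make_unique<T[]>(new_capacity);
        std::copy(data_, data_ + size_, fresh.get());
        heap_ = std::move(fresh);  // frees the previous heap buffer, if any
        data_ = heap_.get();
        capacity_ = new_capacity;
    }
};
```

Copying is deleted on purpose. A real implementation has to copy or move the elements and re-point the data pointer, and this sketch skips that work.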
Paweł Dziepak	9024187222	partition_slice: use small_vector for column_ids	2018-12-06 14:21:04 +00:00
Glauber Costa	fee4d2eb9b	compaction_manager: delay initialization of the compaction manager. If the compaction manager is started, compactions may start (this is regardless of whether or not we trigger them). The problem with that is that they start at a time in which we are flushing the commitlog and the initialization procedure waits for the commitlog to be fully flushed and the resulting memtables flushed before we move on. Because there are no incoming writes, the shares given to memtable flushes decrease as memory usage decreases, and that can cause the startup procedure to take a long time. We have recently started to bump the shares manually for manual flushes. While that guarantees that we will not drive the shares to zero, I will make the argument that we can do better by making sure that those things are, at this point, running alone: user experience is affected by startup times and the bump we give to user-triggered operations will only do so much. Even if we increase the shares a lot, flushes will still be fighting for resources with compactions and startup will take longer than it could. By making sure that flushes are running alone at this point, we improve the user experience by making startup as fast as it can be. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-12-04 13:48:42 -05:00
Avi Kivity	414b14a6bd	Merge "Make inactive shard readers evictable" from Botond " This series attempts to solve the regressions recently discovered in performance of multi-partition range-scans. Namely that they: * Flood the reader concurrency semaphore's queues, trampling other reads. * Behave very badly when too many of them are running concurrently (thrashing). * May deadlock if enough of them are running without a timeout. The solution for these problems is to make inactive shard readers evictable. This should address all three issues listed above, to varying degrees: * Shard readers will no longer cling to their permits for the entire duration of the scan, which might be a lot of time. * Will be less affected by infinite concurrency (more than the node can handle) as each scan now can make progress by evicting inactive shard readers belonging to other scans. * Will not deadlock at all. In addition to the above fix, this series also bundles two further improvements: * Add a mechanism to `reader_concurrency_semaphore` to be notified of newly inserted evictables. * General cleanups and fixes for `multishard_combining_reader` and `foreign_reader`. I can unbundle these mini series and send them separately, if the maintainers so prefer, although considering that this series will have to be backported to 3.0, I think this present form is better. 
Fixes: #3835 " * 'evictable-inactive-shard-readers/v7' of https://github.com/denesb/scylla: (27 commits) tests/multishard_mutation_query_test: test stateless query too tests/querier_cache: fail resource-based eviction test gracefully tests/querier_cache: simplify resource-based eviction test tests/mutation_reader_test: add test_multishard_combining_reader_next_partition tests/mutation_reader_test: restore indentation tests/mutation_reader_test: enrich pause-related multishard reader test multishard_combining_reader: use pause-resume API query::partition_slice: add clear_ranges() method position_in_partition: add region() accessor foreign_reader: add pause-resume API tests/mutation_reader_test: implement the pause-resume API query_mutations_on_all_shards(): implement pause-resume API make_multishard_streaming_reader(): implement the pause-resume API database: add accessors for user and streaming concurrency semaphores reader_lifecycle_policy: extend with a pause-resume API query_mutations_on_all_shards(): restore indentation query_mutations_on_all_shards(): simplify the state-machine multishard_combining_reader: use the reader lifecycle policy multishard_combining_reader: add reader lifecycle policy multishard_combining_reader: drop unnecessary `reader_promise` member ...	2018-12-04 10:22:35 +02:00
Botond Dénes	72ed655ef0	make_multishard_streaming_reader(): implement the pause-resume API	2018-12-04 08:51:05 +02:00
Botond Dénes	5f67a065c6	reader_lifecycle_policy: extend with a pause-resume API This API provides a way for the multishard reader to pause inactive shard readers and later resume them when they are needed again. This allows for these paused shard readers to be evicted when the node is under pressure. How the readers are made evictable while paused is up to the clients. Using this API in the `multishard_combining_reader` and implementing it in the clients will be done in the next patches. Provide default implementations for the new virtual methods to facilitate gradual adoption.	2018-12-04 08:51:05 +02:00
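The "default implementation for gradual adoption" approach can be sketched as a base class whose new virtual methods already work out of the box. All names here are hypothetical. This is not the real `reader_lifecycle_policy` interface.

```cpp
#include <cassert>
#include <map>
#include <optional>

struct shard_reader {
    unsigned shard;
    int seq;  // which create_reader() call produced it
};

// Hypothetical interface in the spirit of reader_lifecycle_policy.
class lifecycle_policy {
public:
    virtual ~lifecycle_policy() = default;
    virtual shard_reader create_reader(unsigned shard) = 0;

    // Default pause/resume: park the reader and never evict it, so policies
    // written before the pause-resume API keep working unchanged.
    virtual void pause(shard_reader r) { paused_.insert_or_assign(r.shard, r); }
    virtual std::optional<shard_reader> try_resume(unsigned shard) {
        auto it = paused_.find(shard);
        if (it == paused_.end()) {
            return std::nullopt;  // evicted (or never paused)
        }
        shard_reader r = it->second;
        paused_.erase(it);
        return r;
    }

protected:
    std::map<unsigned, shard_reader> paused_;
};

// What a multishard reader would do: resume if possible, otherwise recreate.
shard_reader resume_or_recreate(lifecycle_policy& policy, unsigned shard) {
    if (auto r = policy.try_resume(shard)) {
        return *r;
    }
    return policy.create_reader(shard);
}

// A policy that only implements the pre-existing interface.
struct legacy_policy : lifecycle_policy {
    int created = 0;
    shard_reader create_reader(unsigned shard) override { return {shard, ++created}; }
};

// A policy that makes paused readers evictable; it simulates the worst case,
// where the reader is evicted right after being paused.
struct evicting_policy : legacy_policy {
    void pause(shard_reader) override {}
};
```

A policy written before the API existed only overrides `create_reader()` and still works, because its paused readers are never evicted. A policy that makes readers evictable overrides `pause()`. In that case the caller falls back to recreating the reader whenever `try_resume()` comes back empty.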
Botond Dénes	007619de4c	multishard_combining_reader: use the reader lifecycle policy Refactor the multishard combining reader and its clients to use the reader lifecycle policy introduced in the previous patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	5a4fd1abab	multishard_combining_reader: drop support for streamed_mutation fast-forwarding It doesn't make sense for the multishard reader anyway, as it's only used by the row-cache. We are about to introduce the pausing of inactive shard readers, and it would require complex data structures and code to maintain support for this feature that is not even used. So drop it.	2018-12-04 08:51:05 +02:00
Botond Dénes	37f0117747	reader_concurrency_semaphore: refactor eviction mechanism As we are about to add multiple sources of evictable readers, we need a more scalable solution than a single functor being passed that opaquely evicts a reader when called. Add a generic way to register and unregister evictable (inactive) readers to the semaphore. The readers are expected to be registered when they become evictable and to be unregistered when they cease to be evictable. The semaphore might evict any reader that is registered to it, when it sees fit. This also solves the problem of notifying the semaphore when new readers become evictable. Previously there was no such mechanism, and the semaphore would only evict any such new readers when a new permit was requested from it.	2018-12-04 08:51:00 +02:00
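Here is a toy model of the registration scheme described above. Owners hand evictable readers to the semaphore and ask for them back later. In the meantime, the semaphore may evict any registered reader. `toy_semaphore` and `inactive_reader` are hypothetical names. The oldest-first eviction order is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a paused shard reader.
struct inactive_reader {
    std::string name;
};

class toy_semaphore {
public:
    using handle = std::uint64_t;

    // Called when a reader becomes evictable: the semaphore takes ownership.
    handle register_inactive(inactive_reader r) {
        handle h = next_handle_++;
        inactive_.emplace(h, std::move(r));
        return h;
    }

    // Called when the owner needs the reader again; empty if it was evicted.
    std::optional<inactive_reader> unregister_inactive(handle h) {
        auto it = inactive_.find(h);
        if (it == inactive_.end()) {
            return std::nullopt;
        }
        inactive_reader r = std::move(it->second);
        inactive_.erase(it);
        return r;
    }

    // Called by the semaphore itself when it needs resources back.
    bool evict_one() {
        if (inactive_.empty()) {
            return false;
        }
        inactive_.erase(inactive_.begin());  // oldest registration first
        return true;
    }

    std::size_t inactive_count() const { return inactive_.size(); }

private:
    handle next_handle_ = 0;
    std::map<handle, inactive_reader> inactive_;  // ordered by registration
};
```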
Benny Halevy	9e7125a9de	distributed_loader::populate_column_family: lookup and remove temp sstable directories These may be left over in case we crash while writing sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
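A cleanup pass at table-population time might look like this minimal sketch using `std::filesystem`. It assumes, hypothetically, that temporary sstable directories are named `<generation>.sstable`. The function name is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <vector>

namespace fs = std::filesystem;

// Remove temporary sstable directories left behind by a crash mid-write.
// Returns how many were removed.
std::size_t remove_temp_sstable_dirs(const fs::path& table_dir) {
    std::vector<fs::path> leftovers;
    for (const auto& entry : fs::directory_iterator(table_dir)) {
        if (entry.is_directory() && entry.path().extension() == fs::path(".sstable")) {
            leftovers.push_back(entry.path());
        }
    }
    // Collect first, then delete, so the directory is not modified while iterating.
    for (const auto& p : leftovers) {
        fs::remove_all(p);
    }
    return leftovers.size();
}
```

Other subdirectories, such as `staging`, and regular component files are left untouched.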
Benny Halevy	857ff4f59a	database: directly use std::experimental::filesystem::path for lister::path Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	585ac6e641	database: use std::experimental::filesystem::path for lister::path We would like to get rid of boost::filesystem and gradually replace it with std::experimental::filesystem. TODO: namespace fs = std::experimental::filesystem, use fs::path directly, rather than lister::path Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	90118fa9ef	sstable: create sstable component files in a subdirectory When writing the sstable, create a temporary directory for creating all components so that each sstable's files will be assigned a different allocation group on XFS. Files are immediately renamed to their default location after creation. The temp directory is removed when the sstable is sealed. Additional work to be introduced in the following patches: - When populating tables, temp directories need to be looked up and removed. Fixes #3167 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
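The write sequence can be sketched with `std::filesystem`, whereas the commit itself uses `std::experimental::filesystem`. The temporary directory name `<generation>.sstable` and the component file names are assumptions for illustration, not Scylla's actual naming.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical naming of the per-sstable temporary directory.
fs::path temp_dir_for(const fs::path& table_dir, long generation) {
    return table_dir / (std::to_string(generation) + ".sstable");
}

// Create each component inside a fresh temporary directory, then immediately
// rename it to its final location in the table directory.
void write_components(const fs::path& table_dir, long generation,
                      const std::vector<std::string>& components) {
    fs::path tmp = temp_dir_for(table_dir, generation);
    fs::create_directories(tmp);
    for (const auto& component : components) {
        fs::path name = std::to_string(generation) + "-" + component;
        {
            std::ofstream out(tmp / name);  // file is created inside tmp
            out << "data";
        }                                   // closed before the rename
        fs::rename(tmp / name, table_dir / name);
    }
}

// Sealing the sstable removes the (now empty) temporary directory.
void seal(const fs::path& table_dir, long generation) {
    fs::remove(temp_dir_for(table_dir, generation));
}
```

Each file is renamed as soon as it is created, so the temporary directory is already empty when the sstable is sealed. That is why a plain `remove()` is enough in the sketch.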
Vladimir Krivopalov	68458148e7	database: Capture io_priority_class by reference to avoid dangling ref. The original reference points to a thread-local storage object that is guaranteed to outlive the continuation, but copying it makes subsequent calls point to a local object and introduces a use-after-free bug. Fixes #3948 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-11-30 10:43:36 -08:00
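Here is a minimal illustration of this class of bug, assuming a consumer that stores a `const io_priority_class&`. The type and the functions are hypothetical stand-ins, not Scylla or Seastar code.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the priority class type.
struct io_priority_class {
    int id;
};

// Thread-local object that outlives every continuation, as in the commit.
thread_local io_priority_class streaming_pc{42};

// Hypothetical consumer that keeps a reference to its priority class.
struct writer {
    const io_priority_class& pc;
};

// Fixed: capture by reference, so writer::pc refers to the thread-local object.
std::function<writer()> make_writer_task(const io_priority_class& pc) {
    return [&pc] { return writer{pc}; };
}

// Buggy: capture by value copies the object into the closure, so writer::pc
// refers to that copy, which is destroyed together with the continuation.
std::function<writer()> make_writer_task_buggy(const io_priority_class& pc) {
    return [pc] { return writer{pc}; };
}
```

Capturing a reference by reference refers to the object the reference is bound to. The lambda therefore stays valid for as long as the thread-local object lives, not just as long as the parameter does.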
Piotr Sarna	1336b9ee31	database: add is_internal_keyspace Similarly to is_system_keyspace, it will allow checking if a keyspace is created for internal use.	2018-11-28 09:21:56 +01:00
Duarte Nunes	098dd90bd2	Merge 'Reduce dependencies around consistency_level.hh' from Avi " consistency_level.hh is rather heavyweight in both its contents and what it includes. Reduce the number of inclusion sites and split the file to reduce dependencies. " * tag 'cl-header/v2' of https://github.com/avikivity/scylla: consistency_level: simplify validation API Split consistency_level.hh header database: remove unneeded consistency_level.hh include cql: remove unneeded includes of consistency_level.hh	2018-11-27 11:59:34 +00:00
Avi Kivity	b015f41344	database: remove unneeded consistency_level.hh include	2018-11-27 13:30:56 +02:00
Raphael S. Carvalho	626afa6973	database: conditionally release sstable references from compaction manager Not all compaction operations submitted through the compaction manager set a callback for releasing references to exhausted sstables in the compaction manager itself. That callback lives in the compaction descriptor, which is passed to table::compaction(). Let's make the call conditional to avoid bad function call exceptions. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20181126235616.10452-1-raphaelsc@scylladb.com>	2018-11-27 12:10:43 +01:00
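Here are the failure mode and the fix in miniature. Invoking an empty `std::function` throws `std::bad_function_call`, so the call has to be guarded. `descriptor_sketch` and `release_exhausted()` are hypothetical stand-ins for the compaction descriptor and its caller.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// The release callback is optional because only some compaction types set it.
struct descriptor_sketch {
    std::function<void(long)> release_exhausted_sstable;
};

// Invoke the callback only if it was set; returns how many calls were made.
int release_exhausted(const descriptor_sketch& d, const std::vector<long>& exhausted) {
    int released = 0;
    for (long generation : exhausted) {
        if (d.release_exhausted_sstable) {
            d.release_exhausted_sstable(generation);
            ++released;
        }
    }
    return released;
}
```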
Duarte Nunes	2a371c2689	Merge 'Allow bypassing cache on a per-query basis' from Avi " Some queries are very unlikely to hit cache. Usually this includes range queries on large tables, but other patterns are possible. While the database should adapt to the query pattern, sometimes the user has information the database does not have. By passing this information along, the user helps the database manage its resources more optimally. To do this, this patch introduces a BYPASS CACHE clause to the SELECT statement. A query thus marked will not attempt to read from the cache, and instead will read from sstables and memtables only. This reduces CPU time spent to query and populate the cache, and will prevent the cache from being flooded with data that is not likely to be read again soon. The existing cache disabled path is engaged when the option is selected. Tests: unit (release), manual metrics verification with ccm with and without the BYPASS CACHE clause. Ref #3770. " * tag 'cache-bypass/v2' of https://github.com/avikivity/scylla: doc: document SELECT ... BYPASS CACHE tests: add test for SELECT ... BYPASS CACHE cql: add SELECT ... BYPASS CACHE clause db: add query option to bypass cache	2018-11-26 09:59:40 +00:00
Avi Kivity	b835b93ee6	db: add query option to bypass cache With the option enabled, we bypass the cache unconditionally and only read from memtables+sstables. This is useful for analytics queries.	2018-11-25 16:26:08 +02:00
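The behaviour described in these two commits boils down to a routing decision on the read path. Here is a minimal sketch with hypothetical names (`query_options_sketch`, `choose_read_source`) rather than Scylla's actual types.

```cpp
#include <cassert>

enum class read_source { cache, memtables_and_sstables };

// Hypothetical stand-in for the per-query option.
struct query_options_sketch {
    bool bypass_cache = false;
};

// A query that bypasses the cache takes the same path as a table whose cache
// is disabled: it reads memtables and sstables directly and does not populate
// the cache.
read_source choose_read_source(const query_options_sketch& opts, bool cache_enabled_for_table) {
    if (opts.bypass_cache || !cache_enabled_for_table) {
        return read_source::memtables_and_sstables;
    }
    return read_source::cache;
}
```

At the CQL level, the option is requested with the SELECT ... BYPASS CACHE clause added by this series.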
Raphael S. Carvalho	2058001f94	sstables/compaction: propagate sstable replacement to all compaction of a CF This is needed for parallel compaction to work with the sstable run based approach. That's because regular compaction clones a set containing all sstables of its column family, so compaction A can potentially hold a reference to a compacting sstable of compaction B, thus preventing compaction B from releasing its exhausted sstable. So all replacements are propagated to all compactions of a given column family, and those compactions, including the one which initiated the propagation, will do the replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:30 -02:00
Raphael S. Carvalho	953fdcc867	sstables: store cf pointer in compaction_info The motivation is that we need a more efficient way to find compactions that belong to a given column family in the compaction list. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:28 -02:00
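The propagation in these two commits can be sketched as follows. The sketch assumes a hypothetical `compaction_info` that holds a pointer to its column family and the generations of the sstables it still references. This is not Scylla's actual data structure.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct column_family {
    std::string name;
};

struct compaction_info {
    const column_family* cf;               // lets us filter compactions by table cheaply
    std::set<int> referenced_generations;  // sstables this compaction still holds
};

// When any compaction of `cf` replaces sstables, every compaction of the same
// table (including the initiator) drops its references to the replaced ones.
void propagate_replacement(std::vector<compaction_info>& compactions,
                           const column_family* cf,
                           const std::vector<int>& replaced) {
    for (auto& c : compactions) {
        if (c.cf != cf) {
            continue;
        }
        for (int gen : replaced) {
            c.referenced_generations.erase(gen);
        }
    }
}
```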
Raphael S. Carvalho	fc92fb955d	sstables/compaction_manager: release reference to exhausted sstable through callback That's important for the reference to sstable to not be kept throughout the compaction procedure, which would break the goal of releasing space during compaction. Manager passes a callback to compaction which calls it whenever there's sstable replacement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:16 -02:00
Raphael S. Carvalho	3433de3dc0	database: do not keep reference to sstable in selector when done selecting When compacting, we'll create all readers at once and will not select again from incremental selector, meaning the selector will keep all respective sstables in current_sstables, preventing compaction from releasing space as it goes on. The change is about refreshing sstable set's selector such that it will not hold a reference to an exhausted sstable whatsoever. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:12 -02:00
Raphael S. Carvalho	e5a0b05c15	sstables/compaction: release space earlier of exhausted input sstables Currently, compaction only replaces input sstables at the end of compaction, meaning compaction must finish before the space of those sstables is released. What we can do instead is delete some input sstables earlier, under these conditions: 1) The sstable's data must be committed to a new, sealed output sstable, meaning it's exhausted. 2) An exhausted sstable mustn't overlap with a non-exhausted sstable, because a tombstone in the exhausted one could have been purged and the shadowed data in the non-exhausted one could be resurrected if the system crashes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:07 -02:00
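The two early-release conditions can be expressed as a filter over the compaction inputs. This minimal sketch summarizes each input as a hypothetical `sstable_info`. It has an inclusive token range and an `exhausted` flag, which means the sstable's data is already in a sealed output sstable.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct sstable_info {
    int generation;
    long first_token;  // inclusive token range covered by the sstable
    long last_token;
    bool exhausted;    // all its data is already in a sealed output sstable
};

bool overlaps(const sstable_info& a, const sstable_info& b) {
    return a.first_token <= b.last_token && b.first_token <= a.last_token;
}

// Condition 1: only exhausted inputs qualify.
// Condition 2: an exhausted input overlapping a non-exhausted one must wait.
// Its tombstones may already have been purged from the output, so deleting it
// early could resurrect shadowed data still held by the other input after a crash.
std::vector<int> releasable_early(const std::vector<sstable_info>& inputs) {
    std::vector<int> result;
    for (const auto& s : inputs) {
        if (!s.exhausted) {
            continue;
        }
        bool blocked = std::any_of(inputs.begin(), inputs.end(), [&](const sstable_info& other) {
            return !other.exhausted && overlaps(s, other);
        });
        if (!blocked) {
            result.push_back(s.generation);
        }
    }
    return result;
}
```

Consider inputs covering [0, 10] (exhausted), [5, 15] (not exhausted) and [20, 30] (exhausted). Only the third can be released early. The first must wait because it overlaps a non-exhausted sstable.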
Raphael S. Carvalho	8d11b0bbb4	database: do not store reference to sstable in incremental selector Use sstable generation instead to keep track of read sstables. The motivation is that we'll not keep references to sstables, thus allowing their space on disk to be released as soon as they get exhausted. Generation is used because it guarantees uniqueness of the sstable. Reviewed-by: Botond Dénes <bdenes@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-11-24 18:53:04 -02:00
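The difference between tracking by reference and tracking by generation can be sketched with shared ownership. `sstable` and `read_tracker` are hypothetical simplifications, and the sketch assumes sstables are held through `std::shared_ptr`.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>

struct sstable {
    long generation;  // unique per sstable
};

// Remembers which sstables were already selected by generation only, so it
// never extends an sstable's lifetime.
class read_tracker {
public:
    // Returns false if this sstable was already selected.
    bool mark_read(const sstable& s) { return read_.insert(s.generation).second; }
    bool was_read(long generation) const { return read_.count(generation) != 0; }

private:
    std::unordered_set<long> read_;
};
```

Once the last owner drops the sstable, it is freed (and, in the real system, its disk space can be released). The tracker can still answer whether that generation was already read.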
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Glauber Costa	9f403334c8	remove monitor if sstable write failed In (almost) all SSTable write paths, we need to inform the monitor that the write has failed as well. The monitor will remove the SSTable from controller's tracking at that point. However, there is one place where we are not doing that: streaming of big mutations. Streaming of big mutations is an interesting use case, in which it is done in 2 parts: if the writing of the SSTable fails right away, then we do the correct thing. But the SSTables are not committed at that point and the monitors are still kept around with the SSTables until a later time, when they are finally committed. Between those two points in time, it is possible that the streaming code will detect a failure and manually call fail_streaming_mutations(), which marks the SSTable for deletion. At that point we should propagate that information to the monitor as well, but we don't. Fixes #3732 (hopefully) Tests: unit (release) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20181114213618.16789-1-glauber@scylladb.com>	2018-11-20 16:15:12 +00:00
Piotr Sarna	de43b4f41d	database: add a check if loaded sstable is already staging Staging sstables are loaded before regular ones. If the process fails midway, an sstable can be linked both in the regular directory and in staging directory. In such cases, the sstable remains in staging and will be moved to the regular directory by view update streamer service.	2018-11-13 15:04:43 +01:00
Piotr Sarna	c825a17b9d	table: move push_view_replica_updates to table.cc	2018-11-13 14:52:22 +01:00
Piotr Sarna	a17fcb8d94	database: add populating tables with staging sstables After populating tables with regular sstables, same procedure is performed for staging sstables.	2018-11-13 14:52:22 +01:00
Piotr Sarna	19bf94fa8f	database: add creating /staging directory for sstables staging directory is now created on boot.	2018-11-13 14:52:22 +01:00
Piotr Sarna	e42d97060f	database: provide nonfrozen version of push_view_replica_updates Now it's also possible to pass a mutation to push to view replicas.	2018-11-13 11:45:30 +01:00
Piotr Sarna	642c3ae0e0	database: add subdir param to make_streaming_sstable_for_write This function allows specifying a subfolder to put a newly created sstable in - e.g. staging/ subfolder for streamed base table mutations.	2018-11-13 11:45:30 +01:00
Piotr Sarna	8e053f9efb	database: add staging sstables to a map SSTables that belong to staging/ directory are put in the _sstables_staging map.	2018-11-13 11:45:30 +01:00
Piotr Sarna	3f34312aa6	database: skip staging sstables in compaction Staging sstables are not part of the compaction process, to ensure that each sstable can be easily excluded from the view generation process that depends on it.	2018-11-13 11:45:30 +01:00
Avi Kivity	a71ab365e3	toplevel: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Botond Dénes	23f3831aaf	table::make_streaming_reader(): add forwarding parameter The single-range overload, when used by make_multishard_streaming_reader(), has to create a reader that is forwardable. Otherwise the multishard streaming reader will not produce any output, as it cannot fast-forward its shard readers to the ranges produced by the generator. Also add a unit test that is based on the real-life purpose the multishard streaming reader was designed for: serving partitions from a shard, according to a sharding configuration that is different from the local one. This is also the scenario that found the bug in the first place. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <bf799961bfd535882ede6a54cd6c4b6f92e4e1c1.1539235034.git.bdenes@scylladb.com>	2018-10-11 10:59:18 +03:00
Botond Dénes	4bb0bbb9e2	database: add make_multishard_streaming_reader() Creates a streaming reader that reads from all shards. Shard readers are created with `table::make_streaming_reader()`. This is needed for the new row-level repair. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4b74c710bed2ef98adf07555a4c841e5b690dd8c.1538470782.git.bdenes@scylladb.com>	2018-10-09 11:07:47 +03:00
Botond Dénes	3eeb6fbd23	table::make_streaming_reader(): add single-range overload This will be used by the `make_multishard_streaming_reader()` in the next patch. This method will create a multishard combining reader which needs its shard readers to take a single range, not a vector of ranges like the existing overload. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <cc6f2c9a8cf2c42696ff756ed6cb7949b95fe986.1538470782.git.bdenes@scylladb.com>	2018-10-09 11:07:46 +03:00
Nadav Har'El	bebe5b5df2	materialized views: add view_updates_pending statistic We are already maintaining a statistic of the number of pending view updates sent but not yet completed by view replicas, so let's expose it. As with all per-table statistics, this one will only be exposed if the "--enable-keyspace-column-family-metrics" option is on. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-10-02 20:44:58 +01:00
Glauber Costa	c3f27784de	database: guarantee a minimum amount of shares when manual operations are requested. We have found issues when a flush is requested outside the usual memtable flush loop: because there is not a lot of data, the controller will not have a high amount of shares. To prevent this, this patch guarantees some minimum amount of shares when extraneous operations (nodetool flush, commitlog-driven flush, etc) are requested. Another option would be to add shares instead of guaranteeing a minimum. But in my view the approach I am taking here has two main advantages: 1) It won't cause spikes when those operations are requested. 2) It is cumbersome to add shares in the current infrastructure, as just adding backlog can cause shares to spike. Consider this example: Backlog is within the first range of very low backlog (~0.2). Shares for this would be around ~20. If we want to add 200 shares, that is equivalent to a backlog of 0.8. Once we add those two backlogs together, we end up with 1 (max backlog). Fixes #3761 Tests: unit (release) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180927131904.8826-1-glauber@scylladb.com>	2018-09-27 15:20:31 +02:00
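The example in the message can be reproduced numerically. The sketch below uses a hypothetical piecewise-linear backlog-to-shares curve. Only the points (0.2, 20) and (0.8, 200) come from the text. The maximum of 1000 shares at backlog 1.0 is an assumption made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

// Hypothetical (backlog, shares) control points of the controller curve.
constexpr std::array<std::pair<double, double>, 4> curve{
    {{0.0, 0.0}, {0.2, 20.0}, {0.8, 200.0}, {1.0, 1000.0}}};

// Linear interpolation between control points; backlog is clamped to [0, 1].
double shares_for_backlog(double backlog) {
    backlog = std::clamp(backlog, 0.0, 1.0);
    for (std::size_t i = 1; i < curve.size(); ++i) {
        auto [x0, y0] = curve[i - 1];
        auto [x1, y1] = curve[i];
        if (backlog <= x1) {
            return y0 + (y1 - y0) * (backlog - x0) / (x1 - x0);
        }
    }
    return curve.back().second;
}

// Approach taken by the commit: guarantee a floor for manual operations.
double shares_with_minimum(double backlog, double min_shares) {
    return std::max(shares_for_backlog(backlog), min_shares);
}

// Rejected alternative: express the extra shares as extra backlog and add it.
double shares_with_added_backlog(double backlog, double extra_backlog) {
    return shares_for_backlog(backlog + extra_backlog);
}
```

With these numbers, guaranteeing a minimum of 200 shares at backlog 0.2 yields 200 shares. Adding the equivalent backlog of 0.8 instead jumps to the curve's maximum, which is the spike the commit avoids.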
Avi Kivity	337ee6153a	Merge "Support SSTables 3.x in Scylla runtime" from Vladimir and Piotr " This patchset makes it possible to use the SSTables 'mc' format, commonly referred to as 'SSTables 3.x', when running a Scylla instance. Several bugs found along the way are fixed. Also, a configuration option is introduced to allow running Scylla with either the 'mc' or the 'la' format as the default. Tests: unit {release} + tested Scylla with both 'la' and 'mc' formats to work fine: cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> USE test; cqlsh:test> CREATE TABLE cfsst3 (pk int, ck int, rc int, PRIMARY KEY (pk, ck)) WITH compression = {'sstable_compression': ''}; cqlsh:test> INSERT INTO cfsst3 (pk, ck, rc) VALUES ( 4, 7, 8); <<flush>> cqlsh:test> DELETE from cfsst3 WHERE pk = 4 and ck> 3 and ck < 8; <<flush>> cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 2, 3); cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 4, 6); cqlsh:test> SELECT * FROM cfsst3 ; pk | ck | rc ----+----+------ 2 | 3 | null 4 | 6 | null (2 rows) <<Scylla restart>> cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 5, 7); cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 6, 8); cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 7, 9); cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 8, 10); cqlsh:test> SELECT * from cfsst3 ; pk | ck | rc ----+----+------ 5 | 7 | null 8 | 10 | null 2 | 3 | null 4 | 6 | null 7 | 9 | null 6 | 8 | null (6 rows) " * 'projects/sstables-30/try-runtime/v8' of https://github.com/argenet/scylla: database: Honour enable_sstables_mc_format configuration option. sstables: Support SSTables 'mc' format as a feature. db: Add configuration option for enabling SSTables 'mc' format. tests: Add test for reading a complex column with zero subcolumns (SST3). sstables: Fix parsing of complex columns with zero subcolumns. sstables: Explicitly cast api::timestamp_type to uint64_t when delta-encoding. 
sstables: Use parser_type instead of abstract_type::parse_type in column_translation. bytes: Add helper for turning bytes_view into sstring_view. sstables: Only forward the call to fast_forwarding_to in mp_row_consumer_m if filter exists. sstables: Fix string formatting for exception messages in m_format_read_helpers. sstables: Don't validate timestamps against the max value on parsing. sstables: Always store only min bases in serialization_header. sstables: Support 'mc' version parsing from filename. SST3: Make sure we call consume_partition_end	2018-09-26 11:10:07 +01:00
Vladimir Krivopalov	cd80d6ff65	database: Honour enable_sstables_mc_format configuration option. Only enable SSTables 'mc' format if the entire cluster supports it and it is enabled in the configuration file. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Raphael S. Carvalho	745e35fa82	database: Fix sstable resharding for mc format SStable format mc doesn't write ancestors to metadata, so resharding will not work with this new format because it relies on ancestors to replace new unshared sstables with old shared ones. Fix is about not relying on ancestors metadata for this operation. Fixes #3777. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180922211933.1987-1-raphaelsc@scylladb.com>	2018-09-25 18:37:48 +03:00
Botond Dénes	eb357a385d	flat_mutation_reader: make timeout opt-out rather than opt-in Currently timeout is opt-in, that is, all methods that even have it default it to `db::no_timeout`. This means that ensuring timeout is used where it should be is completely up to the author and the reviewers of the code. As humans are notoriously prone to mistakes, this has resulted in very inconsistent usage of timeout, with many clients of `flat_mutation_reader` passing the timeout only to some members and only on certain call sites. This is small wonder considering that some core operations like `operator()()` only recently received a timeout parameter and others like `peek()` didn't even have one until this patch. Both of these methods call `fill_buffer()`, which potentially talks to the lower layers and is supposed to propagate the timeout. All this makes the `flat_mutation_reader`'s timeout effectively useless. To bring order to this chaos, make the timeout parameter a mandatory one on all `flat_mutation_reader` methods that need it. This ensures that humans now get a reminder from the compiler when they forget to pass the timeout. Clients can still opt out of passing a timeout by passing `db::no_timeout` (the previous default value), but this will now be explicit and developers should think before typing it. There were surprisingly few core call sites to fix up. Where a timeout was available nearby I propagated it to be able to pass it to the reader; where I couldn't, I passed `db::no_timeout`. Authors of the latter kind of code (view, streaming and repair are some of the notable examples) should maybe consider propagating down a timeout if needed. In the test code (the vast majority of the changes) I just used `db::no_timeout` everywhere. Tests: unit(release, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>	2018-09-20 11:31:24 +02:00
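The opt-in versus opt-out distinction is a choice of C++ signature. The sketch uses a hypothetical `reader_sketch`. Its `no_timeout` stands in for `db::no_timeout` and is assumed here to be a never-expiring deadline. This is not the real `flat_mutation_reader` interface.

```cpp
#include <cassert>
#include <chrono>

using timeout_clock = std::chrono::steady_clock;

// Stand-in for db::no_timeout: a deadline that never expires (assumption).
constexpr timeout_clock::time_point no_timeout = timeout_clock::time_point::max();

class reader_sketch {
public:
    // Opt-in (before): bool fill_buffer(timeout_clock::time_point timeout = no_timeout);
    // Opt-out (after): no default, so every caller must state a deadline.
    bool fill_buffer(timeout_clock::time_point timeout) {
        if (timeout_clock::now() > timeout) {
            return false;  // deadline already passed: don't touch lower layers
        }
        ++buffered_;
        return true;
    }

    // peek() may need to fill the buffer, so it must take and forward a timeout too.
    bool peek(timeout_clock::time_point timeout) {
        return buffered_ > 0 || fill_buffer(timeout);
    }

private:
    int buffered_ = 0;
};
```

Neither method has a default argument, so a call like `r.peek()` no longer compiles. Opting out requires spelling `no_timeout` explicitly at the call site.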
Raphael S. Carvalho	5bc028f78b	database: fix 2x increase in disk usage during cleanup compaction Don't hold reference to sstables cleaned up, so that file descriptors for their index and data files will be closed and consequently disk space released. Fixes #3735. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180914194047.26288-1-raphaelsc@scylladb.com>	2018-09-17 17:26:46 +03:00
Botond Dénes	253407bdc8	multishard_mutation_query: add badness counters Add badness counters that allow tracking problems. The following counters are added: 1) multishard_query_unpopped_fragments 2) multishard_query_unpopped_bytes 3) multishard_query_failed_reader_stops 4) multishard_query_failed_reader_saves The first pair of counters observe the amount of work range scan queries have to undo on each page. It is normal for these counters to be non-zero, however sudden spikes in their values can indicate problems. This undoing of work is needed for stateful range-scans to work. When stateful queries are enabled the `multishard_combining_reader` is dismantled and all unconsumed fragments in its and any of its intermediate reader's buffers are pushed back into the originating shard reader's buffer (via `unpop_mutation_fragment()`). This also includes the `partition_start`, the `static_row` (if there is one) and all extracted and active `range_tombstone` fragments. This together can amount to a substantial amount of fragments. (1) counts the amount of fragments moved back, while (2) counts the number of bytes. Monitoring size and quantity separately allows for detecting edge cases like moving many small fragments or just a few huge ones. The counters count the fragments/bytes moved back to readers located on the shard they belong to. The second pair of counters are added to detect any problems around saving readers. Since the failure to save a reader will not fail the read itself, it is necessary to add visibility to these failures by other means. (3) counts the number of times stopping a shard reader (waiting on pending read-aheads and next-partitions) failed while (4) counts the number of times inserting the reader into the `querier_cache` failed. Contrary to the first two counters, which will almost certainly never be zero, these latter two counters should always be zero. Any other value indicates problems in the respective shards/nodes.	2018-09-03 10:31:44 +03:00
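To make counters (1) and (2) concrete, here is a sketch of pushing unconsumed fragments back into a shard reader's buffer while counting both fragments and bytes. `fragment`, `unpop_stats` and `unpop_unconsumed()` are hypothetical. The real code goes through `unpop_mutation_fragment()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical fragment: only its memory footprint matters here.
struct fragment {
    std::size_t bytes;
};

struct unpop_stats {
    std::uint64_t unpopped_fragments = 0;  // like counter (1)
    std::uint64_t unpopped_bytes = 0;      // like counter (2)
};

// Push fragments the combining reader did not consume back to the front of the
// shard reader's buffer, preserving their order, and account for them.
void unpop_unconsumed(std::deque<fragment>& shard_buffer,
                      const std::vector<fragment>& unconsumed,
                      unpop_stats& stats) {
    shard_buffer.insert(shard_buffer.begin(), unconsumed.begin(), unconsumed.end());
    stats.unpopped_fragments += unconsumed.size();
    for (const auto& f : unconsumed) {
        stats.unpopped_bytes += f.bytes;
    }
}
```

Tracking the two values separately distinguishes many small fragments from a few huge ones, which a single counter would conflate.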
Botond Dénes	5f726e9a89	querier: move all to query namespace To avoid name clashes.	2018-09-03 10:31:44 +03:00
Glauber Costa	8dea1b3c61	database: fix directory for information when loading new SSTables from upload dir When we load new SSTables, we use the directory information from the entry descriptor to build information about those SSTables. When the descriptor is created by flush_upload_dir, the sstable directory used in the descriptor contains the `upload` part. Therefore, we try to load the SSTables from the upload directory after they have already been moved out of it, and fail. Since the generation also changes, we have historically been fixing the generation manually, but not the SSTable directory. The reason is that, until recently, the SSTable directory was passed statically to open_sstables, ignoring whatever the entry descriptor said. Now that the sstable directory is also derived from the entry descriptor, we should fix that too. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180829165326.12183-1-glauber@scylladb.com>	2018-08-30 10:34:25 +03:00


1146 Commits