scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 20:16:43 +00:00

Author	SHA1	Message	Date
Avi Kivity	7438c9de5c	Merge "Fix database freeze with load for multiple CFs" from Glauber "Issue 1195 describes a scenario with a fairly easy reproducer in which we can freeze the database. That involves writing simultaneously to multiple CFs, such that the sum of all the memory they are using is larger than the dirty memory limit, without any of them individually being larger than the memtable size. This patchset rewrites the throttling code, now including active flushes, so that this situation cannot happen. Fixes #1195"	2016-07-06 09:48:13 +03:00
Glauber Costa	b0932ceb04	database: act on LSA pressure notification Issue 1195 describes a scenario with a fairly easy reproducer in which we can freeze the database. That involves writing simultaneously to multiple CFs, such that the sum of all the memory they are using is larger than the dirty memory limit, without any of them individually being larger than the memtable size. Because we will never reach the individual memtable seal size for any of them, none of them will initiate a flush, leading the database to a halt. The LSA has now gained infrastructure that allows us to be notified when pressure conditions mount. What we will do in this case is initiate a flush ourselves. Fixes #1195 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 17:46:28 -04:00
Glauber Costa	7169b727ea	move system tables to their own region In the spirit of what we are doing for the read semaphore, this patch moves system writes to its own dirty memory manager. Not only will it make sure that system tables will not be serialized by its own semaphore, but it will also put system tables in their own region group. Moving system tables to their own region group has the advantage that system requests won't be waiting during throttle behind a potentially big queue of user requests, since requests are tended to in FIFO order within the same region group. However, system tables being more controlled and predictable, we can actually go a step further and give them some extra reservation so they may not necessarily block even if under pressure (up to 10 MB more). Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 17:46:28 -04:00
Glauber Costa	c358947284	database: wrap semaphore and region group into a new dirty memory manager We currently have a semaphore at the column family level that protects us against multiple concurrent sstable flushes. However, storing that semaphore in the CF, not the database, was an (implementation, not design) mistake. One comment in particular makes it quite clear: // Ideally, we'd allow one memtable flush per shard (or per database object), and write-behind // would take care of the rest. But that still has issues, so we'll limit parallelism to some // number (4), that we will hopefully reduce to 1 when write behind works. So I aimed for the shard, but ended up coding it into the CF because that's closer to the flush point - my bad. This patch fixes this while paving the way for active reclaim to take place. It wraps the semaphore and the region group in a new structure, the dirty_memory_manager. The immediate benefit is that we don't need to be passing both the semaphore and the region group downwards in the DB -> CF path. The long term benefit is that we now have one unified structure that can hold shared flush data in all of the CFs. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 15:29:04 -04:00
Glauber Costa	d41fcd45d1	memtables: make memtable inherit from region The LSA memory pressure mechanism will let us know which region is the best candidate for eviction when under pressure. We then need to somehow translate region -> memtable -> column family. The easiest way to convert from region to memtable is having memtable inherit from region. Despite the fact that this requires multiple inheritance, which always raises a flag a bit, the other class we inherit from is enable_shared_from_this, which has a very simple and well defined interface. So I think it is worth doing. Once we have the memtable, grabbing the column family is easy provided we have a database object. We can grab it from the schema. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 15:05:29 -04:00
Glauber Costa	0c31f3e626	database: move memtable throttler to the LSA throttler The LSA infrastructure, through the use of its region groups, now has a throttler mechanism built-in. This patch converts the current throttlers so that the LSA throttler is used instead. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 15:05:19 -04:00
Yoav Kleinberger	0ad940bfc3	tools/scyllatop: fix crash due to mouse events Due to an urwid-library technicality, some mouse events like scroll or click would crash scyllatop. This patch fixes this problem. Closes issue #1396. Signed-off-by: Yoav Kleinberger <yoav@scylladb.com> Message-Id: <1467294117-19218-1-git-send-email-yoav@scylladb.com>	2016-07-05 19:08:55 +03:00
Avi Kivity	cb59e724ee	Merge "Fix enabling sstable read ahead" from Paweł "This series contains remaining changes necessary to safely enable read ahead of sstables. Basically, it makes sure that input_streams are always properly closed (even in case of exception during read)."	2016-07-05 19:04:19 +03:00
Raphael S. Carvalho	e688fc9550	api: provide estimation of pending compaction Use compaction_strategy::estimated_pending_compaction() to provide the user with an estimate of the number of compactions needed for the strategy to be fully satisfied. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <39b7d91f2525ca38fb2ce9d8885d0c2e727de7ed.1467667054.git.raphaelsc@scylladb.com>	2016-07-05 19:03:12 +03:00
Raphael S. Carvalho	43926026c3	compaction: introduce compaction strategy method to estimate pending compaction At the moment, it's not possible to know how many compactions are needed for the compaction strategy to be satisfied. It's not possible to know the exact number of pending compactions, but the strategy can provide an estimate. For size tiered, it's based on the number of sstables in each bucket. By dividing the bucket size by the max threshold, we get the number of compactions needed to compact that single bucket. For leveled, it's about the number of sstables that exceed the limit in each level. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <e209e52f6159ee274a8358b69961a7c0ce357f7d.1467667054.git.raphaelsc@scylladb.com>	2016-07-05 19:03:11 +03:00
Avi Kivity	76cc6408cd	Merge "feature check for seed node" from Asias "This series implements feature check for seed node."	2016-07-05 19:01:01 +03:00
Asias He	6f69963ef9	system_keyspace: Simplify load_host_ids implementation - Use plain loop instead of do_for_each - Use row.get_as() instead of row.template get_as() Message-Id: <3e108d3a6258c0caaf569eb9c79532d9789ea411.1467703722.git.asias@scylladb.com>	2016-07-05 09:47:21 +02:00
Asias He	3f31be58b6	system_keyspace: Simplify load_tokens implementation - Use plain loop instead of do_for_each - Use row.get_as() instead of row.template get_as() Message-Id: <f959ace4f30078695d383c849ed4520169228f97.1467703722.git.asias@scylladb.com>	2016-07-05 09:47:21 +02:00
Asias He	5236e7a379	storage_service: Implement feature check for seed node Checking features for a seed node is a bit more complicated than for a non-seed node, because a non-seed node can always talk to at least one seed node, while a seed node may not. In this patch, we distinguish a new cluster from an existing cluster by checking if the system table is empty. We relax the feature check for a new cluster because the feature check is mostly useful when upgrading an existing cluster, to prevent an old node from joining a new cluster. When talking to a seed node fails during the check, we fall back to the check using features stored in the system table. This makes restarting a seed node when no other seed node is up possible (no other seed node at all, or other seed node is not up yet). I tested the following scenarios. 1) start a completely new seed node in a new cluster: system table is empty, skip the check. 2) start a cluster, restart one seed node, at least one other seed node is up: system table is not empty, check with shadow round, shadow round will succeed. 3) start a cluster, restart one seed node, no other seed node is up: system table is not empty, check with shadow round, shadow round will fail, fall back to system table check. 4) start a cluster, shutdown all the nodes, start one seed node with new ip address, seed list in yaml is updated with new ip address: system table is not empty, check with shadow round, shadow round will fail, fall back to system table check.	2016-07-05 10:09:54 +08:00
Asias He	bb80362c3f	gossip: Insert with result.end() in get_supported_features It is faster than result.begin(), suggested by Avi.	2016-07-05 10:09:54 +08:00
Asias He	72cb4a228b	gossip: Add to_feature_set helper To convert a "," split feature string to a feature set.	2016-07-05 10:09:54 +08:00
Asias He	1d6c57fb40	gossip: Reduce timeout in shadow round In `3a36ec33db` (gossip: Wait longer for seed node during boot up), we increased the timeout by a factor of 60, i.e., ring_delay * 60 = 5 seconds * 60 = 5 minutes. In `57ee9676c2` (storage_service: Fix default ring_delay time), we fixed the default ring_delay to 30 seconds. Now the timeout is 30 * 60 seconds = 30 minutes, which is too long. Make it 5 minutes.	2016-07-05 10:09:54 +08:00
Asias He	88f0bb3a7b	gossip: Add check_knows_remote_features To check if this node knows features in std::unordered_map<inet_address, sstring> peer_features_string	2016-07-05 10:09:54 +08:00
Asias He	2b53c50c15	gossip: Add get_supported_features To get features supported by all the nodes listed in the address/feature map.	2016-07-05 10:09:53 +08:00
Asias He	31df4e5316	system_keyspace: Introduce load_peer_features To get the peer features stored in the system.peers table.	2016-07-05 10:09:53 +08:00
Paweł Dziepak	4acf77d755	sstables: drop unused data_stream_at() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:43 +01:00
Paweł Dziepak	2cdf498bbd	sstables: close input stream in sstable::data_read() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:42 +01:00
Paweł Dziepak	8931b939a1	sstables: use finally() to close input streams Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:42 +01:00
Paweł Dziepak	e6ececce7f	Merge seastar upstream Submodule seastar a47f893..c82c36f: > reactor: fix build error > util: lazy_eval: fix compilation errors related to operator<<()s definitions	2016-07-04 18:14:05 +01:00
Duarte Nunes	41843b32c5	thrift: Correctly mark a CF as dense And store whether the comparator is a composite type in the case of dynamic CFs. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1467307688-11059-1-git-send-email-duarte@scylladb.com>	2016-07-04 17:40:53 +02:00
Nadav Har'El	c4e871ea2d	Work around unexpected data_value constructor If someone tried to naively use utf8_type->decompose("18wX"), this would mysteriously fail, returning an empty key. decompose takes a data_value, so the compiler looked for an implicit conversion from the string constant (const char*) to data_value. We did not have such a conversion, only a conversion from sstring. But the compiler chose (backed by the C++ standard, no doubt) to implicitly convert the const char* to a bool (!), and then use data_value(bool). It did not convert the const char* to an sstring, nor did it warn about the possible ambiguity. So this patch adds a data_value(const char*) constructor, so people will not fall into the same trap that I fell into... Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1467643462-6349-1-git-send-email-nyh@scylladb.com>	2016-07-04 17:50:53 +03:00
Avi Kivity	e22517bafc	Merge "Optimize reads from leveled sstables" In a leveled column family, there can be many thousands of sstables, since each sstable is limited to a relatively small size (160M by default). With the current approach of reading from all sstables in parallel, cpu quickly becomes a bottleneck as we need to check the bloom filter for each of these sstables. This patch addresses the problem by introducing a compaction-strategy-specific data structure for holding sstables. This data structure has a method to obtain the sstables used for a read. For leveled compaction strategy, this data structure is an interval map, which can be efficiently used to select the right sstables.	2016-07-04 16:00:35 +03:00
Asias He	610a0f7ef0	storage_service: Skip feature check for seed node for now When a seed node boots up with more than one node in the seed list, it will fail to talk to the other seed node which is not up yet. This fails the feature check, so the seed node will not boot. Skip the feature check for seed node for now, until we have a proper solution. Fixes recent dtest failure due to failure to boot the seed node. Message-Id: <e1d4110f96817e45f81dc0bc948dd14600fc5333.1467251799.git.asias@scylladb.com>	2016-07-04 15:09:57 +03:00
Avi Kivity	28fab55e6e	Merge "Convert sstable writes to streamed mutations" from Paweł "This series converts sstable writers (including compaction) to streamed mutations and makes them use a consumer-style interface. Code related to sstable writes and compaction is converted to consumers that can be used with consume_flattened_in_thread() (which is a variant of consume_flattened() intended to be run inside a thread). compact_for_query is improved so that it can be reused by sstable compaction."	2016-07-04 15:07:47 +03:00
Avi Kivity	171054e87b	Merge seastar upstream * seastar d4d9e16...a47f893 (1): > Merge "overprovisioning support"	2016-07-04 13:46:03 +03:00
Paweł Dziepak	5d0de2179a	Merge "Adding scylla version API" from Amnon Amnon says: The API that returns the version currently returns the compatibility version (i.e. the compatible origin version, currently 2.1.8). The check version functionality needs to know what the currently running version of scylla is. For that, a new API was added that returns the current version. The result is equivalent to running scylla --version. After this series, a call to: $ curl -X GET "http://localhost:10000/storage_service/scylla_release_version" returns "666.development-20160703.72f0d4d", which is the JSON representation of: $ ./build/release/scylla --version 666.development-20160703.72f0d4d	2016-07-04 10:52:44 +01:00
Asias He	f6a2672be0	storage_service: Modify log to match config option of scylla We currently log as follows: May 9 00:09:13 node3.nl scylla[2546]: [shard 0] storage_service - This node was decommissioned and will not rejoin the ring unless cassandra.override_decommission=true has been set,or all existing data is removed and the node is bootstrapped again However, the user should use override_decommission:true instead of cassandra.override_decommission:true in scylla.yaml, where the cassandra prefix is stripped. Fixes #1240 Message-Id: <b0c9424c6922431ad049ab49391771e07ca6fbde.1467079190.git.asias@scylladb.com>	2016-07-04 10:47:49 +02:00
Avi Kivity	76cc0c0cf9	auth: fix performance problem when looking up permissions data_resource lookup uses data_resource::name(), which uses sprint(), which uses (indirectly) locale, which takes a global lock. This is a bottleneck on large machines. Fix by not using name() during lookup. Fixes #1419 Message-Id: <1467616296-17645-1-git-send-email-avi@scylladb.com>	2016-07-04 10:26:18 +02:00
Yoav Kleinberger	49cba035ea	tools/scyllatop: leave terminal in a functioning state when user quits with CTRL-C closes issue #1417. Signed-off-by: Yoav Kleinberger <yoav@scylladb.com> Message-Id: <1467556769-11851-1-git-send-email-yoav@scylladb.com>	2016-07-03 17:43:46 +03:00
Amnon Heiman	e66a1cd705	API: Add implementation for the scylla release version This adds the implementation to the scylla release version API. After this patch a call to: curl -X GET "http://localhost:10000/storage_service/scylla_release_version" Will return the current scylla release version. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2016-07-03 16:29:09 +03:00
Amnon Heiman	56ea8c943e	API: add scylla release version API This adds a definition for the scylla release version. The API already returns the compatibility version (i.e. the compatible origin version). This definition returns the scylla version; a call to the API should return the same result as running scylla --version. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2016-07-03 16:26:21 +03:00
Avi Kivity	68e613b313	Rebuild _column_family::_sstables when changing compaction_strategy The concrete sstable_set type depends on the compaction strategy, so ask the compaction_strategy to create a new sstable_set object and populate it.	2016-07-03 13:42:10 +03:00
Avi Kivity	44a6cef4e1	sstable mutation readers: use sstable_set::select() Apply compaction strategy specific logic to narrow down the set of sstables used for a query; can speed up reads using LeveledCompactionStrategy significantly. Fixes #1185.	2016-07-03 10:50:58 +03:00
Avi Kivity	4cb7618601	Convert column_family::_sstables to sstable_set Using sstable_set will allow us to filter sstables during a query before actually creating a reader (this is left to the next patch; here we just convert the users of the _sstables field).	2016-07-03 10:32:27 +03:00
Avi Kivity	c8237fc262	compaction_strategy: introduce make_sstable_set() Allow compaction_strategy to create a container for sstables that is optimized for the strategy. Most compaction_strategies return bag_sstable_set; leveled compaction returns the specialized partitioned_sstable_set.	2016-07-03 10:27:01 +03:00
Avi Kivity	168696c558	Introduce partitioned_sstable_set partitioned_sstable_set assumes that sstables are mostly partitioned along the token range: only a few sstables will be needed to access a particular token. It is implemented as an interval_map.	2016-07-03 10:27:00 +03:00
Avi Kivity	64e4357461	Introduce bag_sstable_set bag_sstable_set is a generic sstable_set implementation: it assumes nothing about the sstables. It is implemented as a vector, and any select will return the entire sstable set.	2016-07-03 10:27:00 +03:00
Avi Kivity	85e9cf4616	Introduce sstable_set sstable_set abstracts the notion of a container of sstables, allowing different compaction strategies to supply their own implementation. The intended user is leveled compaction strategy; since it partitions sstables, it can quickly restrict the number of sstables that participate in a query by looking at the min/max partition key. sstable_set also maintains an internal lw_shared_ptr<sstable_list>, in parallel with the abstract container. This is to support column_family::get_sstable(), which returns a lw_shared_ptr<sstable_list> which must be anchored somewhere if it is not saved at the caller side, as it isn't in most current callers.	2016-07-03 10:27:00 +03:00
Avi Kivity	c1815abd15	Introduce compatible_ring_position ring_position is built for modern code that does not require default constructors or stateless comparators. But not all code is modern, so supply a compatible_ring_position that works with old code, at the cost of some extra storage. Intended user is boost's interval container library.	2016-07-03 10:27:00 +03:00
Avi Kivity	2a46410f4a	Change sstable_list from a map to a set sstable_list is now a map<generation, sstable>; change it to a set in preparation for replacing it with sstable_set. The change simplifies a lot of code; the only casualty is the code that computes the highest generation number.	2016-07-03 10:26:57 +03:00
Duarte Nunes	386c0dd4b2	storage_proxy: Correctly calculate new limit This patch fixes a bug where we would always return query::max_rows when calculating the new limit for a retry read command. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1467289746-18177-1-git-send-email-duarte@scylladb.com>	2016-06-30 14:49:56 +02:00
Paweł Dziepak	b150720361	sstable: enable read ahead Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 13:18:24 +01:00
Paweł Dziepak	4513f8b52c	sstables: add compressed_file_data_source_impl::close() compressed_file_data_source_impl should close the underlying data source properly when asked to. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 13:07:07 +01:00
Paweł Dziepak	55a6911d7a	sstables: close input_stream<> properly If read ahead is going to be enabled it is important to close input_stream<> properly (and wait for completion) before destroying it. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	e44e12c74a	sstables: drop no longer needed code Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
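
Note on c4e871ea2d (data_value constructor): the trap described in that message can be reproduced outside Scylla. The sketch below uses toy types (toy_value and fixed_value are hypothetical stand-ins, not Scylla's data_value, and std::string stands in for sstring). A pointer-to-bool conversion is a built-in standard conversion, while reaching a string-taking constructor needs a user-defined conversion, so the bool constructor is chosen for a string literal unless a const char* overload exists.

```cpp
#include <string>

// Toy stand-in for data_value (hypothetical, not Scylla's class):
// constructible from bool or from a string, like the pre-patch situation.
struct toy_value {
    enum class kind { boolean, text };
    kind held;
    toy_value(bool) : held(kind::boolean) {}
    toy_value(std::string) : held(kind::text) {}
};

// Same toy type with the extra overload the commit message describes.
struct fixed_value {
    enum class kind { boolean, text };
    kind held;
    fixed_value(bool) : held(kind::boolean) {}
    fixed_value(std::string) : held(kind::text) {}
    // A const char* argument now matches exactly and is forwarded as text.
    fixed_value(const char* s) : fixed_value(std::string(s)) {}
};

// The literal decays to const char*, which converts to bool with a
// standard conversion; that beats the user-defined path via std::string.
inline toy_value::kind kind_before_fix() { return toy_value("18wX").held; }

// With the extra overload the literal is treated as text.
inline fixed_value::kind kind_after_fix() { return fixed_value("18wX").held; }
```

Some compilers can warn about the string-literal-to-bool conversion when asked to, but it is valid C++ and compiles silently with default settings on common toolchains.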
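
Note on 43926026c3 (pending compaction estimate): a minimal sketch of the size-tiered arithmetic described in that message, dividing each bucket's sstable count by the max threshold. The function name, the rounding up, and the rule that buckets below the min threshold contribute nothing are assumptions made for illustration; the commit message does not state them.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not Scylla's API. bucket_sizes holds the number of
// sstables in each size-tiered bucket.
inline std::size_t estimate_pending_size_tiered(const std::vector<std::size_t>& bucket_sizes,
                                                std::size_t min_threshold,
                                                std::size_t max_threshold) {
    if (max_threshold == 0) {
        return 0;  // guard against division by zero; treated as "nothing to do"
    }
    std::size_t pending = 0;
    for (std::size_t n : bucket_sizes) {
        // Assumption: a bucket smaller than min_threshold is not compacted yet.
        if (n >= min_threshold) {
            // Assumption: round up, since a partial batch still needs one compaction.
            pending += (n + max_threshold - 1) / max_threshold;
        }
    }
    return pending;
}
```

For buckets of 3, 4 and 40 sstables with thresholds 4 and 32, the sketch counts 0 + 1 + 2 compactions.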
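
Note on 1d6c57fb40 (shadow round timeout): the arithmetic in that message can be checked at compile time. The names below are hypothetical; only the factor of 60 and the 5 second and 30 second ring_delay values come from the commit message.

```cpp
#include <chrono>

// Factor named in the commit message; the constant name is hypothetical.
constexpr int shadow_round_factor = 60;

// Timeout before the fix: ring_delay multiplied by the factor.
constexpr std::chrono::seconds old_shadow_round_timeout(std::chrono::seconds ring_delay) {
    return ring_delay * shadow_round_factor;
}

// 5 s * 60 = 5 min with the old default ring_delay.
static_assert(old_shadow_round_timeout(std::chrono::seconds(5)) == std::chrono::minutes(5),
              "old default gives 5 minutes");
// 30 s * 60 = 30 min once the default ring_delay became 30 seconds.
static_assert(old_shadow_round_timeout(std::chrono::seconds(30)) == std::chrono::minutes(30),
              "new default gives 30 minutes");
```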
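
Note on 85e9cf4616, 64e4357461 and 168696c558 (sstable_set): a toy sketch of the idea that a strategy-specific container decides which sstables a read must consult. All types here are hypothetical. In particular, the partitioned variant below uses a plain linear scan over token ranges instead of the interval_map the commit mentions, and tokens are simplified to integers.

```cpp
#include <string>
#include <vector>

// Toy sstable: a name plus the inclusive token range it covers.
struct toy_sstable {
    std::string name;
    long first_token;
    long last_token;
};

// Abstract container: select() returns the sstables a read of `token` must consult.
struct toy_sstable_set {
    virtual ~toy_sstable_set() = default;
    virtual std::vector<toy_sstable> select(long token) const = 0;
};

// Generic variant: assumes nothing, so every sstable is a candidate.
struct toy_bag_set : toy_sstable_set {
    std::vector<toy_sstable> all;
    std::vector<toy_sstable> select(long) const override { return all; }
};

// Partitioned variant: only sstables whose range contains the token qualify.
// Linear scan used here for brevity; it is a stand-in for an interval map.
struct toy_partitioned_set : toy_sstable_set {
    std::vector<toy_sstable> all;
    std::vector<toy_sstable> select(long token) const override {
        std::vector<toy_sstable> out;
        for (const auto& s : all) {
            if (s.first_token <= token && token <= s.last_token) {
                out.push_back(s);
            }
        }
        return out;
    }
};
```

With many small, mostly non-overlapping sstables, the partitioned selection returns a handful of candidates where the bag returns all of them, which is the saving the merge message for e22517bafc describes.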
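
Note on 72cb4a228b, 2b53c50c15 and bb80362c3f (gossip feature helpers): a minimal sketch of splitting a comma-separated feature string into a set and then intersecting the sets of all known peers. This is an illustration under assumptions, not the code from those commits: std::string stands in for both inet_address and sstring, the function names are hypothetical, and treating "supported by all the nodes" as a set intersection is an interpretation of the message.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>

using feature_set = std::set<std::string>;

// Split "A,B,C" into {"A", "B", "C"}; empty items are dropped (assumption).
inline feature_set split_features(const std::string& csv) {
    feature_set out;
    std::istringstream in(csv);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (!item.empty()) {
            out.insert(item);
        }
    }
    return out;
}

// Features present in every peer's string: the intersection of all sets.
inline feature_set common_features(const std::map<std::string, std::string>& peer_features) {
    feature_set common;
    bool first = true;
    for (const auto& entry : peer_features) {
        feature_set current = split_features(entry.second);
        if (first) {
            common = std::move(current);
            first = false;
            continue;
        }
        feature_set kept;
        // set_intersection emits sorted output, so hinting at end() keeps
        // each insertion cheap.
        std::set_intersection(common.begin(), common.end(),
                              current.begin(), current.end(),
                              std::inserter(kept, kept.end()));
        common = std::move(kept);
    }
    return common;
}
```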
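
Note on 5236e7a379 (feature check for seed node): the four tested scenarios in that message reduce to a small decision, sketched below. The enum and function names are hypothetical, and the two booleans are simplifications of "the system table is empty" and "the shadow round reached another seed".

```cpp
// Which source of feature information the seed node ends up using.
enum class feature_check { skipped, shadow_round, system_table };

// Hypothetical summary of the flow described in the commit message.
inline feature_check choose_feature_check(bool system_table_empty, bool shadow_round_succeeded) {
    if (system_table_empty) {
        return feature_check::skipped;       // scenario 1: brand new cluster
    }
    if (shadow_round_succeeded) {
        return feature_check::shadow_round;  // scenario 2: another seed is up
    }
    return feature_check::system_table;      // scenarios 3 and 4: fall back to stored features
}
```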

9791 Commits