scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 12:36:56 +00:00

Author	SHA1	Message	Date
Avi Kivity	21031d276b	Merge seastar upstream * seastar c82c36f...9267dfa (6): > app_template: Make run() wait for func when reactor exit is triggered externally > core: Introduce futurize_apply() helper > rpc: make unexpected eof messages more informative > Fix boost version check > reactor: more fix for smp poll with older boost > reactor: fix build on older boost due to spsc_queue::read_available()	2016-07-06 18:14:13 +03:00
Avi Kivity	02530faeb2	compaction: fix tombstones not being garbage collected during compaction `2a46410f4a` changed sstable_list from a map to a set, so it is no longer sorted by generation. The code for finding the list of sstables not being compacted relied on this sort order, and now broke, returning a longer list than needed (including some of the sstables being compacted). As a result, the compaction code preserved the tombstones, incorrectly thinking there was still live data they referenced. Fix by sorting the set explicitly. Fixes #1429. Message-Id: <1467793026-6571-1-git-send-email-avi@scylladb.com>	2016-07-06 10:22:31 +02:00
Asias He	0c56bbe793	gossip: Make get_supported_features and wait_for_feature_on{_all}_node private They are used only inside gossiper itself. Also make the helper get_supported_features(std::unordered_map<gms::inet_address, sstring>) static. Message-Id: <f434c145ad9138084708b60c1d959b84360e47b2.1467775291.git.asias@scylladb.com>	2016-07-06 09:54:56 +03:00
Avi Kivity	ab279a4752	Merge "Add support to date tiered compaction strategy" from Raphael "After this patchset, date tiered compaction strategy is supported by Scylla. For those who don't know what it is about, the following article may help: https://labs.spotify.com/2014/12/18/date-tiered-compaction/ It's also nicely explained on our wiki page: https://github.com/scylladb/scylla/wiki/SSTable-compaction#date-tiered-compaction Basically, date tiered strategy was developed to help the database perform better when facing a time series workload. Date tiered strategy will work to keep data written at nearly the same time together, such that the number of relevant sstables for a time-based query is relatively low. We still lack support for filtering out sstables based on the time parameters of a query, but that feature should come ASAP. The following dtests now pass: compaction_test.py:TestCompaction_with_DateTieredCompactionStrategy.compaction_delete_test compaction_test.py:TestCompaction_with_DateTieredCompactionStrategy.compaction_strategy_switching_test Used cassandra-stress with the parameter '-schema compaction(strategy=DateTieredCompactionStrategy)' to check stability. Fixes #511."	2016-07-06 09:51:12 +03:00
Avi Kivity	7438c9de5c	Merge "Fix database freeze with load for multiple CFs" from Glauber "Issue 1195 describes a scenario with a fairly easy reproducer in which we can freeze the database. That involves writing simultaneously to multiple CFs, such that the sum of all the memory they are using is larger than the dirty memory limit, without any of them individually being larger than the memtable size. This patchset rewrites the throttling code, now including active flushes, so that this situation cannot happen. Fixes #1195"	2016-07-06 09:48:13 +03:00
Raphael S. Carvalho	b5ec4d46c6	tests: add test for date tiered compaction strategy Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	b699ef2de3	compaction: wire up date tiered compaction strategy After this commit, date tiered compaction strategy is supported on Scylla. To understand how it works, take a look at our wiki page: https://github.com/scylladb/scylla/wiki/SSTable-compaction#date-tiered-compaction Fixes #511. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	e5cc0cc6c4	compaction: implement date tiered compaction strategy This commit is basically about converting Java to C++. Date tiered compaction strategy isn't wired yet. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	cab2892866	tests: add test for sstables::get_fully_expired_sstables Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	e9076f39be	compaction: implement function to get fully expired sstables Strongly based on org.apache.cassandra.db.compaction.CompactionController.getFullyExpiredSSTables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:47 -03:00
Raphael S. Carvalho	69b3860662	tests: add test for leveled_manifest::overlapping Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 02:11:45 -03:00
Raphael S. Carvalho	92848efc42	sstables: make overlapping functions static That's needed by a function that will use the overlapping sstables to determine the fully expired ones. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:34:34 -03:00
Raphael S. Carvalho	8d38fa49d4	sstables: move code to get uncompacting sstables to a function Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:33:55 -03:00
Raphael S. Carvalho	1118cfc51a	tests: test that sstable max_local_deletion_time is properly updated Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:13:34 -03:00
Raphael S. Carvalho	cc6c383249	sstables: properly keep track of max local deletion time We weren't updating max local deletion time for cells that contain ttl, or for tombstone cells. If there is a live cell with no ttl, then max local deletion time is supposed to store the maximum value, which means that the sstable will not be fully expired later on. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:13:24 -03:00
Raphael S. Carvalho	1ecd9bdefc	sstables: fix type of max_local_deletion_time max_local_deletion_time was incorrectly using an unsigned type instead of a signed one. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:13:13 -03:00
Raphael S. Carvalho	f9ab94d266	compaction: import DateTieredCompactionStrategy.java File can be found at the following C* directory: src/java/org/apache/cassandra/db/compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-07-06 01:12:49 -03:00
Glauber Costa	b0932ceb04	database: act on LSA pressure notification Issue 1195 describes a scenario with a fairly easy reproducer in which we can freeze the database. That involves writing simultaneously to multiple CFs, such that the sum of all the memory they are using is larger than the dirty memory limit, without any of them individually being larger than the memtable size. Because we will never reach the individual memtable seal size for any of them, none of them will initiate a flush, bringing the database to a halt. The LSA has now gained infrastructure that allows us to be notified when pressure conditions mount. What we will do in this case is initiate a flush ourselves. Fixes #1195 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 17:46:28 -04:00
Glauber Costa	7169b727ea	move system tables to their own region In the spirit of what we are doing for the read semaphore, this patch moves system writes to their own dirty memory manager. Not only will this make sure that system tables are serialized by their own semaphore, but it will also put system tables in their own region group. Moving system tables to their own region group has the advantage that system requests won't be waiting during throttle behind a potentially big queue of user requests, since requests are served in FIFO order within the same region group. Moreover, since system tables are more controlled and predictable, we can go a step further and give them an extra reservation (up to 10 MB more), so that they do not necessarily block even under pressure. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 17:46:28 -04:00
Glauber Costa	c358947284	database: wrap semaphore and region group into a new dirty memory manager We currently have a semaphore at the column family level that protects us against multiple concurrent sstable flushes. However, storing that semaphore in the CF, not the database, was an implementation (not design) mistake. One comment in particular makes it quite clear: // Ideally, we'd allow one memtable flush per shard (or per database object), and write-behind // would take care of the rest. But that still has issues, so we'll limit parallelism to some // number (4), that we will hopefully reduce to 1 when write behind works. So I aimed for the shard, but ended up coding it into the CF because that's closer to the flush point - my bad. This patch fixes this while paving the way for active reclaim to take place. It wraps the semaphore and the region group in a new structure, the dirty_memory_manager. The immediate benefit is that we don't need to pass both the semaphore and the region group downwards in the DB -> CF path. The long-term benefit is that we now have one unified structure that can hold shared flush data for all of the CFs. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 15:29:04 -04:00
Glauber Costa	d41fcd45d1	memtables: make memtable inherit from region The LSA memory pressure mechanism will let us know which region is the best candidate for eviction when under pressure. We then need to translate region -> memtable -> column family. The easiest way to convert from region to memtable is to have memtable inherit from region. Although this requires multiple inheritance, which always raises a flag, the other class we inherit from is enable_shared_from_this, which has a very simple and well-defined interface. So I think it is worth doing. Once we have the memtable, grabbing the column family is easy, provided we have a database object: we can grab it from the schema. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 15:05:29 -04:00
Glauber Costa	0c31f3e626	database: move memtable throttler to the LSA throttler The LSA infrastructure, through the use of its region groups, now has a built-in throttler mechanism. This patch converts the current throttlers so that the LSA throttler is used instead. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-07-05 15:05:19 -04:00
Yoav Kleinberger	0ad940bfc3	tools/scyllatop: fix crash due to mouse events due to an urwid-library technicality, some mouse events like scroll or click would crash scyllatop. This patch fixes this problem. closes issue #1396. Signed-off-by: Yoav Kleinberger <yoav@scylladb.com> Message-Id: <1467294117-19218-1-git-send-email-yoav@scylladb.com>	2016-07-05 19:08:55 +03:00
Avi Kivity	cb59e724ee	Merge "Fix enabling sstable read ahead" from Paweł "This series contains remaining changes necessary to safely enable read ahead of sstables. Basically, it makes sure that input_streams are always properly closed (even in case of exception during read)."	2016-07-05 19:04:19 +03:00
Raphael S. Carvalho	e688fc9550	api: provide estimation of pending compaction Use compaction_strategy::estimated_pending_compaction() to provide the user with an estimate of the number of compactions needed for the strategy to be fully satisfied. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <39b7d91f2525ca38fb2ce9d8885d0c2e727de7ed.1467667054.git.raphaelsc@scylladb.com>	2016-07-05 19:03:12 +03:00
Raphael S. Carvalho	43926026c3	compaction: introduce compaction strategy method to estimate pending compaction At the moment, it's not possible to know how many compactions are needed for the compaction strategy to be satisfied. It's not possible to know the exact number of pending compactions, but the strategy can provide an estimate. For size tiered, it's based on the number of sstables in each bucket: dividing the bucket size by the max threshold gives the number of compactions needed to compact that single bucket. For leveled, it's the number of sstables that exceed the limit in each level. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <e209e52f6159ee274a8358b69961a7c0ce357f7d.1467667054.git.raphaelsc@scylladb.com>	2016-07-05 19:03:11 +03:00
Avi Kivity	76cc6408cd	Merge "feature check for seed node" from Asias "This series implements feature check for seed node."	2016-07-05 19:01:01 +03:00
Asias He	6f69963ef9	system_keyspace: Simplify load_host_ids implementation - Use plain loop instead of do_for_each - Use row.get_as() instead of row.template get_as() Message-Id: <3e108d3a6258c0caaf569eb9c79532d9789ea411.1467703722.git.asias@scylladb.com>	2016-07-05 09:47:21 +02:00
Asias He	3f31be58b6	system_keyspace: Simplify load_tokens implementation - Use plain loop instead of do_for_each - Use row.get_as() instead of row.template get_as() Message-Id: <f959ace4f30078695d383c849ed4520169228f97.1467703722.git.asias@scylladb.com>	2016-07-05 09:47:21 +02:00
Asias He	5236e7a379	storage_service: Implement feature check for seed node Checking features for a seed node is a bit more complicated than for a non-seed node: a non-seed node can always talk to at least one seed node, but a seed node may not. In this patch, we distinguish a new cluster from an existing cluster by checking if the system table is empty. We relax the feature check for a new cluster because the feature check is mostly useful when upgrading an existing cluster, to prevent an old node from joining a new cluster. When talking to a seed node fails during the check, we fall back to the check using features stored in the system table. This makes it possible to restart a seed node when no other seed node is up (either there is no other seed node at all, or the other seed nodes are not up yet). I tested the following scenarios. 1) Start a completely new seed node in a new cluster: the system table is empty, so the check is skipped. 2) Start a cluster, restart one seed node while at least one other seed node is up: the system table is not empty, so we check with a shadow round, and the shadow round succeeds. 3) Start a cluster, restart one seed node while no other seed node is up: the system table is not empty, so we check with a shadow round; the shadow round fails, and we fall back to the system table check. 4) Start a cluster, shut down all the nodes, start one seed node with a new IP address, with the seed list in the yaml updated with the new IP address: the system table is not empty, so we check with a shadow round; the shadow round fails, and we fall back to the system table check.	2016-07-05 10:09:54 +08:00
Asias He	bb80362c3f	gossip: Insert with result.end() in get_supported_features It is faster than result.begin(), suggested by Avi.	2016-07-05 10:09:54 +08:00
Asias He	72cb4a228b	gossip: Add to_feature_set helper To convert a comma-separated feature string to a feature set.	2016-07-05 10:09:54 +08:00
Asias He	1d6c57fb40	gossip: Reduce timeout in shadow round In `3a36ec33db` (gossip: Wait longer for seed node during boot up), we increased the timeout by a factor of 60, i.e., ring_delay * 60 = 5 seconds * 60 = 5 minutes. In `57ee9676c2` (storage_service: Fix default ring_delay time), we fixed the default ring_delay to 30 seconds. Now the timeout is 30 * 60 seconds = 30 minutes, which is too long. Make it 5 minutes.	2016-07-05 10:09:54 +08:00
Asias He	88f0bb3a7b	gossip: Add check_knows_remote_features To check if this node knows features in std::unordered_map<inet_address, sstring> peer_features_string	2016-07-05 10:09:54 +08:00
Asias He	2b53c50c15	gossip: Add get_supported_features To get features supported by all the nodes listed in the address/feature map.	2016-07-05 10:09:53 +08:00
Asias He	31df4e5316	system_keyspace: Introduce load_peer_features To get the peer features stored in the system.peers table.	2016-07-05 10:09:53 +08:00
Paweł Dziepak	4acf77d755	sstables: drop unused data_stream_at() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:43 +01:00
Paweł Dziepak	2cdf498bbd	sstables: close input stream in sstable::data_read() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:42 +01:00
Paweł Dziepak	8931b939a1	sstables: use finally() to close input streams Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-04 18:17:42 +01:00
Paweł Dziepak	e6ececce7f	Merge seastar upstream Submodule seastar a47f893..c82c36f: > reactor: fix build error > util: lazy_eval: fix compilation errors related to operator<<()s definitions	2016-07-04 18:14:05 +01:00
Duarte Nunes	41843b32c5	thrift: Correctly mark a CF as dense And store whether the comparator is a composite type in the case of dynamic CFs. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1467307688-11059-1-git-send-email-duarte@scylladb.com>	2016-07-04 17:40:53 +02:00
Nadav Har'El	c4e871ea2d	Work around unexpected data_value constructor If someone tried to naively use utf8_type->decompose("18wX"), this would mysteriously fail, returning an empty key. decompose takes a data_value, so the compiler looked for an implicit conversion from the string constant (const char*) to data_value. We did not have such a conversion, only a conversion from sstring. But the compiler chose (backed by the C++ standard, no doubt) to implicitly convert the const char* to a bool (!), and then use data_value(bool). It did not convert the const char* to an sstring, nor did it warn about the possible ambiguity. So this patch adds a data_value(const char*) constructor, so people will not fall into the same trap that I fell into... Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1467643462-6349-1-git-send-email-nyh@scylladb.com>	2016-07-04 17:50:53 +03:00
Avi Kivity	e22517bafc	Merge "Optimize reads from leveled sstables" In a leveled column family, there can be many thousands of sstables, since each sstable is limited to a relatively small size (160M by default). With the current approach of reading from all sstables in parallel, cpu quickly becomes a bottleneck as we need to check the bloom filter for each of these sstables. This patch addresses the problem by introducing a compaction-strategy-specific data structure for holding sstables. This data structure has a method to obtain the sstables used for a read. For leveled compaction strategy, this data structure is an interval map, which can be efficiently used to select the right sstables.	2016-07-04 16:00:35 +03:00
Asias He	610a0f7ef0	storage_service: Skip feature check for seed node for now When a seed node boots up with more than one node in the seed list, it will fail to talk to the other seed node, which is not up yet. This fails the feature check, so the seed node will not boot. Skip the feature check for seed node for now, until we have a proper solution. Fixes recent dtest failure due to failure to boot the seed node. Message-Id: <e1d4110f96817e45f81dc0bc948dd14600fc5333.1467251799.git.asias@scylladb.com>	2016-07-04 15:09:57 +03:00
Avi Kivity	28fab55e6e	Merge "Convert sstable writes to streamed mutations" from Paweł "This series converts sstable writers (including compaction) to streamed mutations and makes them use a consumer-style interface. Code related to sstable writes and compaction is converted to consumers that can be used with consume_flattened_in_thread() (which is a variant of consume_flattened() intended to be run inside a thread). compact_for_query is improved so that it can be reused by sstable compaction."	2016-07-04 15:07:47 +03:00
Avi Kivity	171054e87b	Merge seastar upstream * seastar d4d9e16...a47f893 (1): > Merge "overprovisioning support"	2016-07-04 13:46:03 +03:00
Paweł Dziepak	5d0de2179a	Merge "Adding scylla version API" from Amnon Amnon says: The API that returns the version currently returns the compatibility version (i.e. the compatible Origin version, currently 2.1.8). The version check functionality needs to know which version of Scylla is currently running. For that, a new API was added that returns the current version. The result is equivalent to running scylla --version. After this series, a call to: $ curl -X GET "http://localhost:10000/storage_service/scylla_release_version" returns "666.development-20160703.72f0d4d", which is the JSON representation of: $ ./build/release/scylla --version 666.development-20160703.72f0d4d	2016-07-04 10:52:44 +01:00
Asias He	f6a2672be0	storage_service: Modify log to match config option of scylla We currently log as follows: May 9 00:09:13 node3.nl scylla[2546]: [shard 0] storage_service - This node was decommissioned and will not rejoin the ring unless cassandra.override_decommission=true has been set,or all existing data is removed and the node is bootstrapped again However, users should use override_decommission:true instead of cassandra.override_decommission:true in scylla.yaml, where the cassandra prefix is stripped. Fixes #1240 Message-Id: <b0c9424c6922431ad049ab49391771e07ca6fbde.1467079190.git.asias@scylladb.com>	2016-07-04 10:47:49 +02:00
Avi Kivity	76cc0c0cf9	auth: fix performance problem when looking up permissions data_resource lookup uses data_resource::name(), which uses sprint(), which uses (indirectly) locale, which takes a global lock. This is a bottleneck on large machines. Fix by not using name() during lookup. Fixes #1419 Message-Id: <1467616296-17645-1-git-send-email-avi@scylladb.com>	2016-07-04 10:26:18 +02:00
Yoav Kleinberger	49cba035ea	tools/scyllatop: leave terminal in a functioning state when user quits with CTRL-C closes issue #1417. Signed-off-by: Yoav Kleinberger <yoav@scylladb.com> Message-Id: <1467556769-11851-1-git-send-email-yoav@scylladb.com>	2016-07-03 17:43:46 +03:00
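Commit 02530faeb2 explains that the code finding the sstables not being compacted relied on `sstable_list` being sorted by generation, which stopped being true once it became a set. The sketch below illustrates why an explicit sort is needed when taking a sorted-range difference. It assumes a hypothetical `sstable` struct with only a `generation` field and models the unordered container as `std::unordered_set`; neither is Scylla's actual type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an sstable: only the generation matters here.
struct sstable {
    int64_t generation;
};
using sstable_ptr = std::shared_ptr<sstable>;

// Returns the sstables in `all` that are not part of `compacting`.
// std::set_difference requires both ranges to be sorted by the same key,
// and an unordered container gives no such guarantee, so both inputs are
// sorted by generation explicitly before taking the difference.
std::vector<sstable_ptr> uncompacting_sstables(
        const std::unordered_set<sstable_ptr>& all,
        std::vector<sstable_ptr> compacting) {
    auto by_generation = [] (const sstable_ptr& a, const sstable_ptr& b) {
        return a->generation < b->generation;
    };
    std::vector<sstable_ptr> sorted_all(all.begin(), all.end());
    std::sort(sorted_all.begin(), sorted_all.end(), by_generation);
    std::sort(compacting.begin(), compacting.end(), by_generation);

    std::vector<sstable_ptr> result;
    std::set_difference(sorted_all.begin(), sorted_all.end(),
                        compacting.begin(), compacting.end(),
                        std::back_inserter(result), by_generation);
    return result;
}
```

Feeding unsorted input to `std::set_difference` is undefined behaviour in the standard's terms and, in practice, tends to return too many elements, which matches the symptom described in the commit (a longer list that includes sstables being compacted).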
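Commit e9076f39be ports Cassandra's `CompactionController.getFullyExpiredSSTables`. As we understand that logic, an sstable can be dropped outright if everything in it has passed `gc_before` and its newest write is older than the oldest write in any sstable that still holds live data, so none of its tombstones can shadow anything. A simplified sketch under those assumptions, using a hypothetical `sstable_meta` struct and ignoring memtables and other details the real implementations consider:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-sstable metadata needed for the check.
struct sstable_meta {
    int generation;
    int32_t max_local_deletion_time; // latest time at which any cell expires or was deleted
    int64_t min_timestamp;           // smallest write timestamp in the sstable
    int64_t max_timestamp;           // largest write timestamp in the sstable
};

// Returns the generations of sstables in `compacting` that are fully expired.
std::vector<int> fully_expired_sstables(const std::vector<sstable_meta>& compacting,
                                        const std::vector<sstable_meta>& overlapping,
                                        int32_t gc_before) {
    int64_t min_timestamp = std::numeric_limits<int64_t>::max();
    // Overlapping sstables that still hold live data bound what may be dropped.
    for (const auto& s : overlapping) {
        if (s.max_local_deletion_time >= gc_before) {
            min_timestamp = std::min(min_timestamp, s.min_timestamp);
        }
    }
    // Candidates: everything inside has expired or been deleted before gc_before.
    std::vector<const sstable_meta*> candidates;
    for (const auto& s : compacting) {
        if (s.max_local_deletion_time < gc_before) {
            candidates.push_back(&s);
        } else {
            min_timestamp = std::min(min_timestamp, s.min_timestamp);
        }
    }
    // Keep only candidates whose newest write predates all remaining live data.
    std::vector<int> expired;
    for (const auto* c : candidates) {
        if (c->max_timestamp < min_timestamp) {
            expired.push_back(c->generation);
        }
    }
    return expired;
}
```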
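Commits cc6c383249 and 1ecd9bdefc fix how the max local deletion time is tracked: TTL'd cells and tombstones must update it, a live cell without a TTL must push it to the maximum value, and the field must be signed. A minimal sketch of those rules, assuming a hypothetical simplified `cell` model (not Scylla's cell representation):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical, simplified cell model.
struct cell {
    bool is_tombstone;
    std::optional<int32_t> expiry; // set for live cells that carry a TTL
    int32_t deletion_time;         // meaningful for tombstones only
};

// Tracks the maximum local deletion time over every cell written to an sstable.
class max_local_deletion_time_tracker {
    // Signed type, as in commit 1ecd9bdefc. Starting at the lowest value is
    // this sketch's choice, so that the first cell always raises it.
    int32_t _max = std::numeric_limits<int32_t>::min();
public:
    void update(const cell& c) {
        int32_t t;
        if (c.is_tombstone) {
            t = c.deletion_time;                      // tombstones count too
        } else if (c.expiry) {
            t = *c.expiry;                            // TTL'd cell: its expiry time
        } else {
            t = std::numeric_limits<int32_t>::max();  // live cell without TTL never expires
        }
        _max = std::max(_max, t);
    }
    int32_t get() const { return _max; }
};
```

Once a single non-expiring live cell is seen, the tracked value stays at the maximum, so an sstable containing it can never qualify as fully expired.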
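Commit b0932ceb04 describes the freeze: many memtables, each below its own seal size, together exceed the dirty memory limit, so nobody flushes. The fix reacts to an LSA pressure notification by starting a flush. The sketch below models only the decision, with hypothetical names; picking the largest memtable is an assumption standing in for the LSA's "best candidate" choice mentioned in commit d41fcd45d1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a memtable is identified by its column family and size.
struct memtable_info {
    std::string column_family;
    size_t occupancy; // bytes of dirty memory held by this memtable
};

// Returns the column family whose memtable should be flushed, or an empty
// string if total dirty memory is within the limit. The check is on the sum,
// because every individual memtable may be below its own seal threshold.
// Choosing the largest memtable is an assumption of this sketch.
std::string pick_flush_candidate(const std::vector<memtable_info>& memtables,
                                 size_t dirty_limit) {
    size_t total = 0;
    for (const auto& m : memtables) {
        total += m.occupancy;
    }
    if (memtables.empty() || total <= dirty_limit) {
        return {};
    }
    auto largest = std::max_element(memtables.begin(), memtables.end(),
        [] (const memtable_info& a, const memtable_info& b) {
            return a.occupancy < b.occupancy;
        });
    return largest->column_family;
}
```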
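Commit 43926026c3 outlines the pending-compaction estimate: for size-tiered, divide each bucket's size by the max threshold; for leveled, count the sstables exceeding each level's limit. The sketch below makes several assumptions the commit does not state: the division rounds up, buckets below `min_threshold` are ignored, level i (i >= 1) holds at most 10^i sstables, and level 0 is skipped. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size-tiered: each bucket of similarly sized sstables needs about
// ceil(bucket_size / max_threshold) compactions.
int64_t estimate_size_tiered(const std::vector<size_t>& bucket_sizes,
                             size_t min_threshold, size_t max_threshold) {
    int64_t n = 0;
    for (size_t size : bucket_sizes) {
        if (size >= min_threshold) {
            n += static_cast<int64_t>((size + max_threshold - 1) / max_threshold); // integer ceil
        }
    }
    return n;
}

// Leveled: count the sstables above each level's capacity, assuming level i
// (i >= 1) holds at most 10^i sstables; level 0 is ignored for brevity.
int64_t estimate_leveled(const std::vector<size_t>& sstables_per_level) {
    int64_t n = 0;
    size_t capacity = 1;
    for (size_t level = 1; level < sstables_per_level.size(); ++level) {
        capacity *= 10;
        if (sstables_per_level[level] > capacity) {
            n += static_cast<int64_t>(sstables_per_level[level] - capacity);
        }
    }
    return n;
}
```

For example, buckets of 3, 4 and 40 sstables with thresholds 4 and 32 give 0 + 1 + 2 = 3 pending compactions under these assumptions.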
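Commit 5236e7a379 describes how a seed node decides whether and how to run the feature check. The decision flow can be sketched as a small function. The callbacks stand in for the shadow round and the system-table comparison; their names and signatures are hypothetical, not Scylla's API.

```cpp
#include <cassert>
#include <functional>
#include <optional>

enum class feature_check_result { skipped, passed_shadow_round, passed_system_table, failed };

// shadow_round: talks to the other seeds; returns std::nullopt if no seed
//   could be reached, otherwise whether the features are compatible.
// system_table_check: compares against peer features saved in system tables.
feature_check_result check_seed_node_features(
        bool system_table_empty,
        const std::function<std::optional<bool>()>& shadow_round,
        const std::function<bool()>& system_table_check) {
    if (system_table_empty) {
        // A brand-new cluster: there is nothing to upgrade from, so skip.
        return feature_check_result::skipped;
    }
    if (std::optional<bool> ok = shadow_round()) {
        return *ok ? feature_check_result::passed_shadow_round
                   : feature_check_result::failed;
    }
    // No other seed is reachable: fall back to the stored peer features.
    return system_table_check() ? feature_check_result::passed_system_table
                                : feature_check_result::failed;
}
```

Scenario 1 from the commit maps to the first branch, scenario 2 to the shadow-round branch, and scenarios 3 and 4 to the fallback.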
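Commits 72cb4a228b, 2b53c50c15, 88f0bb3a7b and bb80362c3f add gossip helpers for parsing and comparing feature strings. A sketch of what such helpers could look like, reusing the commit names but with assumed signatures: `std::string` stands in for `sstring` and `gms::inet_address`. Also an assumption: `check_knows_remote_features` here requires the local node to know the features common to all peers (their intersection).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>

using feature_set = std::set<std::string>;
using peer_feature_map = std::unordered_map<std::string, std::string>;

// Splits a comma-separated feature string, skipping empty entries.
feature_set to_feature_set(const std::string& features) {
    feature_set result;
    std::istringstream in(features);
    std::string feature;
    while (std::getline(in, feature, ',')) {
        if (!feature.empty()) {
            result.insert(feature);
        }
    }
    return result;
}

// Features supported by every node in the map (their intersection).
feature_set get_supported_features(const peer_feature_map& peer_features) {
    feature_set common;
    bool first = true;
    for (const auto& entry : peer_features) {
        feature_set node_features = to_feature_set(entry.second);
        if (first) {
            common = std::move(node_features);
            first = false;
            continue;
        }
        feature_set next;
        // set_intersection emits elements in ascending order, so an end()
        // hint places each insertion right where it belongs.
        std::set_intersection(common.begin(), common.end(),
                              node_features.begin(), node_features.end(),
                              std::inserter(next, next.end()));
        common = std::move(next);
    }
    return common;
}

// Whether the local node knows every feature the peers have in common.
bool check_knows_remote_features(const feature_set& local,
                                 const peer_feature_map& peer_features) {
    feature_set common = get_supported_features(peer_features);
    return std::includes(local.begin(), local.end(), common.begin(), common.end());
}
```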
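Commit c4e871ea2d describes a C++ overload-resolution trap: a string literal passed where a `data_value` is expected selected `data_value(bool)`. The toy types below (hypothetical, not Scylla's `data_value`) reproduce the behaviour. A literal-to-`std::string`-to-value chain needs two user-defined conversions, which an implicit conversion may not use, while literal-to-`const char*`-to-`bool` is a standard conversion. Adding a `const char*` constructor gives an exact match that wins.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for data_value before the fix: it records which constructor ran.
struct value_before_fix {
    std::string kind;
    value_before_fix(bool) : kind("bool") {}
    value_before_fix(std::string) : kind("string") {}
};

// The same toy type with an added const char* constructor.
struct value_after_fix {
    std::string kind;
    value_after_fix(bool) : kind("bool") {}
    value_after_fix(std::string) : kind("string") {}
    value_after_fix(const char* s) : value_after_fix(std::string(s)) {}
};

// Like decompose(), these take their argument by value, so a string literal
// must be implicitly converted to the parameter type.
std::string kind_before_fix(value_before_fix v) { return v.kind; }
std::string kind_after_fix(value_after_fix v) { return v.kind; }
```

Calling `kind_before_fix("18wX")` silently yields `"bool"`, while `kind_after_fix("18wX")` yields `"string"`.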
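Merge e22517bafc selects, for leveled compaction, only the sstables relevant to a read, using an interval map instead of checking every bloom filter. The sketch below swaps in a simpler technique to show the same idea: a per-level binary search. It assumes sstables in levels 1 and up are non-overlapping and sorted by their first token, that level 0 may overlap, and that tokens are plain `int64_t` values. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sstable summary: the token range it covers and its generation.
struct sstable_range {
    int64_t first; // smallest token in the sstable
    int64_t last;  // largest token in the sstable
    int generation;
};

// Returns the generations of the sstables that may contain `token`.
std::vector<int> sstables_for_read(const std::vector<std::vector<sstable_range>>& levels,
                                   int64_t token) {
    std::vector<int> result;
    for (size_t level = 0; level < levels.size(); ++level) {
        const auto& ssts = levels[level];
        if (level == 0) {
            // Level 0 sstables may overlap, so each one is checked.
            for (const auto& s : ssts) {
                if (s.first <= token && token <= s.last) {
                    result.push_back(s.generation);
                }
            }
            continue;
        }
        // At most one sstable per higher level can hold the token: the last
        // one whose first token is <= token.
        auto it = std::upper_bound(ssts.begin(), ssts.end(), token,
            [] (int64_t t, const sstable_range& s) { return t < s.first; });
        if (it != ssts.begin()) {
            --it;
            if (token <= it->last) {
                result.push_back(it->generation);
            }
        }
    }
    return result;
}
```

This reduces the per-read work to the level-0 sstables plus one binary search per higher level, which is the kind of saving the merge message describes for column families with thousands of sstables.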


9807 Commits