release: prepare for 1.2.0

dist/common: Update scylla_io_setup to use settings done in cpuset.conf
scylla_io_setup is searching for --smp and --cpuset setting in SCYLLA_ARGS. We have moved the settings of this args into /etc/scylla.d/cpuset.conf and they are set by scylla_cpuset_setup into CPUSET.

Fixes: #1327
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <2735e3abdd63d245ec96cfa1e65f766b1c12132e.1465508701.git.shlomi@scylladb.com>
(cherry picked from commit ac6f2b5c13)
2016-06-13 15:18:13 +03:00 · 2016-06-10 09:38:17 +03:00 · 2016-06-09 10:11:29 +03:00 · 2016-06-08 11:05:47 +02:00 · 2016-06-07 10:43:30 +03:00 · 2016-06-07 09:44:26 +03:00
12 changed files with 46 additions and 33 deletions
--- a/2
+++ b/2
@@ -1,6 +1,6 @@
 #!/bin/sh

-VERSION=666.development
+VERSION=1.2.0

 if test -f version
 then
--- a/compaction_strategy.hh
+++ b/compaction_strategy.hh
@@ -51,6 +51,9 @@ public:
    // Return a list of sstables to be compacted after applying the strategy.
    compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<lw_shared_ptr<sstable>> candidates);

+    // Return if parallel compaction is allowed by strategy.
+    bool parallel_compaction() const;
+
    static sstring name(compaction_strategy_type type) {
        switch (type) {
        case compaction_strategy_type::null:
--- a/dist/ami/files/scylla-ami
+++ b/dist/ami/files/scylla-ami
--- a/dist/common/scripts/scylla_io_setup
+++ b/dist/common/scripts/scylla_io_setup
@@ -44,8 +44,8 @@ output_to_user()
 }

 if [ `is_developer_mode` -eq 0 ]; then
-    SMP=`echo $SCYLLA_ARGS|grep smp|sed -e "s/^.*smp\(\s\+\|=\)\([0-9]*\).*$/\2/"`
-    CPUSET=`echo $SCYLLA_ARGS|grep cpuset|sed -e "s/^.*\(--cpuset\(\s\+\|=\)[0-9\-]*\).*$/\1/"`
+    SMP=`echo $CPUSET|grep smp|sed -e "s/^.*smp\(\s\+\|=\)\([0-9]*\).*$/\2/"`
+    CPUSET=`echo $CPUSET|grep cpuset|sed -e "s/^.*\(--cpuset\(\s\+\|=\)[0-9\-]*\).*$/\1/"`
    if [ $AMI_OPT -eq 1 ]; then
        NR_CPU=`cat /proc/cpuinfo |grep processor|wc -l`
        NR_DISKS=`lsblk --list --nodeps --noheadings | grep -v xvda | grep xvd | wc -l`
--- a/main.cc
+++ b/main.cc
@@ -595,10 +595,10 @@ int main(int ac, char** av) {
            supervisor_notify("serving");
            // Register at_exit last, so that storage_service::drain_on_shutdown will be called first
            engine().at_exit([] {
-                return service::get_local_storage_service().drain_on_shutdown();
+                return repair_shutdown(service::get_local_storage_service().db());
            });
            engine().at_exit([] {
-                return repair_shutdown(service::get_local_storage_service().db());
+                return service::get_local_storage_service().drain_on_shutdown();
            });
            engine().at_exit([&db] {
                return db.invoke_on_all([](auto& db) {
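The comment in the main.cc hunk above implies that exit callbacks run in reverse order of registration, so swapping the two `at_exit` registrations makes `drain_on_shutdown` run before `repair_shutdown`. A minimal sketch of that ordering in plain C++, assuming last-registered-runs-first semantics; `exit_hooks` and `shutdown_order_after_patch` are hypothetical names, not Seastar or Scylla APIs:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an exit-hook registry. It assumes callbacks run
// in reverse registration order, as the comment in the hunk suggests.
class exit_hooks {
    std::vector<std::function<void()>> _funcs;
public:
    void at_exit(std::function<void()> f) {
        _funcs.push_back(std::move(f));
    }
    // Run the callbacks, last registered first.
    void run() {
        for (auto it = _funcs.rbegin(); it != _funcs.rend(); ++it) {
            (*it)();
        }
    }
};

// Registers the two callbacks in the order shown on the "+" lines of the
// hunk and returns the order in which they ran.
std::vector<std::string> shutdown_order_after_patch() {
    std::vector<std::string> log;
    exit_hooks hooks;
    hooks.at_exit([&log] { log.push_back("repair_shutdown"); });
    hooks.at_exit([&log] { log.push_back("drain_on_shutdown"); });
    hooks.run();
    return log;
}
```

Under that assumption the returned order is `drain_on_shutdown` followed by `repair_shutdown`. The real callbacks return futures; the sketch uses plain `void` callbacks to keep the ordering visible.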
--- a/service/storage_proxy.cc
+++ b/service/storage_proxy.cc
@@ -2058,7 +2058,6 @@ public:
                        auto write_timeout = exec->_proxy->_db.local().get_config().write_request_timeout_in_ms() * 1000;
                        auto delta = __int128_t(digest_resolver->last_modified()) - __int128_t(exec->_cmd->read_timestamp);
                        if (std::abs(delta) <= write_timeout) {
-                            print("HERE %d\n", int64_t(delta));
                            exec->_proxy->_stats.global_read_repairs_canceled_due_to_concurrent_write++;
                            // if CL is local and non matching data is modified less then write_timeout ms ago do only local repair
                            auto i = boost::range::remove_if(exec->_targets, std::not1(std::cref(db::is_local)));
--- a/sstables/compaction_manager.cc
+++ b/sstables/compaction_manager.cc
@@ -83,13 +83,17 @@ int compaction_manager::trim_to_compact(column_family* cf, sstables::compaction_
    return weight;
 }

-bool compaction_manager::try_to_register_weight(column_family* cf, int weight) {
+bool compaction_manager::try_to_register_weight(column_family* cf, int weight, bool parallel_compaction) {
    auto it = _weight_tracker.find(cf);
    if (it == _weight_tracker.end()) {
        _weight_tracker.insert({cf, {weight}});
        return true;
    }
    std::unordered_set<int>& s = it->second;
+    // Only one weight is allowed if parallel compaction is disabled.
+    if (!parallel_compaction && !s.empty()) {
+        return false;
+    }
    // TODO: Maybe allow only *smaller* compactions to start? That can be done
    // by returning true only if weight is not in the set and is lower than any
    // entry in the set.
@@ -164,8 +168,7 @@ lw_shared_ptr<compaction_manager::task> compaction_manager::task_start(column_fa
                sstables::compaction_strategy cs = cf.get_compaction_strategy();
                descriptor = cs.get_sstables_for_compaction(cf, std::move(candidates));
                weight = trim_to_compact(&cf, descriptor);
-                if (!try_to_register_weight(&cf, weight)) {
-                    // Refusing compaction job because of an ongoing compaction with same weight.
+                if (!try_to_register_weight(&cf, weight, cs.parallel_compaction())) {
                    task->stopping = true;
                    _stats.pending_tasks--;
                    cmlog.debug("Refused compaction job ({} sstable(s)) of weight {} for {}.{}",
--- a/sstables/compaction_manager.hh
+++ b/sstables/compaction_manager.hh
@@ -81,9 +81,9 @@ private:
    // It will not accept new requests in case the manager was stopped.
    bool can_submit();

-    // If weight is not taken for the column family, weight is registered and
-    // true is returned. Return false otherwise.
-    bool try_to_register_weight(column_family* cf, int weight);
+    // Return true if weight is not registered. If parallel_compaction is not
+    // true, only one weight is allowed to be registered.
+    bool try_to_register_weight(column_family* cf, int weight, bool parallel_compaction);
    // Deregister weight for a column family.
    void deregister_weight(column_family* cf, int weight);

--- a/sstables/compaction_strategy.cc
+++ b/sstables/compaction_strategy.cc
@@ -56,6 +56,9 @@ public:
    virtual ~compaction_strategy_impl() {}
    virtual compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<sstables::shared_sstable> candidates) = 0;
    virtual compaction_strategy_type type() const = 0;
+    virtual bool parallel_compaction() const {
+        return true;
+    }
 };

 //
@@ -402,6 +405,10 @@ public:

    virtual compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<sstables::shared_sstable> candidates) override;

+    virtual bool parallel_compaction() const override {
+        return false;
+    }
+
    virtual compaction_strategy_type type() const {
        return compaction_strategy_type::leveled;
    }
@@ -439,6 +446,9 @@ compaction_strategy_type compaction_strategy::type() const {
 compaction_descriptor compaction_strategy::get_sstables_for_compaction(column_family& cfs, std::vector<sstables::shared_sstable> candidates) {
    return _compaction_strategy_impl->get_sstables_for_compaction(cfs, std::move(candidates));
 }
+bool compaction_strategy::parallel_compaction() const {
+    return _compaction_strategy_impl->parallel_compaction();
+}

 compaction_strategy make_compaction_strategy(compaction_strategy_type strategy, const std::map<sstring, sstring>& options) {
    ::shared_ptr<compaction_strategy_impl> impl;
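The compaction hunks above add a per-strategy `parallel_compaction()` flag and pass it to `try_to_register_weight`. The sketch below models that interaction in isolation. It is not the Scylla code: `weight_tracker` is a hypothetical class keyed by table name rather than `column_family*`, and the duplicate-weight check is an assumption, since the hunk cuts off before the end of the function.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical model of the weight bookkeeping in compaction_manager.
class weight_tracker {
    std::unordered_map<std::string, std::unordered_set<int>> _weights;
public:
    // Returns true if a compaction job of this weight may start for the table.
    bool try_to_register(const std::string& table, int weight, bool parallel_compaction) {
        auto& s = _weights[table];
        // Only one weight is allowed if parallel compaction is disabled.
        if (!parallel_compaction && !s.empty()) {
            return false;
        }
        // Assumed behavior: a weight that is already registered is refused.
        return s.insert(weight).second;
    }

    // Frees the weight once the compaction job has finished.
    void deregister(const std::string& table, int weight) {
        auto it = _weights.find(table);
        if (it != _weights.end()) {
            it->second.erase(weight);
        }
    }
};
```

With `parallel_compaction` set to `true` (the default in `compaction_strategy_impl`), jobs of different weights can run side by side for one table. With `false` (the leveled override), a second job is refused until the first weight is deregistered, which matches the commit's stated goal of preventing two ongoing compactions from creating overlapping sstables for the same level.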
--- a/sstables/leveled_manifest.hh
+++ b/sstables/leveled_manifest.hh
@@ -175,10 +175,8 @@ public:

            if (previous != nullptr && current_first.tri_compare(s, previous->get_last_decorated_key(s)) <= 0) {

-                logger.warn("At level {}, {} [{}, {}] overlaps {} [{}, {}].  This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 " \
-                    "or due to the fact that you have dropped sstables from another node into the data directory. " \
-                    "Sending back to L0. If you didn't drop in sstables, and have not yet run scrub, you should do so since you may also " \
-                    "have rows out-of-order within an sstable",
+                logger.warn("At level {}, {} [{}, {}] overlaps {} [{}, {}]. This could be caused by the fact that you have dropped " \
+                    "sstables from another node into the data directory. Sending back to L0.",
                    level, previous->get_filename(), previous->get_first_partition_key(s), previous->get_last_partition_key(s),
                    current->get_filename(), current->get_first_partition_key(s), current->get_last_partition_key(s));

--- a/streaming/stream_transfer_task.cc
+++ b/streaming/stream_transfer_task.cc
@@ -85,7 +85,6 @@ struct send_info {
 };

 future<stop_iteration> do_send_mutations(auto si, auto fm) {
-    return get_local_stream_manager().mutation_send_limiter().wait().then([si, fm = std::move(fm)] () mutable {
        sslog.debug("[Stream #{}] SEND STREAM_MUTATION to {}, cf_id={}", si->plan_id, si->id, si->cf_id);
        auto fm_size = fm.representation().size();
        net::get_local_messaging_service().send_stream_mutation(si->id, si->plan_id, std::move(fm), si->dst_cpu_id).then([si, fm_size] {
@@ -100,26 +99,27 @@ future<stop_iteration> do_send_mutations(auto si, auto fm) {
                sslog.error("[Stream #{}] stream_transfer_task: Fail to send STREAM_MUTATION to {}: {}", si->plan_id, si->id, ep);
            }
            si->mutations_done.broken();
-        }).finally([] {
-            get_local_stream_manager().mutation_send_limiter().signal();
        });
-        return stop_iteration::no;
-    });
+        return make_ready_future<stop_iteration>(stop_iteration::no);
 }

 future<> send_mutations(auto si) {
    auto& cf = si->db.find_column_family(si->cf_id);
    auto& priority = service::get_local_streaming_read_priority();
    return do_with(cf.make_reader(cf.schema(), si->pr, query::no_clustering_key_filtering, priority), [si] (auto& reader) {
-        return repeat([si, &reader] () {
-            return reader().then([si] (auto mopt) {
-                if (mopt && si->db.column_family_exists(si->cf_id)) {
-                    si->mutations_nr++;
-                    auto fm = frozen_mutation(*mopt);
-                    return do_send_mutations(si, std::move(fm));
-                } else {
-                    return make_ready_future<stop_iteration>(stop_iteration::yes);
-                }
+        return repeat([si, &reader] {
+            return get_local_stream_manager().mutation_send_limiter().wait().then([si, &reader] {
+                return reader().then([si] (auto mopt) {
+                    if (mopt && si->db.column_family_exists(si->cf_id)) {
+                        si->mutations_nr++;
+                        auto fm = frozen_mutation(*mopt);
+                        return do_send_mutations(si, std::move(fm));
+                    } else {
+                        return make_ready_future<stop_iteration>(stop_iteration::yes);
+                    }
+                });
+            }).finally([] {
+                get_local_stream_manager().mutation_send_limiter().signal();
            });
        });
    }).then([si] {
@@ -132,7 +132,7 @@ void stream_transfer_task::start() {
    auto cf_id = this->cf_id;
    auto id = net::messaging_service::msg_addr{session->peer, session->dst_cpu_id};
    sslog.debug("[Stream #{}] stream_transfer_task: cf_id={}", plan_id, cf_id);
-    parallel_for_each(_ranges.begin(), _ranges.end(), [this, plan_id, cf_id, id] (auto range) {
+    do_for_each(_ranges.begin(), _ranges.end(), [this, plan_id, cf_id, id] (auto range) {
        unsigned shard_begin = range.start() ? dht::shard_of(range.start()->value()) : 0;
        unsigned shard_end = range.end() ? dht::shard_of(range.end()->value()) + 1 : smp::count;
        auto cf_id = this->cf_id;
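Switching from `parallel_for_each` to `do_for_each` means ranges are sent one at a time. The commit message for this change estimates the effect as `nr_vnode * nr_shard * (send_info + cf.make_reader memory usage)` before versus `nr_shard * (send_info + cf.make_reader memory usage)` after. A small sketch of that arithmetic follows; the function names and the `per_reader_bytes` parameter are hypothetical, and the byte figure in the usage note is made up for illustration:

```cpp
#include <cstdint>

// per_reader_bytes stands for (send_info + cf.make_reader memory usage).
// Its actual value is not given in the source; callers supply an estimate.

// Estimate with every range streamed in parallel (before the patch).
std::uint64_t streaming_memory_before(std::uint64_t nr_vnode,
                                      std::uint64_t nr_shard,
                                      std::uint64_t per_reader_bytes) {
    return nr_vnode * nr_shard * per_reader_bytes;
}

// Estimate with ranges streamed one by one (after the patch).
std::uint64_t streaming_memory_after(std::uint64_t nr_shard,
                                     std::uint64_t per_reader_bytes) {
    return nr_shard * per_reader_bytes;
}
```

For example, with 256 vnodes (the default cited in the commit message), 4 shards, and an illustrative 1000 bytes per reader, the estimate falls from 1,024,000 bytes to 4,000 bytes, a factor of `nr_vnode`.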
--- a/utils/histogram.hh
+++ b/utils/histogram.hh
@@ -196,7 +196,7 @@ inline ihistogram operator +(ihistogram a, const ihistogram& b) {
 struct rate_moving_average {
    uint64_t count = 0;
    double rates[3] = {0};
-    double mean_rate;
+    double mean_rate = 0;
    rate_moving_average& operator +=(const rate_moving_average& o) {
        count += o.count;
        mean_rate += o.mean_rate;
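The histogram.hh change gives `mean_rate` a default member initializer, so a default-constructed value can be safely read and accumulated. A minimal standalone sketch of the fixed shape follows; `rate_moving_average_sketch` is a hypothetical copy, and the loop over `rates` is an assumption because the hunk does not show the rest of `operator +=`:

```cpp
#include <cstdint>

// Hypothetical standalone copy of the struct after the fix.
struct rate_moving_average_sketch {
    std::uint64_t count = 0;
    double rates[3] = {0};
    double mean_rate = 0;  // initialized, so summing empty values is well defined

    rate_moving_average_sketch& operator+=(const rate_moving_average_sketch& o) {
        count += o.count;
        mean_rate += o.mean_rate;
        // Assumed: the per-window rates are accumulated element by element.
        for (int i = 0; i < 3; i++) {
            rates[i] += o.rates[i];
        }
        return *this;
    }
};
```

Without the initializer, a default-initialized object would leave `mean_rate` indeterminate, and `mean_rate += o.mean_rate` would read it. This is the case the commit message describes, where no timed event has been recorded yet.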
Author	SHA1	Message	Date
Pekka Enberg	c384b23112	release: prepare for 1.2.0	2016-06-13 15:18:13 +03:00
Shlomi Livne	3688542323	dist/common: Update scylla_io_setup to use settings done in cpuset.conf scylla_io_setup is searching for --smp and --cpuset setting in SCYLLA_ARGS. We have moved the settings of this args into /etc/scylla.d/cpuset.conf and they are set by scylla_cpuset_setup into CPUSET. Fixes: #1327 Signed-off-by: Shlomi Livne <shlomi@scylladb.com> Message-Id: <2735e3abdd63d245ec96cfa1e65f766b1c12132e.1465508701.git.shlomi@scylladb.com> (cherry picked from commit `ac6f2b5c13`)	2016-06-10 09:38:17 +03:00
Pekka Enberg	7916182cfa	Revert "Be more conservative when deciding when to shut down due to disk errors" This reverts commit `a6179476c5`. The change breaks 'nodetool snapshot', for example.	2016-06-09 10:11:29 +03:00
Tomasz Grabiec	ec1fd3945f	Revert "config: adjust boost::program_options validator to work with db::string_map" This reverts commit `653e250d04`. Compiletion is broken with this patch: [155/264] CXX build/release/db/config.o FAILED: g++ -MMD -MT build/release/db/config.o -MF build/release/db/config.o.d -std=gnu++1y -g -Wall -Werror -fvisibility=hidden -pthread -I/home/shlomi/scylla/seastar -I/home/shlomi/scylla/seastar/build/release/gen -march=nehalem -Wno-overloaded-virtual -DHAVE_HWLOC -DHAVE_NUMA -O2 -I/usr/include/jsoncpp/ -Wno-maybe-uninitialized -DHAVE_LIBSYSTEMD=1 -I. -I build/release/gen -I seastar -I seastar/build/release/gen -c -o build/release/db/config.o db/config.cc db/config.cc:57:13: error: ‘void db::validate(boost::any&, const std::vector<std::__cxx11::basic_string<char> >&, db::string_map*, int)’ defined but not used [-Werror=unused-function] static void validate(boost::any& out, const std::vector<std::string>& in, ^ cc1plus: all warnings being treated as errors This branch doesn't have commits which introduce the problem which this patch fixes, so let's just revert it.	2016-06-08 11:05:47 +02:00
Gleb Natapov	653e250d04	config: adjust boost::program_options validator to work with db::string_map Fixes #1320 Message-Id: <20160607064511.GX9939@scylladb.com> (cherry picked from commit `9635e67a84`)	2016-06-07 10:43:30 +03:00
Amnon Heiman	6255076c20	rate_moving_average: mean_rate is not initilized The rate_moving_average is used by timed_rate_moving_average to return its internal values. If there are no timed event, the mean_rate is not propertly initilized. To solve that the mean_rate is now initilized to 0 in the structure definition. Refs #1306 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1465231006-7081-1-git-send-email-amnon@scylladb.com> (cherry picked from commit `2cf882c365`)	2016-06-07 09:44:26 +03:00
Pekka Enberg	420ebe28fd	release: prepare for 1.2.rc2	2016-06-06 16:17:26 +03:00
Avi Kivity	a6179476c5	Be more conservative when deciding when to shut down due to disk errors Currently we only shut down on EIO. Expand this to shut down on any system_error. This may cause us to shut down prematurely due to a transient error, but this is better than not shutting down due to a permanent error (such as ENOSPC or EPERM). We may whitelist certain errors in the future to improve the behavior. Fixes #1311. Message-Id: <1465136956-1352-1-git-send-email-avi@scylladb.com> (cherry picked from commit `961e80ab74`)	2016-06-06 16:15:25 +03:00
Raphael S. Carvalho	342726a23c	compaction: leveled: improve log message for overlapping table Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <2dcbe3c8131f1d88a3536daa0b6cdd25c6e41d76.1464883077.git.raphaelsc@scylladb.com> (cherry picked from commit `17b56eb459`)	2016-06-06 16:13:40 +03:00
Raphael S. Carvalho	e9946032f4	compaction: disable parallel compaction for leveled strategy It was discussed that leveled strategy may not benefit from parallel compaction feature because almost all compaction jobs will have similar size. It was also found that leveled strategy wasn't working correctly with it because two overlapping sstable (targetting the same level) could be created in parallel by two ongoing compaction. Fixes #1293. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <60fe165d611c0283ca203c6d3aa2662ab091e363.1464883077.git.raphaelsc@scylladb.com> (cherry picked from commit `588ce915d6`)	2016-06-06 16:13:36 +03:00
Pekka Enberg	5e0b113732	Update scylla-ami submodule * dist/ami/files/scylla-ami 72ae258...863cc45 (3): > Move --cpuset/--smp parameter settings from scylla_sysconfig_setup to scylla_ami_setup > convert scylla_install_ami to bash script > 'sh -x -e' is not valid since all scripts converted to bash script, so remove them	2016-06-06 13:38:53 +03:00
Asias He	c70faa4f23	streaming: Reduce memory usage when sending mutations Limit disk bandwidth to 5MB/s to emulate a slow disk: echo "8:0 5000000" > /cgroup/blkio/limit/blkio.throttle.write_bps_device echo "8:0 5000000" > /cgroup/blkio/limit/blkio.throttle.read_bps_device Start scylla node 1 with low memory: scylla -c 1 -m 128M --auto-bootstrap false Run c-s: taskset -c 7 cassandra-stress write duration=5m cl=ONE -schema 'replication(factor=1)' -pop seq=1..100000 -rate threads=20 limit=2000/s -node 127.0.0.1 Start scylla node 2 with low memory: scylla -c 1 -m 128M --auto-bootstrap true Without this patch, I saw std::bad_alloc during streaming ERROR 2016-06-01 14:31:00,196 [shard 0] storage_proxy - exception during mutation write to 127.0.0.1: std::bad_alloc (std::bad_alloc) ... ERROR 2016-06-01 14:31:10,172 [shard 0] database - failed to move memtable to cache: std::bad_alloc (std::bad_alloc) ... To fix: 1. Apply the streaming mutation limiter before we read the mutation into memory to avoid wasting memory holding the mutation which we can not send. 2. Reduce the parallelism of sending streaming mutations. Before we send each range in parallel, after we send each range one by one. before: nr_vnode * nr_shard * (send_info + cf.make_reader memory usage) after: nr_shard * (send_info + cf.make_reader memory usage) We can at least save memory usage by the factor of nr_vnode, 256 by default. In my setup, fix 1) alone is not enough, with both fix 1) and 2), I saw no std::bad_alloc. Also, I did not see streaming bandwidth dropped due to 2). In addition, I tested grow_cluster_test.py:GrowClusterTest.test_grow_3_to_4, as described: https://github.com/scylladb/scylla/issues/1270#issuecomment-222585375 With this patch, I saw no std::bad_alloc any more. Fixes: #1270 Message-Id: <7703cf7a9db40e53a87f0f7b5acbb03fff2daf43.1464785542.git.asias@scylladb.com> (cherry picked from commit `206955e47c`)	2016-06-02 11:18:59 +03:00
Gleb Natapov	15ad4c9033	storage_proxy: drop debug output Message-Id: <20160601132641.GK2381@scylladb.com> (cherry picked from commit `26b50eb8f4`)	2016-06-01 17:14:32 +03:00
Pekka Enberg	d094329b6e	Revert "Revert "main: change order between storage service and drain execution during exit"" This reverts commit `b3ed55be1d`. The issue is in the failing dtest, not this commit. Gleb writes: "The bug is in the test, not the patch. Test waits for repair session to end one way or the other when node is killed, but for nodetool to know if repair is completed it needs to poll for it. If node dies before nodetool managed to see repair completion it will stuck forever since jmx is alive, but does not provide answers any more. The patch changes timing, repair is completed much close to exit now, so problem appears, but it may happen even without the patch. The fix is for dtest to kill jmx as part of killing a node operation." Now that Lucas fixed the problem in scylla-ccm, revert the revert. (cherry picked from commit `0255318bf3`)	2016-06-01 08:51:51 +03:00
Pekka Enberg	dcab915f21	release: prepare for 1.2.rc1	2016-05-30 13:14:38 +03:00