Commit Graph

12837 Commits

Author SHA1 Message Date
Asias He
49360992d9 storage_service: Use the new range_streamer interface for removenode
The removenode operation now streams a small number of ranges at a time
and restreams the failed ranges.
2017-08-07 16:31:48 +08:00
Asias He
6b8dc85f12 storage_service: Use the new range_streamer interface for decommission
The decommission operation now streams a small number of ranges at a time
and restreams the failed ranges.
2017-08-07 16:31:48 +08:00
Asias He
24584b8509 storage_service: Use the new range_streamer interface for rebuild
The rebuild operation now streams a small number of ranges at a time
and restreams the failed ranges.
2017-08-07 16:31:47 +08:00
Asias He
f239b11a84 storage_service: Use the new range_streamer interface for bootstrap
The bootstrap operation now streams a small number of ranges at a time
and restreams the failed ranges.
2017-08-07 16:31:47 +08:00
Asias He
6810031ba7 dht: Extend range_streamer interface
After this patch and the following patches that use the new
range_streamer interface, all of the following cluster operations:

- bootstrap
- rebuild
- decommission
- removenode

will use the same code to do the streaming.

The range_streamer is now extended to support both fetching from and
pushing to a peer node. Another big change is that the range_streamer now
streams fewer ranges at a time, and therefore less data, per stream_plan,
and remembers which ranges failed to stream so it can retry them later.

The retry policy is very simple at the moment: it retries at most 5 times
and sleeps 1 minute, 1.5^2 minutes, 1.5^3 minutes, ...

Later, we can introduce an API that lets the user decide when to stop
retrying and what the retry interval is.

The benefits:

- All the cluster operations share the same streaming code.
- We can know the operation progress, e.g., the total number of ranges
  that need to be streamed and the number of ranges finished in
  bootstrap, decommission, etc.
- All the cluster operations can survive a peer node going down during
  the operation, which usually takes a long time to complete. For example,
  when adding a new node, currently if any of the existing nodes that
  stream data to the new node has an issue sending data, the whole
  bootstrap process fails. After this patch, we can fix the problematic
  node and restart it, and the joining node will retry streaming from
  that node.
- We can fail streaming early, time out early, and retry less, because
  all the operations that use streaming can survive the failure of a
  single stream_plan. It is no longer that important to make every single
  stream_plan succeed. Note that another user of streaming, repair, now
  uses small stream_plans as well and can rerun the repair for the
  failed ranges too.

This is one step closer to supporting resumable add/remove node
operations.
2017-08-07 16:31:47 +08:00
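The retry behaviour described in 6810031ba7 can be sketched as follows. This is a minimal illustration, not the range_streamer code: `stream_one` and `sleep_minutes` are hypothetical callbacks, and the delay schedule takes the message literally (1 minute first, then 1.5^n minutes for retry n).

```python
from typing import Callable, Iterable, Set

MAX_RETRIES = 5  # "retries at most 5 times" per the commit message


def retry_delay_minutes(attempt: int) -> float:
    """Delay before retry number `attempt` (1-based).

    Assumption: the sequence is read literally as 1, 1.5^2, 1.5^3, ...
    """
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return 1.0 if attempt == 1 else 1.5 ** attempt


def stream_ranges(
    ranges: Iterable[str],
    stream_one: Callable[[str], bool],       # hypothetical: True on success
    sleep_minutes: Callable[[float], None],  # hypothetical: injected sleep
) -> Set[str]:
    """Stream every range once, then retry only the ranges that failed.

    Returns the ranges that still failed after MAX_RETRIES retries.
    """
    failed = [r for r in ranges if not stream_one(r)]
    for attempt in range(1, MAX_RETRIES + 1):
        if not failed:
            break
        sleep_minutes(retry_delay_minutes(attempt))
        # Only the remembered failures are streamed again.
        failed = [r for r in failed if not stream_one(r)]
    return set(failed)
```

Injecting the sleep function keeps the sketch testable without waiting; a real caller would pass something that actually sleeps.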
Avi Kivity
86de6cc7fb Merge seastar upstream
* seastar f14d2a3...7a49ae5 (8):
  > sharded: improve support for cooperating sharded<> services
  > sharded: support for peer services
  > semaphore: add a version of with_semaphore that takes a duration timeout
  > scripts: perftune.py: fix the CPU mask generation for more than 64 CPUs
  > Revert "future-utils: make when_all() (vector variant) exception safe"
  > Revert "future-utils: fix gross compilation errors in when_all()"
  > future-utils: fix gross compilation errors in when_all()
  > future-utils: make when_all() (vector variant) exception safe

Includes a change to the batchlog_manager constructor to adapt it to
the seastar::sharded::start() change.
2017-08-06 17:47:47 +03:00
Avi Kivity
3edec66903 Revert "repair: Make send_repair_checksum_range timeout"
This reverts commit 98757069a5. We have the
failure detector which will detect an unresponsive node and fail the RPC.
Adding a timeout can just introduce false positives.
2017-08-06 13:09:36 +03:00
Avi Kivity
621926d914 dist: debian: escape "$" character for make
2017-08-05 16:51:03 +03:00
Avi Kivity
a471851bf1 dist: debian: add /opt/scylladb/bin to PATH so antlr can be found
2017-08-05 15:46:58 +03:00
Avi Kivity
8bdc0dd471 dist: debian: search for libraries in /opt/scylladb/lib
2017-08-05 13:18:14 +03:00
Takuya ASADA
2ff3bdba5c dist/debian: switch Ubuntu 3rdparty packages to external build service
Switch Ubuntu to launchpad ppa:
https://launchpad.net/~scylladb/+archive/ubuntu/ppa/+packages

Since switching the 3rdparty packages on Debian is not ready yet, Debian
keeps using the scylla 3rdparty repo; the --rebuild-dep option and
dist/debian/dep are kept as well.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1501866678-4922-1-git-send-email-syuu@scylladb.com>
2017-08-05 11:29:13 +03:00
Glauber Costa
4a911879a3 add active streaming reads metric
In commit f38e4ff3f, we have separated streaming reads from normal reads
for the purpose of determining the maximum number of reads going on.
However, we'll now be totally unaware of how many reads will be
happening on behalf of streaming and that can be important information
when debugging issues.

This patch adds this metric so we don't fly blind.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <1501909973-32519-1-git-send-email-glauber@scylladb.com>
2017-08-05 11:06:37 +03:00
Duarte Nunes
587b6be089 dirty_memory_manager: Add missing include
Allows tests/memory_footprint to build on Ubuntu 14.04.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-04 10:15:23 +02:00
Avi Kivity
4f12068e50 dist: re-add --rebuild-dep to build_rpm.sh
For compatibility with existing scripts; ignored.
2017-08-04 07:10:18 +03:00
Takuya ASADA
b5e83ebd94 dist/redhat: switch 3rdparty packages to external build service
Drop existing 3rdparty build script/3rdparty repo, switch to Fedora Copr
https://copr.fedorainfracloud.org/coprs/scylladb/scylla-3rdparty/packages/

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170803110754.22152-1-syuu@scylladb.com>
2017-08-04 06:40:09 +03:00
Pekka Enberg
90872ffa1f docker: Disable stall detector
Fixes #2162

Message-Id: <1501759957-4380-1-git-send-email-penberg@scylladb.com>
2017-08-03 14:52:49 +03:00
Takuya ASADA
91ade1a660 dist/debian: check scylla user/group existence before adding them
To prevent the install from failing in an environment that already has the
scylla user/group, an existence check is needed.

Fixes #2389

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1495023805-14905-1-git-send-email-syuu@scylladb.com>
2017-08-03 13:01:18 +03:00
Takuya ASADA
6ac254fbcb dist: change nomerges=1 on block devices during fstrim execution
We have a problem running fstrim with nomerges=2, so we need to change
the parameter to 1 during fstrim execution.
To do this, this fix changes the following:
 - revert dropping scylla_fstrim on Ubuntu 16.04/CentOS
 - disable the distribution-provided fstrim script
 - enable scylla_fstrim on all distributions
 - introduce --set-nomerges on scylla-blocktune
 - scylla_fstrim calls scylla-blocktune in the following order:
   - 'scylla-blocktune --set-nomerges 1'
   - 'fstrim' for each device
   - 'scylla-blocktune --set-nomerges 2'

Fixes #2649

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1501531393-21109-1-git-send-email-syuu@scylladb.com>
2017-08-03 13:00:34 +03:00
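The call order listed in 6ac254fbcb can be sketched as below. This is not the scylla_fstrim script itself: the `run` callback and the `targets` argument are hypothetical, and restoring nomerges=2 in a `finally` block is an assumption made here so that the setting is put back even if a trim fails.

```python
import subprocess
from typing import Callable, Iterable, List


def _run_checked(argv: List[str]) -> None:
    # Default runner: execute the command and raise if it fails.
    subprocess.run(argv, check=True)


def trim_all(
    targets: Iterable[str],                        # hypothetical: what to trim
    run: Callable[[List[str]], None] = _run_checked,
) -> None:
    """Lower nomerges to 1, trim every target, then restore nomerges to 2."""
    run(["scylla-blocktune", "--set-nomerges", "1"])
    try:
        for target in targets:
            run(["fstrim", target])
    finally:
        # Assumption: always restore the tuned value, even after a failure.
        run(["scylla-blocktune", "--set-nomerges", "2"])
```

Passing a recording function as `run` shows the resulting command order without touching any device.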
Botond Dénes
0b7ac01f0f Add QtCreator project file and .gdbinit to .gitignore
Message-Id: <ff662910fe1156cdde2bda4aa5bb9cfc45bddda9.1501752340.git.bdenes@scylladb.com>
2017-08-03 12:58:35 +03:00
Avi Kivity
f38e4ff3f9 database: prevent streaming reads from blocking normal reads
Streaming reads and normal reads share a semaphore, so if a bunch of
streaming reads use all available slots, no normal reads can proceed.

Fix by assigning streaming reads their own semaphore; they will compete
with normal reads once issued, and the I/O scheduler will determine the
winner.

Fixes #2663.
Message-Id: <20170802153107.939-1-avi@scylladb.com>
2017-08-03 10:23:01 +01:00
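The starvation that f38e4ff3f9 fixes can be illustrated with a toy admission model. The `Slots` class, the function name, and the slot counts are all made up for this sketch; the real code uses Seastar semaphores and leaves the final arbitration to the I/O scheduler.

```python
class Slots:
    """Tiny counting semaphore with a non-blocking acquire."""

    def __init__(self, count: int) -> None:
        self._free = count

    def try_acquire(self) -> bool:
        if self._free == 0:
            return False
        self._free -= 1
        return True


def admitted_normal_reads(
    normal: Slots, streaming: Slots, streaming_reads: int, normal_reads: int
) -> int:
    """Issue all streaming reads first, then the normal reads.

    Returns how many normal reads obtained a slot. Reads are treated as
    still in flight, so no slot is released in this model.
    """
    for _ in range(streaming_reads):
        streaming.try_acquire()
    return sum(normal.try_acquire() for _ in range(normal_reads))
```

With one shared `Slots(4)` passed for both arguments, four streaming reads leave no slot for normal reads; with two separate instances, the normal reads are admitted regardless of the streaming load.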
Avi Kivity
911536960a database: remove streaming read queue length limit
If we fail a streaming read due to queue overload, we will fail the entire repair.
Remove the limit for streaming, and trust the caller (repair) to have bounded
concurrency.

Fixes #2659.
Message-Id: <20170802143448.28311-1-avi@scylladb.com>
2017-08-03 10:21:07 +01:00
Avi Kivity
e9519ca8e5 Merge "make range selects more efficient by going through digest matching stage" from Gleb
"Currently scanning reads go to the reconciliation stage directly, which
requires asking for mutation data from all peers. This series makes
them try matching digests first, like a single-partition read."

Fixes #2666.

* 'gleb/digest_scan' of github.com:cloudius-systems/seastar-dev:
  storage_proxy: make range_slice_read_executor go through digest matching state
  storage_proxy: add capability to read data/digest for non singular ranges
  storage_proxy: remove redundant parameter from never_speculating_read_executor constructor
2017-08-03 12:18:11 +03:00
Tzach Livyatan
d3d46a5eac Add comments on cluster_name in scylla.yaml
Fix #2316

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <20170730082922.21884-1-tzach@scylladb.com>
2017-08-03 12:12:15 +03:00
Gleb Natapov
d2a2a6d471 storage_proxy: make range_slice_read_executor go through digest matching state
Currently scanning reads go to the reconciliation stage directly, which
requires asking for mutation data from all peers. This patch makes
them try matching digests first, like a single-partition read.

The change requires internode protocol changes, since currently it is not
possible to ask for multi-partition data/digest over RPC. This means that
the capability has to be guarded by a new gossip feature flag, which the
patch also adds.
2017-08-03 11:37:03 +03:00
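The "digest first, reconcile only on mismatch" idea from d2a2a6d471 can be sketched as follows. Everything here is illustrative: the in-memory `Replica` dictionaries, the SHA-256 digest, and the newest-timestamp-wins `reconcile` are assumptions of this sketch, not the storage_proxy implementation.

```python
import hashlib
from typing import Dict, List, Tuple

Row = Tuple[int, str]        # (write timestamp, value)
Replica = Dict[str, Row]     # partition key -> row, for the scanned range


def digest(result: Replica) -> str:
    """Stable hash of one replica's result for the requested range."""
    return hashlib.sha256(repr(sorted(result.items())).encode()).hexdigest()


def reconcile(results: List[Replica]) -> Replica:
    """Fallback path: merge full data from all replicas, newest write wins."""
    merged: Replica = {}
    for result in results:
        for key, row in result.items():
            if key not in merged or row[0] > merged[key][0]:
                merged[key] = row
    return merged


def range_read(replicas: List[Replica]) -> Tuple[Replica, bool]:
    """Return (result, needed_reconciliation).

    Full data is taken from the first replica; the others only contribute
    a digest. Full data from everyone is needed only when a digest differs.
    """
    data = replicas[0]
    expected = digest(data)
    if all(digest(other) == expected for other in replicas[1:]):
        return data, False
    return reconcile(replicas), True
```

When the replicas agree, only one of them has to ship actual rows, which is the saving the commit message is after.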
Tzach Livyatan
99b2232c5d docs/docker: Add hostname parameter to examples
Using --hostname to give the container a meaningful name is good
practice and makes the monitoring dashboard easier to understand.

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <20170803081027.6675-1-tzach@scylladb.com>
2017-08-03 11:14:12 +03:00
Gleb Natapov
3b7d8c8767 storage_proxy: add capability to read data/digest for non singular ranges
Currently only mutation_data read supports non singular ranges. This
patch extends data/digest reads to support them too.
2017-08-03 10:35:09 +03:00
Gleb Natapov
c619ef258b storage_proxy: remove redundant parameter from never_speculating_read_executor constructor
never_speculating_read_executor always waits for all targets, so the
block_for parameter is always equal to targets.size(). There is no need
to pass it explicitly.
2017-08-03 10:08:44 +03:00
Duarte Nunes
4c9206ba2f tests/sstable_mutation_test: Don't use moved-from object
Fix a bug introduced in dbbb9e93d and exposed by gcc6 by not using a
moved-from object. Twice.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170802161033.4213-1-duarte@scylladb.com>
2017-08-03 09:45:49 +03:00
Asias He
763fa83232 repair: Fix build in repair_cf_range
The compiler does not like the mutable.
Message-Id: <83c5e8a944b72a095b8e29e9988986e6ca9cefc5.1501690749.git.asias@scylladb.com>
2017-08-02 18:57:32 +02:00
Asias He
5798625d73 repair: Signal parallelism_semaphore in case of error
If we throw after we take the semaphore and before the when_all
below runs, no one will increase the semaphore.

Fixes #2661
Message-Id: <49540ede4c8a6d84004e10e0f63690e3c21d72c7.1501686383.git.asias@scylladb.com>
2017-08-02 18:32:32 +03:00
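The failure mode addressed by 5798625d73 is a semaphore unit that is taken and then never returned when an exception is thrown before the work is launched. A minimal synchronous sketch of the intended invariant follows; `prepare` and `work` are hypothetical callbacks, and the real code is future-based C++.

```python
import threading
from typing import Callable, TypeVar

P = TypeVar("P")
R = TypeVar("R")


def run_with_slot(
    sem: threading.Semaphore,
    prepare: Callable[[], P],   # hypothetical: may raise before work starts
    work: Callable[[P], R],     # hypothetical: the actual repair step
) -> R:
    """Hold one semaphore unit for the whole operation.

    The unit is released exactly once: after `work` has finished, or after
    either step raised. It is never released while `work` is still running.
    """
    sem.acquire()
    try:
        plan = prepare()
        return work(plan)
    finally:
        sem.release()
```

In asynchronous code the release has to be attached to the completion of the work rather than to the point where it is started; releasing at launch time would free the unit too early.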
Avi Kivity
ebff739a84 Merge "use paging for compaction history" from Amnon
"This series adds an option to use paging in internal query and use that for the
get compaction history function.

Internal paging is done explicitly: to use paging, you first create a
state object (which also contains the query) and use that state to get the
first page; the result contains both the query result and a new state that
can be used to get the next page.

Fixes #2366"

* 'amnon/paged_compaction_history_v5' of github.com:cloudius-systems/seastar-dev:
  system_keyspace: Use paging for get compaction history
  Add paging for internal queries
  query_options: Allows creating query_options from query_options
2017-08-02 18:15:58 +03:00
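The explicit paging protocol described in ebff739a84 (state in, page plus next state out) can be sketched like this. `PagingState`, `fetch_page`, and the in-memory `rows` are hypothetical stand-ins; the actual internal query API is not shown in this log.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PagingState:
    """Carries the query together with the paging position."""
    query: str
    page_size: int
    offset: int = 0


def fetch_page(
    rows: Sequence[int], state: PagingState
) -> Tuple[List[int], Optional[PagingState]]:
    """Return one page and the state for the next page (None when done)."""
    page = list(rows[state.offset:state.offset + state.page_size])
    next_offset = state.offset + len(page)
    if next_offset >= len(rows):
        return page, None
    return page, replace(state, offset=next_offset)


def all_pages(rows: Sequence[int], first: PagingState) -> List[List[int]]:
    """Drive the protocol: keep feeding the returned state back in."""
    pages: List[List[int]] = []
    state: Optional[PagingState] = first
    while state is not None:
        page, state = fetch_page(rows, state)
        pages.append(page)
    return pages
```

Because the state object carries the query, the caller needs nothing else to request the next page.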
Avi Kivity
ac31abf6a4 repair: don't lambda-capture repair_tracker
It is static, so it need not be captured, and some compilers complain.
2017-08-02 18:07:31 +03:00
Avi Kivity
ce60ef59f3 Revert "repair: Signal parallelism_semaphore in case of error"
This reverts commit a548eee28c. It releases
the semaphore too early (noted by Glauber).
2017-08-02 17:13:46 +03:00
Avi Kivity
b2753b0183 Merge "Fix possible repair stuck" from Asias
"This series tries to fix cases where a repair can get stuck."

Fixes #2660, #2661, #2662.

* tag 'asias/repair_stuck_v2.1' of github.com:cloudius-systems/seastar-dev:
  repair: Make send_repair_checksum_range timeout
  repair: Signal parallelism_semaphore in case of error
  repair: Fix repair_tracker done
2017-08-02 16:51:51 +03:00
Asias He
98757069a5 repair: Make send_repair_checksum_range timeout
If the verb never returns, the repair will hang forever. Make it use the
timeout version of send_message.

Fixes #2662
2017-08-02 21:41:50 +08:00
Asias He
a548eee28c repair: Signal parallelism_semaphore in case of error
If we throw after we take the semaphore and before the when_all
below runs, no one will increase the semaphore.

Fixes #2661
2017-08-02 21:41:45 +08:00
Asias He
abcff4c78e repair: Fix repair_tracker done
If it throws after repair_tracker.start and before the when_all below,
the repair_tracker.done will never be called for this repair id.

Fixes #2660
2017-08-02 21:40:29 +08:00
Pekka Enberg
78f68613ce dist/docker: Reduce number of layers
One of the best practices for Dockerfiles is to minimize the number of
layers because they increase the overall image size:

https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#minimize-the-number-of-layers

Consolidate our "yum install" commands to reduce the number of layers.

Suggested by Dean Hamstead.

Message-Id: <1501670572-8701-1-git-send-email-penberg@scylladb.com>
2017-08-02 15:21:05 +03:00
Takuya ASADA
ffbdacc1fa dist/debian: remove ant from prerequisite packages
These lines were mistakenly copied from scylla-tools and are not needed for scylla-server.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1498029619-1928-1-git-send-email-syuu@scylladb.com>
2017-08-02 12:12:42 +03:00
Duarte Nunes
cec41f9de6 Merge seastar upstream
* seastar fc937b8...f14d2a3 (4):
  > configure.py: Ensure tmp directory exists when getting dpdk cflags
  > checked_ptr: fix hash() compilation
  > net: fix potential use after free in posix_server_socket::accept()
  > http: removed unneeded lamda captures

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-02 10:05:08 +02:00
Asias He
cf6f4a5185 gossip: Introduce the shadow_round_ms option
It specifies the maximum gossip shadow round time. It can be used to
reduce the gossip feature check time during node boot up.
For instance, when the first node in the cluster, which lists both
itself and another node as seeds in the yaml config, boots up, it will try
to talk to the other seed nodes, which are not started yet. The gossip
shadow round is used to fetch the feature info of the cluster. Since there
is no other seed node in the cluster yet, the shadow round will fail. The
user can lower the shadow_round_ms option from its default to reduce the
boot time.

Fixes #2615
Message-Id: <10916ce9059f3c7f1a1fb465919ae57de3b67d59.1500540297.git.asias@scylladb.com>
2017-08-02 09:52:35 +03:00
Vlad Zolotarov
4b28ea216d utils::loading_cache: cancel the timer after closing the gate
The timer is armed inside the section guarded by the _timer_reads_gate,
therefore it has to be canceled after the gate is closed.

Otherwise we may end up with an armed timer after the stop() method has
returned a ready future.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1501603059-32515-1-git-send-email-vladz@scylladb.com>
2017-08-01 17:21:44 +01:00
Duarte Nunes
569bbf2edd sstables/sstables: Use per-cpu noop_write_monitor
We employ a thread-per-core architecture, so don't go about sharing
seastar::shared_ptrs across cpus.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170801144153.17354-1-duarte@scylladb.com>
2017-08-01 18:10:49 +03:00
Avi Kivity
db7329b1cb Merge "Ensure correct EOC for PI block cell names" from Duarte
"This series ensures that we always write correct cell names to promoted
index cell blocks, taking into account the eoc of range tombstones.

Fixes #2333"

* 'pi-cell-name/v1' of github.com:duarten/scylla:
  tests/sstable_mutation_test: Test promoted index blocks are monotonic
  sstables: Consider eoc when flushing pi block
  sstables: Extract out converting bound_kind to eoc
2017-08-01 18:09:07 +03:00
Gleb Natapov
1da4d5c5ee cql transport: run accept loop in the foreground
It was meant to be run in the foreground since it is waited upon during
stop(), but as it is now, from stop()'s perspective it completes
after the first connection is accepted.

Fixes #2652

Message-Id: <20170801125558.GS20001@scylladb.com>
2017-08-01 17:04:14 +03:00
Avi Kivity
1e8bb972b6 compaction: fix iteration in leveled compaction droppable tombstones loop
Since get_level_count() is unsigned, it will never be negative, and
the loop may never terminate.

Message-Id: <20170719133502.13316-1-avi@scylladb.com>
2017-08-01 13:40:36 +03:00
Avi Kivity
ba2e170e4b compaction: fix return in leveled compaction droppable tombstones loop
If the loop ever terminates, we need to return something.

Message-Id: <20170719133508.13374-1-avi@scylladb.com>
2017-08-01 13:33:02 +03:00
Takuya ASADA
a998b7b3eb dist/ami: follow scylla-tools package name change on RedHat variants
Since scylla-tools generates two .rpm packages, we need to copy them to our AMI.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170722090002.9850-1-syuu@scylladb.com>
2017-07-31 18:57:12 +03:00
Avi Kivity
7c8dea088a Merge seastar upstream
* seastar 54e940f...fc937b8 (2):
  > configure.py: Always ensure tmp directory exists
  > coding-style.md: introduce
2017-07-31 18:06:09 +03:00
Duarte Nunes
a85232dd82 Fix compilation errors on GCC 6
GCC 6 inconsistently requires explicitly calling a member function
through "this->" for lambda functions capturing "this".

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170731143755.21970-1-duarte@scylladb.com>
2017-07-31 17:40:44 +03:00