Commit Graph

12831 Commits

Author SHA1 Message Date
Avi Kivity
3edec66903 Revert "repair: Make send_repair_checksum_range timeout"
This reverts commit 98757069a5. We have the
failure detector which will detect an unresponsive node and fail the RPC.
Adding a timeout can just introduce false positives.
2017-08-06 13:09:36 +03:00
Avi Kivity
621926d914 dist: debian: escape "$" character for make 2017-08-05 16:51:03 +03:00
Avi Kivity
a471851bf1 dist: debian: add /opt/scylladb/bin to PATH so antlr can be found 2017-08-05 15:46:58 +03:00
Avi Kivity
8bdc0dd471 dist: debian: search for libaries in /opt/scylladb/lib 2017-08-05 13:18:14 +03:00
Takuya ASADA
2ff3bdba5c dist/debian: switch Ubuntu 3rdparty packages to external build service
Switch Ubuntu to launchpad ppa:
https://launchpad.net/~scylladb/+archive/ubuntu/ppa/+packages

Since switching 3rdparty on Debian is not ready yet, keep them to use scylla
3rdparty repo, also keep --rebuild-dep option and dist/debian/dep.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1501866678-4922-1-git-send-email-syuu@scylladb.com>
2017-08-05 11:29:13 +03:00
Glauber Costa
4a911879a3 add active streaming reads metric
In commit f38e4ff3f, we have separated streaming reads from normal reads
for the purpose of determining the maximum number of reads going on.
However, we'll now be totally unaware of how many reads will be
happening on behalf of streaming and that can be important information
when debugging issues.

This patch adds this metric so we don't fly blind.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <1501909973-32519-1-git-send-email-glauber@scylladb.com>
2017-08-05 11:06:37 +03:00
Duarte Nunes
587b6be089 dirty_memory_manager: Add missing include
Allows tests/memory_footprint to build on Ubuntu 14.04.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-04 10:15:23 +02:00
Avi Kivity
4f12068e50 dist: re-add --rebuild-dep to build_rpm.sh
For compatibility with existing scripts; ignored.
2017-08-04 07:10:18 +03:00
Takuya ASADA
b5e83ebd94 dist/redhat: switch 3rdparty packages to external build service
Drop existing 3rdparty build script/3rdparty repo, switch to Fedora Copr
https://copr.fedorainfracloud.org/coprs/scylladb/scylla-3rdparty/packages/

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170803110754.22152-1-syuu@scylladb.com>
2017-08-04 06:40:09 +03:00
Pekka Enberg
90872ffa1f docker: Disable stall detector
Fixes #2162

Message-Id: <1501759957-4380-1-git-send-email-penberg@scylladb.com>
2017-08-03 14:52:49 +03:00
Takuya ASADA
91ade1a660 dist/debian: check scylla user/group existance before adding them
To prevent install failing on the environment which already has scylla
user/group, existance check is needed.

Fixes #2389

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1495023805-14905-1-git-send-email-syuu@scylladb.com>
2017-08-03 13:01:18 +03:00
Takuya ASADA
6ac254fbcb dist: change nomerges=1 on block devices during fstrim execution
We have problem to run fstrim with nomerges=2, so we need to change
the parameter to 1 during fstrim execution.
To do this, this fix changes follow things:
 - revert dropping scylla_fstrim on Ubuntu 16.04/CentOS
 - disable distribution provided fstrim script
 - enable scylla_fstrim on all distributions
 - introduce --set-nomerges on scylla-blocktune
 - scylla_fstrim call scylla-blocktune by following order:
   - 'scylla-blocktune --set-nomerges 1'
   - 'fstrim' for each devices
   - 'scylla-blocktune --set-nomerges 2'

Fixes #2649

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1501531393-21109-1-git-send-email-syuu@scylladb.com>
2017-08-03 13:00:34 +03:00
Botond Dénes
0b7ac01f0f Add QtCreator project file and .gdbinit to .gitignore
Message-Id: <ff662910fe1156cdde2bda4aa5bb9cfc45bddda9.1501752340.git.bdenes@scylladb.com>
2017-08-03 12:58:35 +03:00
Avi Kivity
f38e4ff3f9 database: prevent streaming reads from blocking normal reads
Streaming reads and normal reads share a semaphore, so if a bunch of
streaming reads use all available slots, no normal reads can proceed.

Fix by assigning streaming reads their own semaphore; they will compete
with normal reads once issued, and the I/O scheduler will determine the
winner.

Fixes #2663.
Message-Id: <20170802153107.939-1-avi@scylladb.com>
2017-08-03 10:23:01 +01:00
Avi Kivity
911536960a database: remove streaming read queue length limit
If we fail a streaming read due queue overload, we will fail the entire repair.
Remove the limit for streaming, and trust the caller (repair) to have bounded
concurrency.

Fixes #2659.
Message-Id: <20170802143448.28311-1-avi@scylladb.com>
2017-08-03 10:21:07 +01:00
Avi Kivity
e9519ca8e5 Merge "make range selects more efficient by going through digest matching stage" from Gleb
"Currently scanning reads go to reconciliation stage directly which
requires asking for mutation data from all peers. This series makes
it to try matching digests first like a single partition read."

Fixes #2666.

* 'gleb/digest_scan' of github.com:cloudius-systems/seastar-dev:
  storage_proxy: make range_slice_read_executor go through digest matching state
  storage_proxy: add capability to read data/digest for non singular ranges
  storage_proxy: remove redundant parameter from never_speculating_read_executor constructor
2017-08-03 12:18:11 +03:00
Tzach Livyatan
d3d46a5eac Add comments on cluster_name in scylla.yaml
Fix #2316

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <20170730082922.21884-1-tzach@scylladb.com>
2017-08-03 12:12:15 +03:00
Gleb Natapov
d2a2a6d471 storage_proxy: make range_slice_read_executor go through digest matching state
Currently scanning reads go to reconciliation stage directly which
requires asking for mutation data from all peers. This patch makes
it to try matching digests first like a single partition read.

The change requires internode protocol changes since currently it is not
possible to ask for multi partition data/digest over RPC. It means that
the capability has to be guarded by new gossip feature flag which the
patch also adds.
2017-08-03 11:37:03 +03:00
Tzach Livyatan
99b2232c5d docs/docker: Add hostname parameter to examples
Using --hostname to give the container a meaningful name is a good
practice, and make the monitoring dashboard easier to understand

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <20170803081027.6675-1-tzach@scylladb.com>
2017-08-03 11:14:12 +03:00
Gleb Natapov
3b7d8c8767 storage_proxy: add capability to read data/digest for non singular ranges
Currently only mutation_data read supports non singular ranges. This
patch extends data/digest reads to support them too.
2017-08-03 10:35:09 +03:00
Gleb Natapov
c619ef258b storage_proxy: remove redundant parameter from never_speculating_read_executor constructor
never_speculating_read_executor always waits for all targets so
block_for parameter is always equal to targets.size(). No need to
to pass it explicitly.
2017-08-03 10:08:44 +03:00
Duarte Nunes
4c9206ba2f tests/sstable_mutation_test: Don't use moved-from object
Fix a bug introduced in dbbb9e93d and exposed by gcc6 by not using a
moved-from object. Twice.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170802161033.4213-1-duarte@scylladb.com>
2017-08-03 09:45:49 +03:00
Asias He
763fa83232 repair: Fix build in repair_cf_range
The compiler does not like the mutable.
Message-Id: <83c5e8a944b72a095b8e29e9988986e6ca9cefc5.1501690749.git.asias@scylladb.com>
2017-08-02 18:57:32 +02:00
Asias He
5798625d73 repair: Singal parallelism_semaphore in case of error
If we throw after we take the semaphore and beforew the when_all
below runs, no one will increase the semaphore.

Fixes #2661
Message-Id: <49540ede4c8a6d84004e10e0f63690e3c21d72c7.1501686383.git.asias@scylladb.com>
2017-08-02 18:32:32 +03:00
Avi Kivity
ebff739a84 Merge "use paging for compaction history" from Amnon
"This series adds an option to use paging in internal query and use that for the
get compaction history function.

Internal paging will be done explicitly, to use paging, you first create a
state object (that contains the query as well) and use that state to get the
first page, the result will contain both the query result and a new state that
can be used to get the next page.

Fixes #2366"

* 'amnon/paged_compaction_history_v5' of github.com:cloudius-systems/seastar-dev:
  system_keyspace: Use paging for get compaction history
  Add paging for internal queries
  query_options: Allows creating query_options from query_options
2017-08-02 18:15:58 +03:00
Avi Kivity
ac31abf6a4 repair: don't lambda-capture repair_tracker
It is static, so it need not be captured, and some compilers complain.
2017-08-02 18:07:31 +03:00
Avi Kivity
ce60ef59f3 Revert "repair: Singal parallelism_semaphore in case of error"
This reverts commit a548eee28c. It releases
the semaphore too early (noted by Glauber).
2017-08-02 17:13:46 +03:00
Avi Kivity
b2753b0183 Merge "Fix possible repair stuck" from Asias
"This series tries to fix possible repair stuck."

Fixes #2660, #2661, #2662.

* tag 'asias/repair_stuck_v2.1' of github.com:cloudius-systems/seastar-dev:
  repair: Make send_repair_checksum_range timeout
  repair: Singal parallelism_semaphore in case of error
  repair: Fix repair_tracker done
2017-08-02 16:51:51 +03:00
Asias He
98757069a5 repair: Make send_repair_checksum_range timeout
If the verb never returns the repair will hangs forever. Make it use the
timeout version of the send_message.

Fixes #2662
2017-08-02 21:41:50 +08:00
Asias He
a548eee28c repair: Singal parallelism_semaphore in case of error
If we throw after we take the semaphore and beforew the when_all
below runs, one one will increase the semaphore.

Fixes #2661
2017-08-02 21:41:45 +08:00
Asias He
abcff4c78e repair: Fix repair_tracker done
If it throws after repair_tracker.start and before the when_all below,
the repair_tracker.done will never be called for this repair id.

Fixes #2660
2017-08-02 21:40:29 +08:00
Pekka Enberg
78f68613ce dist/docker: Reduce number of layers
One of the best practices for Dockerfiles is to minimize the number of
layers because they increase the overall image size:

https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#minimize-the-number-of-layers

Consolidate our "yum install" commands to reduce the number of lauyers.

Suggested by Dean Hamstead.

Message-Id: <1501670572-8701-1-git-send-email-penberg@scylladb.com>
2017-08-02 15:21:05 +03:00
Takuya ASADA
ffbdacc1fa dist/debian: remove ant from prerequisite packages
This lines are mistakenly copied from scylla-tools, won't need for scylla-server.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1498029619-1928-1-git-send-email-syuu@scylladb.com>
2017-08-02 12:12:42 +03:00
Duarte Nunes
cec41f9de6 Merge seastar upstream
* seastar fc937b8...f14d2a3 (4):
  > configure.py: Ensure tmp directory exists when getting dpdk cflags
  > checked_ptr: fix hash() compilation
  > net: fix potential use after free in posix_server_socket::accept()
  > http: removed unneeded lamda captures

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-08-02 10:05:08 +02:00
Asias He
cf6f4a5185 gossip: Introduce the shadow_round_ms option
It specifies the maximum gossip shadow round time. It can be used to
reduce the gossip feature check time during node boot up.
For instance, when the first node in the cluster, which listed both
itself and other node as seed in the yaml config, boots up, it will try
to talk to other seed nodes which are not started yet. The gossip shadow
round will be used to fetch the feature info of the cluster. Since there
is no other seed node in the cluster, the shadow round will fail. User
can reduce the default shadow_round_ms option to reduce the boot time.

Fixes #2615
Message-Id: <10916ce9059f3c7f1a1fb465919ae57de3b67d59.1500540297.git.asias@scylladb.com>
2017-08-02 09:52:35 +03:00
Vlad Zolotarov
4b28ea216d utils::loading_cache: cancel the timer after closing the gate
The timer is armed inside the section guarded by the _timer_reads_gate
therefore it has to be canceled after the gate is closed.

Otherwise we may end up with the armed timer after stop() method has
returned a ready future.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1501603059-32515-1-git-send-email-vladz@scylladb.com>
2017-08-01 17:21:44 +01:00
Duarte Nunes
569bbf2edd sstables/sstables: Use per-cpu noop_write_monitor
We employ a thread-per-core architecture, so don't go about sharing
seastar::shared_ptrs across cpus.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170801144153.17354-1-duarte@scylladb.com>
2017-08-01 18:10:49 +03:00
Avi Kivity
db7329b1cb Merge "Ensure correct EOC for PI block cell names" from Duarte
"This series ensures the always write correct cell names to promoted
index cell blocks, taking into account the eoc of range tombstones.

Fixes #2333"

* 'pi-cell-name/v1' of github.com:duarten/scylla:
  tests/sstable_mutation_test: Test promoted index blocks are monotonic
  sstables: Consider eoc when flushing pi block
  sstables: Extract out converting bound_kind to eoc
2017-08-01 18:09:07 +03:00
Gleb Natapov
1da4d5c5ee cql transport: run accept loop in the foreground
It was meant to be run in the foreground since it is waited upon during
stop(), but as it is now from the stop() perspective it is completed
after first connection is accepted.

Fixes #2652

Message-Id: <20170801125558.GS20001@scylladb.com>
2017-08-01 17:04:14 +03:00
Avi Kivity
1e8bb972b6 compaction: fix iteration in leveled compaction droppable tombstones loop
Since get_level_count() is unsigned, it will never be negative, and
the loop may never terminate.

Message-Id: <20170719133502.13316-1-avi@scylladb.com>
2017-08-01 13:40:36 +03:00
Avi Kivity
ba2e170e4b compaction: fix return in leveled compaction droppable tombstones loop
If the loop ever terminates, we need to return something.

Message-Id: <20170719133508.13374-1-avi@scylladb.com>
2017-08-01 13:33:02 +03:00
Takuya ASADA
a998b7b3eb dist/ami: follow scylla-tools package name change on RedHat variants
Since scylla-tools generates two .rpm packages, we need to copy them to our AMI.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20170722090002.9850-1-syuu@scylladb.com>
2017-07-31 18:57:12 +03:00
Avi Kivity
7c8dea088a Merge seastar upstream
* seastar 54e940f...fc937b8 (2):
  > configure.py: Always ensure tmp directory exists
  > coding-style.md: introduce
2017-07-31 18:06:09 +03:00
Duarte Nunes
a85232dd82 Fix compilation errors on GCC 6
GCC 6 inconsistently requires explicitly calling a member function
through "this->" for lambda functions capturing "this".

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170731143755.21970-1-duarte@scylladb.com>
2017-07-31 17:40:44 +03:00
Benoît Canet
b44ba11e4c transport: Count the number of unpaged queries
Queries with query page size equal or smaller than
zero are unpaged queries.

Count these kind of queries and make them a metrics
since they can ruin the performance of the system.

Message-Id: <20170731130004.25807-2-benoit@scylladb.com>
2017-07-31 16:01:45 +03:00
Avi Kivity
3fe6731436 Merge "educe the effect of the latency metrics" from Amnon
"This series reduce that effect in two ways:
1. Remove the latency counters from the system keyspaces
2. Reduce the histogram size by limiting the maximum number of buckets and
   stop the last bucket."

Fixes #2650.

* 'amnon/remove_cf_latency_v2' of github.com:cloudius-systems/seastar-dev:
  database: remove latency from the system table
  estimated histogram: return a smaller histogram
2017-07-31 15:58:30 +03:00
Paweł Dziepak
402799fcc0 mutation_reader: drop move_and_clear()
Since the discovery of std::exchange(x, {}) move_and_clear has become
obsolete. Beside, the name was wrong, it did not clear the vector but
recreated it meaning that any allocated memory wasn't reused (not that
it mattered in the existing usages).

Message-Id: <20170731123549.10887-1-pdziepak@scylladb.com>
2017-07-31 15:51:19 +03:00
Gleb Natapov
87bc3f7e7f configure.py: use user provided compiler flags when checking for features
User provided compiler flags my change an outcome of the test.

Message-Id: <20170724111520.GA18230@scylladb.com>
2017-07-31 15:33:06 +03:00
Avi Kivity
f4b2a1ef4e Merge "Optimise combined_mutation_reader" from Paweł
"These patches optimise combined_mutation_reader for cases where the majority
of mutation_readers is disjoint.

perf_fast_forward:
Results are medians of 3 of fragments/s as reported by perf_fast_forward.

Command:
perf_fast_forward -c1 --enable-cache

small: small-partition-skips (read=1, skip=0)
large: large-partition-skips (read=1, skip=0)

          before        after      diff
small     195753      238196       +22%
large    1244325     1359096        +9%

perf_simple_query:

Results are medians of 10 of reads/s as reported by perf_simple_query.

Command:
perf_simple_query -c1

before   98651.40
after   104554.85
diff          +6%"

* tag 'avoid-merge_mutations/v1' of https://github.com/pdziepak/scylla:
  combined_mutation_reader: avoid unnecessary merge_mutations()
  combined_mutation_reader: do not pop mutation with different key
2017-07-31 15:14:42 +03:00
Avi Kivity
178b54e790 Merge "memtable flush: Fixes and improvements" from Duarte
"This series ensure that when we retry a memtable flush, we re-acquire the
flush permit that was previously released. It also ensures we don't hold
the sstable read lock for the duration of the sleep leading to the retry.

To achieve that cleanly we refactor the way the permit lifecycle is managed
by employing a RAII-based approach.

We also improve the latency of writes blocked on virtual dirty by releasing
the flush permit before fsyncing the sstables. There are additional avenues
for performance improvements on top of this one."

* 'memtable-flush-additional-fixes/v4' of github.com:duarten/scylla:
  column_family: Re-acquire flush permit in case of error
  column_family: Don't hold sstable read lock when retrying flush
  sstables: Release the flush permit before fsyncing
  sstables: Introduce write_monitor
  database: Extract out dirty_memory_manager
  dirty_memory_manager: Refactor flush permit lifetime management
  dirty_memory_manager: Invert permit acquisition order
  memtable_list: Register different seal functions for each behaviour
2017-07-31 14:57:19 +03:00