Commit Graph

10416 Commits

Author SHA1 Message Date
Asias He
99e77e8ec2 repair: Do not abort the repair when one range is failed
failed_ranges is added to track the ranges that fail during repair.
2016-09-26 06:28:51 +08:00
Asias He
81c98ff3d9 repair: Reduce stream_plan usage
Right now, we are using one stream_plan for each range of a column
family. This generates tons of stream_plans and stream_sessions. Each
stream_plan can transfer multiple ranges and column families. We can
use a single stream_plan to stream datas for multiple ranges and column
families, so that 1) overhead of stream_plan/session negotiation is
reduced 2) it is much easier to debug/monitor few stream_sessions

Fixes #1685
2016-09-26 06:28:50 +08:00
Asias He
a0020fdad2 stream_session: Allow adding ranges to a cf more than once
Append the ranges to a stream_transfer_task if the cf is already added to
_transfers in add_transfer_ranges.
2016-09-26 06:28:50 +08:00
Asias He
576e15532f streaming: Add append_ranges for stream_transfer_task
Allow to append more ranges to transfer for a stream transfer task.
2016-09-26 06:28:50 +08:00
Avi Kivity
3057ca05bc Merge "Improve loggging when nodes are decommissioned" from Asias
"When a node is decommissioned, its gossip state will not be removed from gossip
immediately. It will only be removed 3 days later which helps nodes that were
down when the node was decommissioned to know decommission later when they are
up again.

This series improves the logging to reduce confusion when a node tries to
talking to a decommissioned node. In addition, we now do not try to talk to the
decommissioned in the unreachable_endpoints gossip round.

Fixes #1615"

* tag 'asias/loggging_decommissioned_nodes/v1' of github.com:cloudius-systems/seastar-dev:
  gossip: Make two log items debug level
  gossip: Print node status when node is UP or DOWN
  gossip: Ignore the node which is decommissioned in gossip round
  gossip: Print convict debug info only when the node is alive
  gossip: Add more timing log in add_expire_time_for_endpoint
  streaming: Print on_remove and on_restart log when peer exists
  streaming: Introduce has_peer in stream_manager
2016-09-25 15:19:13 +03:00
Asias He
830f4ee353 gossip: Make two log items debug level
It is duplciated with "InetAddresss x.x.x.x is now UP" message.

INFO  2016-09-23 10:35:15,512 [shard 0] gossip - Node 127.0.0.1 has restarted, now UP, status = NORMAL
INFO  2016-09-23 10:35:15,513 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = NORMAL

Make the log a bit cleaner.
2016-09-25 07:17:19 +08:00
Asias He
a26a26963c gossip: Print node status when node is UP or DOWN
For example:

gossip - InetAddress 127.0.0.4 is now UP, status = NORMAL
gossip - InetAddress 127.0.0.3 is now DOWN, status = LEFT
gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown
2016-09-25 07:17:19 +08:00
Asias He
1d9401d080 gossip: Ignore the node which is decommissioned in gossip round
If the node is decommissioned, there is no point to try to contact it
again in the gossip round.
2016-09-25 07:17:19 +08:00
Asias He
4b73443222 gossip: Print convict debug info only when the node is alive 2016-09-25 07:17:19 +08:00
Asias He
99a2ae0fb5 gossip: Add more timing log in add_expire_time_for_endpoint
It tells when the node is expected to expire and how many seconds are
left.
2016-09-25 07:17:19 +08:00
Asias He
40f7a355a0 streaming: Print on_remove and on_restart log when peer exists
We print the following messages even if there is no stream_session with
that peer. It is a bit confusing.

  INFO  2016-09-23 08:26:37,254 [shard 0] stream_session - stream_manager:
  Close all stream_session with peer = 127.0.0.1 in on_restart

  INFO  2016-09-23 08:26:37,287 [shard 0] stream_session - stream_manager:
  Close all stream_session with peer = 127.0.0.3 in on_remove

Print only when the streaming session with the peer exists.
2016-09-25 07:17:19 +08:00
Asias He
2ac4ce77a9 streaming: Introduce has_peer in stream_manager
It is used to query if a streaming peer with inet_address exists.
2016-09-25 07:17:13 +08:00
Nadav Har'El
fe1ba753ce Avoid semaphore's default initial value
The fact that Seastar's semaphore has a default initializer of 1 if not
explicitly initialized is confusing and unexpected and recently lead to
two bugs. So ScyllaDB should not rely on this default behavior, and specify
the initial value of each semaphore explicitly.

In several cases in the ScyllaDB code, the explict initialization was
missing, and this patch adds it. In one case (rate_limiter) I even think
the default of 1 was a bit strange, and 0 makes more sense.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1474530745-23951-1-git-send-email-nyh@scylladb.com>
2016-09-24 19:25:02 +03:00
Paweł Dziepak
eb59b4c4ab keys: disable constructing from generic range
stdx::optional<T> uses quite elaborate std::enable_if_t magic to decide
whether the argument passed to its constructor should be used for a call
T constructor or stdx::optional<T> constructor.

Apparently, with GCC 6.2 having T constructor which accepts any type
confuses that magic and we end up with compile errors.

The solution is to have from_range() method that replaces that
constructor from range. There is also constructor that creates a key
from std::vector<bytes> so that code generated by IDL works as it did
before.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1474550971-15309-1-git-send-email-pdziepak@scylladb.com>
2016-09-24 18:57:01 +03:00
Raphael S. Carvalho
cfe7419f0f sstables: update or remove some outdated comments
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <74bae503447da2544a005e29b7d3aafa9f6e8c90.1474383273.git.raphaelsc@scylladb.com>
2016-09-24 18:53:19 +03:00
Raphael S. Carvalho
0f1bd3c527 db: fix clustering key filter
When date tiered strategy is enabled, filter_sstable_for_reader()
was returning more sstables than needed because the return type
of serialized_tri_compare::operator() was wrong, which results
in bad performance.

tgrabiec: Refs #1449

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0301d7588e33c7bbb8cd80fed20a1827926a8fff.1474585088.git.raphaelsc@scylladb.com>
2016-09-23 12:33:58 +02:00
Asias He
e352570f52 conf: Move initial_token to supported section in scylla.yaml
initial_token is actually supported

Fixes #1686
Message-Id: <465da088696f72a3a7bcf19ba8e4895a0a648e7c.1474512235.git.asias@scylladb.com>
2016-09-23 09:34:05 +03:00
Tomasz Grabiec
0b0d126721 Merge seastar upstream
Fixes #1622.
Fixes #1690.

* seastar 40a68fa...2b55789 (5):
  > input_stream: Fix possible infinite recursion in consume()
  > iostream: Fix stack overflow in output_stream::split_and_put()
  > condition_variable: fix spurious wakeup
  > Merge "assorted rpc fixes" from Gleb
  > Merge "Simple fixes for doxygen" from Glauber
2016-09-22 14:27:52 +02:00
Paweł Dziepak
906250dcbd Merge "Enhance GDB script with new LSA-related commands" from Tomek 2016-09-21 13:22:00 +01:00
Raphael S. Carvalho
67343798cf api: implement api to return sstable count per level
'nodetool cfstats' wasn't showing per-level sstable count because
the API wasn't implemented.

Fixes #1119.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0dcdf9196eaec1692003fcc8ef18c77d0834b2c6.1474410770.git.raphaelsc@scylladb.com>
2016-09-21 09:13:40 +03:00
Asias He
aa47265381 gossip: Fix std::out_of_range in setup_collectd
It is possible that endpoint_state_map does not contain the entry for
the node itself when collectd accesses it.

Fixes the issue:

Sep 18 11:33:16 XXX scylla[19483]: [shard 0] seastar - Exceptional
future ignored: std::out_of_range (_Map_base::at)

Fixes #1656

Message-Id: <8ffe22a542ff71e8c121b06ad62f94db54cc388f.1474377722.git.asias@scylladb.com>
2016-09-20 19:38:16 +03:00
Tomasz Grabiec
69aec3835f scylla-gdb: Enhance 'scylla ptr' to show if object is managed by LSA
Example:

  (gdb) scylla ptr 0x601000480000
  thread 1, large, LSA-managed

One can then use 'scylla lsa-segment 0x601000480000' to examine LSA
segment contents.
2016-09-20 16:53:23 +02:00
Tomasz Grabiec
486a92092b scylla-gdb: Add 'scylla segment-descs' command
Displays information about shard's segment descriptors. One can see
which segments belong to LSA, what's their occupancy, etc.

(gdb) scylla segment-descs
...
0x601000940000: lsa free=26092 region=0x60100036d890 zone=0x6010000fb420
0x601000980000: lsa free=26092 region=0x60100036d890 zone=0x6010000fb420
0x6010009c0000: lsa free=261153 region=0x60100036fcf0 zone=0x6010000fb420
0x601000a00000: std
0x601000a40000: lsa free=25508 region=0x60100036d890 zone=0x6010000fb420
0x601000a80000: std
0x601000ac0000: lsa free=26092 region=0x60100036d890 zone=0x6010000fb420
0x601000b00000: lsa free=26092 region=0x60100036d890 zone=0x6010000fb420
0x601000b40000: std
...
2016-09-20 16:53:23 +02:00
Tomasz Grabiec
b0b28696b5 scylla-gdb: Add 'scylla lsa-segment' command
Allows one to examine contents of LSA segment.

Example:

  (gdb) scylla lsa-segment 0x601000480000
  0x601000480e70: live size=144 migrator=standard_migrator<cache_entry>::object
  0x601000480f10: live size=144 migrator=standard_migrator<cache_entry>::object
  0x601000480fb0: free size=192
  0x60100048107e: free size=42
  0x6010004814e0: free size=192
  0x6010004815ae: free size=40
  0x6010004815e8: free size=192
  0x6010004816b8: live size=144 migrator=standard_migrator<cache_entry>::object
  0x601000481758: free size=192
  ...
2016-09-20 16:53:21 +02:00
Tomasz Grabiec
5011b77e15 scylla-gdb: Add std::vector wrapper
Makes vector values itearable from python level.
2016-09-20 16:53:20 +02:00
Pekka Enberg
42dd4670dc transport/server: Add CQL frame Snappy compression support
Fixes #1286
Message-Id: <1474370861-5928-1-git-send-email-penberg@scylladb.com>
2016-09-20 12:33:36 +01:00
Pekka Enberg
acc93509a2 transport/server: Fix CQL connection compression negotiation
Benoît Canet points out that CQL messages are not always compressed
although compression is enabled by the driver. Turns out our CQL
compression negotiation is broken. We need to negotiate compression upon
STARTUP message and not rely on the incoming request to have the
compression bit enabled.

Fixes #1680
Message-Id: <1474366693-3001-1-git-send-email-penberg@scylladb.com>
2016-09-20 11:19:27 +01:00
Pekka Enberg
f92bbc6f44 cql3: Kill unimplemented query_options constructor
The constructor was added in commit 7f3ce39 ("query_options: Add
constructor for batch mode options (multi-level)") but apparently it was
never actually implemented.

Spotted by CLion.
Message-Id: <1474303017-23383-1-git-send-email-penberg@scylladb.com>
2016-09-20 10:01:10 +01:00
Pekka Enberg
f1d0401ed2 main: Use proper logger for API server messages
We have a "startlog" that we can use to print out API server messages.

Message-Id: <1474358312-26510-1-git-send-email-penberg@scylladb.com>
2016-09-20 11:09:59 +03:00
Pekka Enberg
38b137713f transport/server: Fix CQL v1 prepared statement execution
The EXECUTE message encoding is different between CQL binary protocol
versions v1 and v2 (and later). Fix process_execute() to deserialize the
message as per the CQL binary protocol v1 specification:

    Executes a prepared query. The body of the message must be:
      <id><n><value_1>....<value_n><consistency>
    where:
      - <id> is the prepared query ID. It's the [short bytes] returned as a
        response to a PREPARE message.
      - <n> is a [short] indicating the number of following values.
      - <value_1>...<value_n> are the [bytes] to use for bound variables in the
        prepared query.
      - <consistency> is the [consistency] level for the operation.

Fixes #1676

Message-Id: <1474287392-16792-1-git-send-email-penberg@scylladb.com>
2016-09-19 15:26:30 +03:00
Raphael S. Carvalho
0eaa0f46c9 sstables: store first and last decorated keys in sstable object
leveled strategy uses heavily first and last decorated keys of a
sstable to get overlapping sstables in a given level. By storing
first and last decorated keys in sstable object, it's expected
that performance of leveled strategy (not compaction) will be
improved.
We will set first and last keys in sstable when either loading
or sealing it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0abca819454ab4c088541bb49714f1f6a7dc4f42.1473959677.git.raphaelsc@scylladb.com>
2016-09-19 13:25:58 +02:00
Raphael S. Carvalho
dffb41f9d8 sstables: remove schema parameter from some sstable methods
schema can now be found in the sstable object itself.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0fa44fedbe784d924522d7eeca77c16294479c6e.1473959677.git.raphaelsc@scylladb.com>
2016-09-19 13:25:58 +02:00
Tomasz Grabiec
2282599394 tests: Add test for UUID type ordering
Message-Id: <1473956716-5209-2-git-send-email-tgrabiec@scylladb.com>
2016-09-16 11:07:14 +01:00
Tomasz Grabiec
804fe50b7f types: fix uuid_type_impl::less
timeuuid_type_impl::compare_bytes is a "trichotomic" comparator (-1,
0, 1) while less() is a "less" comparator (false, true). The code
incorrectly returns c1 instead of c1 < 0 which breaks the ordering.

Fixes #1196.
Message-Id: <1473956716-5209-1-git-send-email-tgrabiec@scylladb.com>
2016-09-16 11:06:55 +01:00
Duarte Nunes
bc3cbb7009 thrift: Correctly detect clustering range wrap around
This patch uses the clustering bounds comparator to correctly detect
wrap around of a clustering range in the thrift handler.

Refs #1446

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1473938611-8590-1-git-send-email-duarte@scylladb.com>
2016-09-15 14:31:16 +01:00
Shlomi Livne
acb83073e2 ami: Fix instructions how to run scylla_io_setup on non ephemeral instances
On instances differenet then i2/m3/c3 we provide instructions to run
scylla_ip_setup. Running scylla_io_setup requires access to
/var/lib/scylla to crate a temporary file. To gain access to that
directory the user should run 'sudo scylla_io_setup'.

refs: #1645

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <4ce90ca1ba4da8f07cf8aa15e755675463a22933.1473935778.git.shlomi@scylladb.com>
2016-09-15 13:40:53 +03:00
Avi Kivity
d2b5f3ff44 Merge seastar upstream
* seastar e534401...40a68fa (1):
  > rpc: fix dangling reference in read_rcv_buf
2016-09-15 12:20:49 +03:00
Gleb Natapov
2e8b255741 Merge seastar upstream
* seastar 0303e0c...e534401 (6):
  > Merge "enable rpc to work on non contiguous memory for receive" from Gleb
  > install-dependencies.sh: install python3 for Ubuntu/Debian, which requires for configure.py
  > fix tcp stuck when output_stream write more than 212992 bytes once.
  > scripts/posix_net_conf.sh: supress 'ls: cannot access /sys/class/net/<NIC>/device/msi_irqs/' error message
  > scripts/posix_net_conf.sh: fix 'command not found' error when specifies --cpu-mask
  > native_network_stack: Fix use after free/missing wait in dhcp

Includes: "Remove utils::fragmented_input_stream and utils::input_stream in favor of seastar version" from Gleb.
2016-09-15 12:12:16 +03:00
Tomasz Grabiec
ed312c2b1a Merge remote-tracking branch 'duarte/comparator/v1'
From Duarte:

This patchset reuses the bound_view::comparator in range_tombstone to
correctly detect wrap around of a clustering range. This fixes a
manifestation of #1446 that results in wrong query results.

Introduced by b1f9688432

Fixes #1669
Refs #1446
2016-09-14 18:21:05 +02:00
Paweł Dziepak
bc2ff41003 cql3: fix units in large batch warning
When displaying a warning about batch being too large C* reports batch
size and limit in bytes while S* uses kB.

This patch switches Scylla to use bytes.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1473867171-18932-1-git-send-email-pdziepak@scylladb.com>
2016-09-14 18:38:46 +03:00
Takuya ASADA
647673195c dist/redhat/build_rpm.sh: add dependency for rpmbuild
Install rpmbuild when it's not installed yet.
Fixes #1651

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1473193430-14792-1-git-send-email-syuu@scylladb.com>
2016-09-14 14:57:55 +03:00
Calle Wilund
f126cf769a column_family: Ensure flush() waits for all previous flushes + self
Fixes #1577
Message-Id: <1472569952-4066-1-git-send-email-calle@scylladb.com>
2016-09-14 11:00:41 +01:00
Duarte Nunes
f864bca773 row_cache: Deal with side-effects in allocating_section
In row_cache::make_reader, we update statistics inside an
allocating_section, which retries the supplied function until it can
satisfy all allocations by way of reserving LSA memory up front. Since
those updates are interleave with allocations, retries can lead to
miscounts.

This patch fixes this by updating statistics after all allocations.

Fixes #1659

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1473845977-20205-1-git-send-email-duarte@scylladb.com>
2016-09-14 10:46:25 +01:00
Tomasz Grabiec
a498da1987 database: Ignore spaces in initial_token list
Currently we get boost::lexical_cast on startup if inital_token has a
list which contains spaces after commas, e.g.:

  initial_token: -1100081313741479381, -1104041856484663086, ...

Fixes #1664.
Message-Id: <1473840915-5682-1-git-send-email-tgrabiec@scylladb.com>
2016-09-14 11:58:13 +03:00
Paweł Dziepak
c220c676c8 types: honour end of sstring_view
There are several places in types.cc where we assume that sstring_view
range is null terminated. That may be not true and we should always use
either begin()/end() or data()/size() pairs.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-09-07 14:30:56 -07:00
Paweł Dziepak
6373289532 Merge "Adding slow query API" from Amnon
"This series adds an API for the slow query recording.

After this series it will be possible to set the/get the slow query
recording parameters."
2016-09-07 11:06:09 -07:00
Pekka Enberg
1095705a6b Update scylla-ami submodule
* dist/ami/files/scylla-ami 14c1666...e1e3919 (1):
  > scylla_ami_setup: remove scylla_cpuset_setup
2016-09-07 21:04:03 +03:00
Avi Kivity
7ac729b4d5 Merge "Optimize reads for clustered data" from Raphael
"This will be very important for read performance of time series use case,
where timestamp is usually stored as a clustering key, and the user asks
for specific data using a clustering range filter. Example:
    CREATE TABLE temperature (
        weatherstation_id text,
        event_time timestamp,
        temperature text,
        PRIMARY KEY (weatherstation_id,event_time)
    );
    ...
    SELECT * FROM temperature
        WHERE weatherstation_id='1234ABCD'
        AND event_time > '2013-04-03 07:01:00'
        AND event_time < '2013-04-03 07:04:00';

This is based on: https://issues.apache.org/jira/browse/CASSANDRA-5514

To check correctness, I wrote a dtest that runs scylla with row cache disabled,
creates several sstables with non overlapping clustering key ranges, queries
data using several clustering range filters, and checks that the database
returns the expected results.

Tested performance with a tool I wrote myself [1] and performance is indeed
improved by this patchset. This tool works as follow:
Scylla is started with row cache disabled. That's wanted here because we're
measuring a specific code that only gets executed if row cache misses the data
we asked for. Then Scylla is populated node with N sstables ('nodetool flush'
is used to ensure it), where each will have M clustering keys, totaling N*M
clustering keys. Finally, we will start asking for data using a clustering
range filter. The tool measures throughput and min/max/avg latency.

[1]: https://gist.github.com/raphaelsc/4c415f592aaed14a18be31279d225972

Follow the results:

BEFORE
-----
('Clustering keys / second: ', 747.9672111659951)
('Max latency (ms): ', 33)
('Min latency (ms): ', 12)
('Avg latency (ms): ', 13.0)
The operation took 13.3695700169 seconds

AFTER
-----
('Clustering keys / second: ', 3159.115303945648)
('Max latency (ms): ', 22)
('Min latency (ms): ', 2)
('Avg latency (ms): ', 3.0)
The operation took 3.16544318199 seconds

NOTE: Throughput and average latency are improved by a factor of ~4.
-----"
2016-09-04 15:06:32 +03:00
Amnon Heiman
11c687dd93 API: Add slow query logging implementation
This adds the implementation for the slow query logging API.

After this patch the following will be available:

curl -X GET  "http://localhost:10000/storage_service/slow_query"
curl -X POST
"http://localhost:10000/storage_service/slow_query?enable=true&ttl=10&threshold=6000"

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-09-03 01:15:22 +03:00
Amnon Heiman
ed1d02b1a3 API: Add slow query API definition
This adds the GET and POST api for slow query logging.

The GET return an object with the enable, ttl and threshold and the POST
lets you configure each of them.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-09-03 01:15:15 +03:00