Right now, we use one stream_plan for each range of a column family.
This generates a large number of stream_plans and stream_sessions. Each
stream_plan can transfer multiple ranges and column families, so we can
use a single stream_plan to stream data for multiple ranges and column
families. This way 1) the overhead of stream_plan/session negotiation
is reduced, and 2) it is much easier to debug and monitor fewer
stream_sessions.
Fixes #1685
"When a node is decommissioned, its gossip state is not removed from gossip
immediately. It is removed only 3 days later, so that nodes that were down
when the node was decommissioned can learn about the decommission when they
come back up.
This series improves the logging to reduce confusion when a node tries to
talk to a decommissioned node. In addition, we no longer try to talk to the
decommissioned node in the unreachable_endpoints gossip round.
Fixes #1615"
* tag 'asias/loggging_decommissioned_nodes/v1' of github.com:cloudius-systems/seastar-dev:
gossip: Make two log items debug level
gossip: Print node status when node is UP or DOWN
gossip: Ignore the node which is decommissioned in gossip round
gossip: Print convict debug info only when the node is alive
gossip: Add more timing log in add_expire_time_for_endpoint
streaming: Print on_remove and on_restart log when peer exists
streaming: Introduce has_peer in stream_manager
It duplicates the "InetAddress x.x.x.x is now UP" message.
INFO 2016-09-23 10:35:15,512 [shard 0] gossip - Node 127.0.0.1 has restarted, now UP, status = NORMAL
INFO 2016-09-23 10:35:15,513 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = NORMAL
Make the log a bit cleaner.
For example:
gossip - InetAddress 127.0.0.4 is now UP, status = NORMAL
gossip - InetAddress 127.0.0.3 is now DOWN, status = LEFT
gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown
We print the following messages even if there is no stream_session with
that peer. It is a bit confusing.
INFO 2016-09-23 08:26:37,254 [shard 0] stream_session - stream_manager:
Close all stream_session with peer = 127.0.0.1 in on_restart
INFO 2016-09-23 08:26:37,287 [shard 0] stream_session - stream_manager:
Close all stream_session with peer = 127.0.0.3 in on_remove
Print only when the streaming session with the peer exists.
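The has_peer check described above can be sketched as follows. This is only an illustration: toy_stream_manager, its session map and the captured log vector are hypothetical stand-ins, not the real stream_manager code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for stream_manager; only peer bookkeeping is modelled.
class toy_stream_manager {
    std::map<std::string, int> _sessions_per_peer;  // peer address -> open sessions
    std::vector<std::string> _log;                   // captured log lines
public:
    void add_session(const std::string& peer) { ++_sessions_per_peer[peer]; }

    // True only if at least one session with this peer exists.
    bool has_peer(const std::string& peer) const {
        auto it = _sessions_per_peer.find(peer);
        return it != _sessions_per_peer.end() && it->second > 0;
    }

    void on_remove(const std::string& peer) {
        if (!has_peer(peer)) {
            return;  // nothing to close, so nothing to log
        }
        _log.push_back("Close all stream_session with peer = " + peer + " in on_remove");
        _sessions_per_peer.erase(peer);
    }

    const std::vector<std::string>& log() const { return _log; }
};
```

With this shape, a notification for a peer that never had a session leaves the log untouched.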
The fact that Seastar's semaphore defaults to an initial value of 1 when
not explicitly initialized is confusing and unexpected, and recently led
to two bugs. So ScyllaDB should not rely on this default behavior, and
should specify the initial value of each semaphore explicitly.
In several places in the ScyllaDB code the explicit initialization was
missing, and this patch adds it. In one case (rate_limiter) I even think
the default of 1 was a bit strange, and 0 makes more sense.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1474530745-23951-1-git-send-email-nyh@scylladb.com>
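To illustrate the point about explicit initial values, here is a toy counting semaphore with no default constructor. It is not Seastar's semaphore; toy_semaphore and toy_rate_limiter are hypothetical, and the idea that the limiter is refilled on a timer tick is an assumption made only for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Toy counting semaphore (not Seastar's): no default constructor,
// so every user has to state the initial number of units.
class toy_semaphore {
    std::size_t _count;
public:
    explicit toy_semaphore(std::size_t initial) : _count(initial) {}

    // Takes n units if available; returns false without blocking otherwise.
    bool try_wait(std::size_t n = 1) {
        if (_count < n) {
            return false;
        }
        _count -= n;
        return true;
    }

    void signal(std::size_t n = 1) { _count += n; }
    std::size_t available() const { return _count; }
};

// Hypothetical rate limiter: starts with 0 units, so nothing may proceed
// until the first refill. Starting at 1 would let one operation through early.
struct toy_rate_limiter {
    toy_semaphore units{0};
    void on_timer_tick(std::size_t budget) { units.signal(budget); }
};
```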
stdx::optional<T> uses quite elaborate std::enable_if_t magic to decide
whether the argument passed to its constructor should be forwarded to
T's constructor or used by one of stdx::optional<T>'s own constructors.
Apparently, with GCC 6.2, a T constructor that accepts any type
confuses that magic and we end up with compile errors.
The solution is a from_range() method that replaces the constructor
from a range. There is also a constructor that creates a key from
std::vector<bytes>, so that code generated by the IDL works as it did
before.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1474550971-15309-1-git-send-email-pdziepak@scylladb.com>
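A minimal sketch of the from_range() approach, using std::optional in place of stdx::optional. The class toy_key and its members are hypothetical; the real key classes are not shown in this message. The catch-all constructor template is replaced by a named factory, and only a concrete constructor from a vector remains.

```cpp
#include <cassert>
#include <iterator>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key type used only for illustration.
class toy_key {
    std::vector<std::string> _components;
public:
    // Concrete, non-template constructor (stands in for the std::vector<bytes> one).
    explicit toy_key(std::vector<std::string> components)
        : _components(std::move(components)) {}

    // Named factory that replaces a "template <typename Range> toy_key(Range&&)"
    // constructor, so optional<toy_key> no longer sees a constructor that
    // accepts any type.
    template <typename Range>
    static toy_key from_range(const Range& r) {
        return toy_key(std::vector<std::string>(std::begin(r), std::end(r)));
    }

    const std::vector<std::string>& components() const { return _components; }
};
```

Callers that used to pass a range to the constructor would call toy_key::from_range(range) instead.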
It is possible that endpoint_state_map does not contain the entry for
the node itself when collectd accesses it.
Fixes the issue:
Sep 18 11:33:16 XXX scylla[19483]: [shard 0] seastar - Exceptional
future ignored: std::out_of_range (_Map_base::at)
Fixes #1656
Message-Id: <8ffe22a542ff71e8c121b06ad62f94db54cc388f.1474377722.git.asias@scylladb.com>
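The std::out_of_range in the log above is what std::unordered_map::at() throws for a missing key. A minimal sketch of a lookup that tolerates a missing entry follows; toy_endpoint_state and generation_of are hypothetical names, not the gossiper's real types.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical, simplified endpoint state.
struct toy_endpoint_state {
    int generation = 0;
};

using toy_endpoint_state_map = std::unordered_map<std::string, toy_endpoint_state>;

// Returns the generation if the endpoint is known, std::nullopt otherwise.
// Uses find() instead of at(), so a missing entry does not throw.
std::optional<int> generation_of(const toy_endpoint_state_map& m, const std::string& ep) {
    auto it = m.find(ep);
    if (it == m.end()) {
        return std::nullopt;
    }
    return it->second.generation;
}
```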
Example:
(gdb) scylla ptr 0x601000480000
thread 1, large, LSA-managed
One can then use 'scylla lsa-segment 0x601000480000' to examine LSA
segment contents.
Benoît Canet points out that CQL messages are not always compressed
although compression is enabled by the driver. Turns out our CQL
compression negotiation is broken. We need to negotiate compression upon
STARTUP message and not rely on the incoming request to have the
compression bit enabled.
Fixes #1680
Message-Id: <1474366693-3001-1-git-send-email-penberg@scylladb.com>
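The negotiation described above can be sketched like this. In the CQL binary protocol, the STARTUP body is a string map that may carry a COMPRESSION option, and 0x01 is the compression flag in the frame header. Everything else here (toy_connection, toy_compression, should_compress_response) is a hypothetical simplification, not the server's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

enum class toy_compression { none, lz4, snappy };

constexpr uint8_t flag_compression = 0x01;  // compression bit in the frame header

// Hypothetical per-connection state.
struct toy_connection {
    toy_compression negotiated = toy_compression::none;

    // Record the algorithm the driver asked for in STARTUP.
    void process_startup(const std::map<std::string, std::string>& options) {
        auto it = options.find("COMPRESSION");
        if (it == options.end()) {
            negotiated = toy_compression::none;
        } else if (it->second == "lz4") {
            negotiated = toy_compression::lz4;
        } else if (it->second == "snappy") {
            negotiated = toy_compression::snappy;
        } else {
            throw std::invalid_argument("unknown compression algorithm: " + it->second);
        }
    }

    // The decision depends on what was negotiated at STARTUP,
    // not on whether the incoming request had the compression bit set.
    bool should_compress_response(uint8_t request_flags) const {
        (void)request_flags;  // deliberately ignored
        return negotiated != toy_compression::none;
    }
};
```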
The constructor was added in commit 7f3ce39 ("query_options: Add
constructor for batch mode options (multi-level)") but apparently it was
never actually implemented.
Spotted by CLion.
Message-Id: <1474303017-23383-1-git-send-email-penberg@scylladb.com>
The EXECUTE message encoding is different between CQL binary protocol
versions v1 and v2 (and later). Fix process_execute() to deserialize the
message as per the CQL binary protocol v1 specification:
Executes a prepared query. The body of the message must be:
<id><n><value_1>....<value_n><consistency>
where:
- <id> is the prepared query ID. It's the [short bytes] returned as a
response to a PREPARE message.
- <n> is a [short] indicating the number of following values.
- <value_1>...<value_n> are the [bytes] to use for bound variables in the
prepared query.
- <consistency> is the [consistency] level for the operation.
Fixes #1676
Message-Id: <1474287392-16792-1-git-send-email-penberg@scylladb.com>
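The v1 body layout quoted above, <id><n><value_1>....<value_n><consistency>, can be decoded along these lines. This is a standalone sketch, not the real process_execute(): toy_reader, toy_execute_v1 and parse_execute_v1 are hypothetical names. It relies on the protocol's encodings: a [short] is a 2-byte big-endian unsigned integer, [short bytes] is a [short] length followed by that many bytes, and [bytes] is a 4-byte signed length followed by that many bytes, with a negative length meaning null.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

using toy_bytes = std::vector<uint8_t>;

struct toy_execute_v1 {
    toy_bytes id;                                   // prepared query ID
    std::vector<std::optional<toy_bytes>> values;   // std::nullopt for a null value
    uint16_t consistency = 0;
};

// Hypothetical big-endian reader over a message body.
class toy_reader {
    const toy_bytes& _buf;
    std::size_t _pos = 0;
    void need(std::size_t n) const {
        if (_buf.size() - _pos < n) {
            throw std::runtime_error("truncated EXECUTE body");
        }
    }
public:
    explicit toy_reader(const toy_bytes& buf) : _buf(buf) {}

    uint16_t read_short() {
        need(2);
        const auto v = static_cast<uint16_t>((_buf[_pos] << 8) | _buf[_pos + 1]);
        _pos += 2;
        return v;
    }

    int32_t read_int() {
        need(4);
        uint32_t v = 0;
        for (int i = 0; i < 4; ++i) {
            v = (v << 8) | _buf[_pos + i];
        }
        _pos += 4;
        return static_cast<int32_t>(v);
    }

    toy_bytes read_raw(std::size_t n) {
        need(n);
        toy_bytes out(_buf.begin() + _pos, _buf.begin() + _pos + n);
        _pos += n;
        return out;
    }
};

toy_execute_v1 parse_execute_v1(const toy_bytes& body) {
    toy_reader in(body);
    toy_execute_v1 msg;
    msg.id = in.read_raw(in.read_short());           // <id>: [short bytes]
    const uint16_t n = in.read_short();              // <n>: [short]
    for (uint16_t i = 0; i < n; ++i) {
        const int32_t len = in.read_int();           // <value_i>: [bytes]
        if (len < 0) {
            msg.values.push_back(std::nullopt);      // negative length means null
        } else {
            msg.values.push_back(in.read_raw(static_cast<std::size_t>(len)));
        }
    }
    msg.consistency = in.read_short();               // <consistency>
    return msg;
}
```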
The leveled strategy makes heavy use of the first and last decorated
keys of an sstable to find overlapping sstables in a given level. By
storing the first and last decorated keys in the sstable object, we
expect the performance of the leveled strategy itself (not of
compaction) to improve.
We set the first and last keys in the sstable when either loading or
sealing it.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0abca819454ab4c088541bb49714f1f6a7dc4f42.1473959677.git.raphaelsc@scylladb.com>
timeuuid_type_impl::compare_bytes is a "trichotomic" comparator (-1,
0, 1) while less() is a "less" comparator (false, true). The code
incorrectly returns c1 instead of c1 < 0 which breaks the ordering.
Fixes #1196.
Message-Id: <1473956716-5209-1-git-send-email-tgrabiec@scylladb.com>
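The comparator bug described above comes from the implicit int-to-bool conversion: any non-zero three-way result, including a positive one, becomes true. A small sketch on plain integers (toy_compare, less_buggy and less_fixed are illustrative names, not the timeuuid code):

```cpp
#include <cassert>

// Stand-in three-way comparator: returns -1, 0 or 1.
int toy_compare(int a, int b) {
    return (a > b) - (a < b);
}

// Buggy shape: returns the three-way result as bool, so "greater"
// (a positive result) is also reported as "less".
bool less_buggy(int a, int b) {
    return toy_compare(a, b);
}

// Fixed shape: only a negative result means "less".
bool less_fixed(int a, int b) {
    return toy_compare(a, b) < 0;
}
```

With the buggy version both less(a, b) and less(b, a) hold for distinct values, which violates the strict weak ordering that sorted containers and sort algorithms require.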
On instances other than i2/m3/c3 we provide instructions to run
scylla_io_setup. Running scylla_io_setup requires access to
/var/lib/scylla to create a temporary file. To gain access to that
directory the user should run 'sudo scylla_io_setup'.
refs: #1645
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <4ce90ca1ba4da8f07cf8aa15e755675463a22933.1473935778.git.shlomi@scylladb.com>
* seastar 0303e0c...e534401 (6):
> Merge "enable rpc to work on non contiguous memory for receive" from Gleb
> install-dependencies.sh: install python3 for Ubuntu/Debian, which requires for configure.py
> fix tcp stuck when output_stream write more than 212992 bytes once.
> scripts/posix_net_conf.sh: supress 'ls: cannot access /sys/class/net/<NIC>/device/msi_irqs/' error message
> scripts/posix_net_conf.sh: fix 'command not found' error when specifies --cpu-mask
> native_network_stack: Fix use after free/missing wait in dhcp
Includes: "Remove utils::fragmented_input_stream and utils::input_stream in favor of seastar version" from Gleb.
From Duarte:
This patchset reuses the bound_view::comparator in range_tombstone to
correctly detect wrap around of a clustering range. This fixes a
manifestation of #1446 that results in wrong query results.
Introduced by b1f9688432
Fixes #1669
Refs #1446
In row_cache::make_reader, we update statistics inside an
allocating_section, which retries the supplied function until it can
satisfy all allocations by way of reserving LSA memory up front. Since
those updates are interleaved with allocations, retries can lead to
miscounts.
This patch fixes this by updating statistics after all allocations.
Fixes #1659
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1473845977-20205-1-git-send-email-duarte@scylladb.com>
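The miscount can be reproduced with a toy retry loop. This is only a model of the behavior described above: toy_retrying_section, toy_allocator and toy_stats are hypothetical, and the real allocating_section reserves LSA memory between attempts rather than simply retrying.

```cpp
#include <cassert>
#include <new>

// Toy stand-in for a retrying section: re-runs fn while it throws std::bad_alloc.
template <typename Func>
void toy_retrying_section(Func fn) {
    for (;;) {
        try {
            fn();
            return;
        } catch (const std::bad_alloc&) {
            // The real code would reserve more memory here before retrying.
        }
    }
}

struct toy_stats {
    int hits = 0;
};

// Simulates an allocation that fails the first `failures` times.
struct toy_allocator {
    int failures;
    void allocate() {
        if (failures > 0) {
            --failures;
            throw std::bad_alloc();
        }
    }
};

// Statistics updated inside the retried function: counted once per attempt.
int count_hits_inside(toy_allocator alloc) {
    toy_stats s;
    toy_retrying_section([&] {
        ++s.hits;
        alloc.allocate();
    });
    return s.hits;
}

// Statistics updated after all allocations succeeded: counted exactly once.
int count_hits_after(toy_allocator alloc) {
    toy_stats s;
    toy_retrying_section([&] { alloc.allocate(); });
    ++s.hits;
    return s.hits;
}
```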
Currently we get a boost::lexical_cast error on startup if initial_token
has a list which contains spaces after commas, e.g.:
initial_token: -1100081313741479381, -1104041856484663086, ...
Fixes #1664.
Message-Id: <1473840915-5682-1-git-send-email-tgrabiec@scylladb.com>
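A minimal sketch of parsing such a list while tolerating spaces around the commas. It uses std::from_chars instead of boost::lexical_cast to stay self-contained; trim and parse_initial_tokens are hypothetical helper names, not the functions touched by the patch.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <system_error>
#include <vector>

// Strips leading and trailing spaces and tabs.
std::string_view trim(std::string_view s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string_view::npos) {
        return {};
    }
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits on commas, trims each item, and parses it as a signed 64-bit token.
std::vector<int64_t> parse_initial_tokens(std::string_view list) {
    std::vector<int64_t> tokens;
    for (;;) {
        const auto comma = list.find(',');
        const auto item = trim(list.substr(0, comma));
        int64_t value = 0;
        const char* end = item.data() + item.size();
        const auto [ptr, ec] = std::from_chars(item.data(), end, value);
        if (ec != std::errc() || ptr != end) {
            throw std::invalid_argument("bad token in initial_token");
        }
        tokens.push_back(value);
        if (comma == std::string_view::npos) {
            break;
        }
        list.remove_prefix(comma + 1);
    }
    return tokens;
}
```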
There are several places in types.cc where we assume that an sstring_view
range is null terminated. That may not be true, and we should always use
either begin()/end() or data()/size() pairs.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
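To illustrate why the data()/size() pair matters, the sketch below parses an integer from a std::string_view (standing in for sstring_view) that is a prefix of a longer buffer. parse_int is a hypothetical helper; a C-style function that reads until a terminating null would run past the end of the view here.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Parses the whole view as a decimal int, touching only [data(), data() + size()).
// Returns false if the view is not entirely a valid number.
bool parse_int(std::string_view v, int& out) {
    const char* first = v.data();
    const char* last = v.data() + v.size();
    const auto [ptr, ec] = std::from_chars(first, last, out);
    return ec == std::errc() && ptr == last;
}
```

For a view covering only "123" inside the buffer "1234567", this reads exactly three characters, whereas a call such as atoi(view.data()) would see all seven digits.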
"This will be very important for the read performance of time series use
cases, where the timestamp is usually stored as a clustering key, and the user asks
for specific data using a clustering range filter. Example:
CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);
...
SELECT * FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
This is based on: https://issues.apache.org/jira/browse/CASSANDRA-5514
To check correctness, I wrote a dtest that runs scylla with row cache disabled,
creates several sstables with non overlapping clustering key ranges, queries
data using several clustering range filters, and checks that the database
returns the expected results.
I tested performance with a tool I wrote myself [1], and performance is
indeed improved by this patchset. The tool works as follows:
Scylla is started with the row cache disabled. That is wanted here because
we are measuring a specific code path that only gets executed when the row
cache misses the data we asked for. Then the node is populated with N
sstables ('nodetool flush' is used to ensure that), each with M clustering
keys, for a total of N*M clustering keys. Finally, we start asking for data
using a clustering range filter. The tool measures throughput and
min/max/avg latency.
[1]: https://gist.github.com/raphaelsc/4c415f592aaed14a18be31279d225972
The results follow:
BEFORE
-----
('Clustering keys / second: ', 747.9672111659951)
('Max latency (ms): ', 33)
('Min latency (ms): ', 12)
('Avg latency (ms): ', 13.0)
The operation took 13.3695700169 seconds
AFTER
-----
('Clustering keys / second: ', 3159.115303945648)
('Max latency (ms): ', 22)
('Min latency (ms): ', 2)
('Avg latency (ms): ', 3.0)
The operation took 3.16544318199 seconds
NOTE: Throughput and average latency are improved by a factor of ~4.
-----"
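The idea behind the speedup can be sketched as follows, assuming that each sstable records its minimum and maximum clustering key, as in CASSANDRA-5514. The message above does not spell out the mechanism, so this is an assumption, and toy_sstable and sstables_to_read are hypothetical names. Clustering keys are modelled as integers and the query range is exclusive on both ends, like the event_time filter in the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sstable summary: the inclusive range of clustering keys it holds.
struct toy_sstable {
    int id;
    int64_t min_ck;
    int64_t max_ck;
};

// Returns the ids of sstables that may contain a key k with lo < k < hi.
// The check is conservative: it may keep an sstable that turns out to hold
// no matching key, but it never drops one that does.
std::vector<int> sstables_to_read(const std::vector<toy_sstable>& all,
                                  int64_t lo, int64_t hi) {
    std::vector<int> ids;
    for (const auto& s : all) {
        if (s.max_ck > lo && s.min_ck < hi) {
            ids.push_back(s.id);
        }
    }
    return ids;
}
```

With non-overlapping sstables, as in the benchmark setup, a narrow range filter touches only a few of the N sstables instead of all of them.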
This adds the GET and POST API for slow query logging.
The GET returns an object with the enable, ttl and threshold fields, and
the POST lets you configure each of them.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>