add_local_application_state is used in various places. Before this
patch, it can only be called on cpu zero. To make it safer to use, use
invoke_on() to foward the code to run on cpu zero, so that caller can
call it on any cpu.
Refs: #795
Message-Id: <d69b81c5561622078dbe887d87209c4ea2e3bf46.1456315043.git.asias@scylladb.com>
Gleb saw once:
scylla: gms/gossiper.cc:1393:
gms::gossiper::add_local_application_state(gms::application_state,
gms::versioned_value):: mutable: Assertion
`endpoint_state_map.count(ep_addr)' failed.
The assert is about we can not find the entry in endpoint_state_map of
the node itself. I can not really find any place we could call
add_local_application_state before we call gossiper::start_gossiping()
where it inserts broadcast address into endpoint_state_map.
I can not reproduce issue, let's log the error so we can narrow down
which application state triggered the assert.
Refs: #795
Message-Id: <f4433be0a0d4f23470a5e24e528afdb67b74c7ef.1456315043.git.asias@scylladb.com>
Fixes#934 - faulty assert in discard_sstables
run_with_compaction_disabled clears out a CF from compaction
mananger queue. discard_sstables wants to assert on this, but looks
at the wrong counters.
pending_compactions is an indicator on how much interested parties
want a CF compacted (again and again). It should not be considered
an indicator of compactions actually being done.
This modifies the usage slightly so that:
1.) The counter is always incremented, even if compaction is disallowed.
The counters value on end of run_with_compaction_disabled is then
instead used as an indicator as to whether a compaction should be
re-triggered. (If compactions finished, it will be zero)
2.) Document the use and purpose of the pending counter, and add
method to re-add CF to compaction for r_w_c_d above.
3.) discard_sstables now asserts on the right things.
Message-Id: <1456332824-23349-1-git-send-email-calle@scylladb.com>
Fixes#937
In fixing #884, truncation not truncating memtables properly,
time stamping in truncate was made shard-local. This however
breaks the snapshot logic, since for all shards in a truncate,
the sstables should snapshot to the same location.
This patch adds a required function argument to truncate (and
by extension drop_column_family) that produces a time stamp in
a "join" fashion (i.e. same on all shards), and utilizes the
joinpoint type in caller to do so.
Message-Id: <1456332856-23395-2-git-send-email-calle@scylladb.com>
Lets operations working on all shards "join" and acquire
the same value of something, with that value being based on
whenever all shards reach the join.
Obvious use case: time stamp after one set of per-shard ops, but
before final ones.
The generation of the value is guaranteed to happen on the shards
that created the join point.
Based on the join-ops in CF::snapshot, but abstracted and made
caller responsibility. Primary use case is to help deal with
the join-problem of truncation.
Message-Id: <1456332856-23395-1-git-send-email-calle@scylladb.com>
The gocql driver assumes that there's a result metadata section in the
PREPARED message. Technically, Scylla is not at fault here as the CQL
specification explicitly states in Section 4.2.5.4. ("Prepared") that the
section may be empty:
- <result_metadata> is defined exactly as <metadata> but correspond to the
metadata for the resultSet that execute this query will yield. Note that
<result_metadata> may be empty (have the No_metadata flag and 0 columns, See
section 4.2.5.2) and will be for any query that is not a Select. There is
in fact never a guarantee that this will non-empty so client should protect
themselves accordingly. The presence of this information is an
However, Cassandra always populates the section so lets do that as well.
Fixes#912.
Message-Id: <1456317082-31688-1-git-send-email-penberg@scylladb.com>
* dist/ami/files/scylla-ami 398b1aa...d4a0e18 (3):
> Sort service running order (scylla-ami-setup.service -> scylla-io-setup.service -> scylla-server.service)
> Drop --ami and --disk-count parameters
> dist: pass the number of disks to set io params
In each gossip round, i.e., gossiper::run(), we do:
1) send syn message
2) peer node: receive syn message, send back ack message
3) process ack message in handle_ack_msg
apply_state_locally
mark_alive
send_gossip_echo
handle_major_state_change
on_restart
mark_alive
send_gossip_echo
mark_dead
on_dead
on_join
apply_new_states
do_on_change_notifications
on_change
4) send back ack2 message
5) peer node: process ack2 message
apply_state_locally
At the moment, syn is "wait" message, it times out in 3 seconds. In step
3, all the registered gossip callbacks are called which might take
significant amount of time to complete.
In order to reduce the gossip round latency, we make syn "no-wait" and
do not run the handle_ack_msg insdie the gossip::run(). As a result, we
will not get a ack message as the return value of a syn message any
more, so a GOSSIP_DIGEST_ACK message verb is introduced.
With this patch, the gossip message exchange is now async. It is useful
when some nodes are down in the cluster. We will not delay the gossip
round, which is supposed to run every second, 3*n seconds (n = 1-3,
since it talks to 1-3 peer nodes in each gossip round) or even
longer (considering the time to run gossip callbacks).
Later, we can make talking to the 1-3 peer nodes in parallel to reduce
latency even more.
Refs: #900
We will soon switch to use no-wait message for gossip. GOSSIP_DIGEST_SYN
will no longer return GOSSIP_DIGEST_ACK message. So we need a standalone
verb for GOSSIP_DIGEST_ACK.
The problem is we initialize _last_interpret when failure_detector
object is constructed. When interpret() runs for the first time, the
_last_interpret value is not the last time we run interpret() but the
time we initialize failure_detector object.
Fix by initializing _last_interpret inside interpret().
[Thu Feb 18 02:40:04 2016] INFO [shard 0] storage_service - Node 127.0.0.1 state jump to normal
[Thu Feb 18 02:40:04 2016] INFO [shard 0] storage_service - NORMAL: node is now in normal status
[Thu Feb 18 02:40:04 2016] INFO [shard 0] gossip - Waiting for gossip to settle before accepting client requests...
[Thu Feb 18 02:40:12 2016] INFO [shard 0] gossip - No gossip backlog; proceeding
Starting listening for CQL clients on 127.0.0.1:9042...
[Thu Feb 18 02:40:12 2016] INFO [shard 0] gossip - Node 127.0.0.2 is now part of the cluster
[Thu Feb 18 02:40:12 2016] INFO [shard 0] gossip - InetAddress 127.0.0.2 is now UP
[Thu Feb 18 02:40:13 2016] INFO [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
[Thu Feb 18 02:40:13 2016] WARN [shard 0] failure_detector - Not marking nodes down due to local pause of 9091 > 5000 (milliseconds)
"Add scylla-io-setup.service to configure max-io-requests and num-io-queues on first boot.
Moved SCYLLA_IO configuration code from scylla_sysconfig_setup to scylla-io-setup.service, revert commits related it.
On scylla-io-setup.service, autodetect Amazon EC2 instead of using AMI variable on sysconfig."
While serialization vector it is sometimes required to rollback some of
the serialized elements.
vector_position is the equivalent to the bytes_ostream position struct.
It holds information about the current position in a serialized vector,
the position in the bufffer and the current number of elements
serialized.
It will allow to rollback to the current point.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1456041750-1505-2-git-send-email-amnon@scylladb.com>
Fixes#927.
The new visiting code builds cell instances using
atomic_cell::make_*() factory methods, which won't work in LSA context
because they depend on managed_bytes storage to be linearized. It may
not be since large blob support. This worked before because we created
cells from views before which works in all contexts.
Fix by constructing them in standard allocator context.
Message-Id: <1456234064-13608-2-git-send-email-tgrabiec@scylladb.com>
The line:
boost::apply_visitor(atomic_cell_or_collection_visitor(std::move(visitor), id, col), cell);
is executed in a loop, so visitor could be used after being
moved-from. This may not always be allowed for some visitors. Also,
vistors may keep state, which should be preserved for the whole
visitation.
This doesn't fix any issue right now.
Message-Id: <1456234064-13608-1-git-send-email-tgrabiec@scylladb.com>
The 'clear' function explicitly clears the screen and repaints it which
causes really annoying flicker. Use 'erase' to make scyllatop more
pleasant on the eyes.
Message-Id: <1456229348-2194-1-git-send-email-penberg@scylladb.com>
It is often the case that the there is useful debugging information
printed by the test before it hangs. It is annoying to see just "TIMED
OUT" in jenkins. Print the output always when it is available.
In addition to that, we should not interpret all exceptions thrown
from communicate() as timeouts. For example, currently ^C sent to the
script misleadingly results in "TIMED OUT" to be printed.
Message-Id: <1456174992-21909-1-git-send-email-tgrabiec@scylladb.com>
Every native scalar function is already tagged whether they're pure or
not but because we don't implement the is_pure() function, all functions
end up being advertised as pure. This means that functions like now()
that are *not* pure, end up being evaluated only once.
Fixes#571.
Message-Id: <1456227171-461-1-git-send-email-penberg@scylladb.com>
In same cases we may have a lot of empty partitions whose tombstones
expired, and there is no point in including them in the results.
This was found to cause performance issues for workloads using batch
updates. system.batchlog table would accumulate a lot of deletes over
time. It has gc_grace_seconds set to 0 so most of the tombstones would
be expired. mutation queries done by batchlog manager were still
returning all partitions present in memtables which caused mutation
queries result to be inflated. This in turn was causing
mutation_result_merger to take a long time to process them.
We don't support the 'CREATE TYPE' statement for now. The user-visible
error message, however, is unreadable because our CQL parser doesn't
even recognize the statement.
cqlsh:ks1> CREATE TYPE config (url text);
SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message=" : cannot match to any predicted input...
Implement just enough of 'CREATE TYPE' parsing to be able to report a
human readable error message if someone tries to execute such
statements:
cqlsh:ks1> CREATE TYPE config (url text);
ServerError: <ErrorMessage code=0000 [Server error] message="User-defined types are not supported yet">
Message-Id: <1456148719-9473-2-git-send-email-penberg@scylladb.com>