Right now in some places we use column_id, and in some places
size_t. Solve it by using column_count_type whose meaning is "an
integer sufficiently large for indexing columns". Note that we cannot
use column_id because it has more meaning to it than that.
The version needs to change value not only on structural changes but
also temporal. This is needed for nodes to detect if the version they
see was already synchronized with or not even if it has the same
structure as the past versions. We also need to end up with the same
version on all nodes when schema changes are commuted.
For regular mutable schemas version will be calculated from underlying
mutations when schema is announced. For static schemas of system
keyspace it is calculated by hashing scylla version and column id,
because we don't have mutations at the time of building the schema.
We will be able to reuse the code in frozen_schema. We need to read
data in mutation form so that we can construct the correct
schema_table_version, and attach the mutations to schema_ptr.
It simplifies add_table_to_schema_mutation() interface.
The current code is also a bit confusing, partition_key is created
with the keyspaces() schema and used in mutations destined for the
columnfamilies() schema. It works, the types are the same, but looks a
bit scary.
For static and regular (row) columns it is very convenient in some
cases to utilize the fact that columns ordered by ids are also ordered
by name. It currently holds, so make schema export this guarantee and
enable consumers to rely on.
The static schema::row_column_ids_are_ordered_by_name field is about
allowing code external to schema to make it very explicit (via
static_assert) that it relies on this guarantee, and be easily
discoverable in case we would have to relax this.
With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000
log messages about sstables being ignored. This is not reasonable.
Reduce the log level to debug, and move the message to database.cc,
because at its original location, the containing function has nothing to
do with the message itself.
Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
Wait for the future returned by the http server start process to resolve,
so we know it is started. If it doesn't, we'll hit the or_terminate()
further down the line and exit with an error code.
Message-Id: <1452092806-11508-3-git-send-email-avi@scylladb.com>
Because our shutdown process is crippled (refs #293), we won't shutdown the
snitch correctly, and the sharded<> instance can assert during shutdown.
This interferes with the next patch, which adds orderly shutdown if the http
server fails to start.
Leak it intentionally to work around the problem.
Message-Id: <1452092806-11508-2-git-send-email-avi@scylladb.com>
Make the following tests pass:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
bootstrap_test.py:TestBootstrap.killed_wiped_node_cannot_join_test
1) start node2
2) wait for cql connection with node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2
In step 5), node2 will do the bootstrap process since its data,
including the system table is wiped. It will think itself is a completly
new node and can possiblly stream from wrong node and violate
consistency.
To fix, we reject the boot if we found the node was in SHUTDOWN or
STATUS_NORMAL.
CASSANDRA-9765
Message-Id: <47bc23f4ce1487a60c5b4fbe5bfe9514337480a8.1452158975.git.asias@scylladb.com>
Implement the wait for gossip to settle logic in the bootup process.
CASSANDRA-4288
Fixes:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
1) start node2
2) wait for cql connection with node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2
In step 5, sometimes I saw in shadow round of node2, it gets node2's
status as BOOT from other nodes in the cluster instead of NORMAL. The
problem is we do not wait for gossip to settle before we start cql server,
as a result, when we stop node2 in step 3), other nodes in the cluster
have not got node2's status update to NORMAL.
The previous SSL enablement patches do make uses of these
options but they are still marked as Unused.
Change this and also update the db/config.hh documentation
accordingly.
Syntax is now:
client_encryption_options:
enabled: true
certificate: <path-to-PEM-x509-cert> (default conf/scylla.crt)
keyfile: <path-to-PEM-x509-key> (default conf/scylla.key)
Fixes: #756.
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1452032073-6933-1-git-send-email-benoit@scylladb.com>
Compaction fixes from Raphael:
There were two problems causing issue 676:
1) max_purgeable was being miscalculated (fixed by b7d36af).
2) empty row not being removed by mutation_partition::do_compact
Testcase is added to make sure that a tombstone will be purged under
certain conditions.
do_compact() wasn't removing an empty row that is covered by a
tombstone. As a result, an empty partition could be written to a
sstable. To solve this problem, let's make trim_rows remove a
row that is considered to be empty. A row is empty if it has no
tombstone, no marker and no cells.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
"This is another version of the repair overhaul, to avoid streaming *all* the
data between nodes by sending checksums of token ranges and only streaming
ranges which contain differing data."
Support the "hosts" and "dataCenters" parameters of repair. The first
specifies the known good hosts to repair this host from (plus this host),
and the second asks to restrict the repair to the local data center (you
must issue the repair to a node in the data center you want to repair -
issuing the command to a data center other than the named one returns
an error).
For example these options are used by nodetool commands like:
nodetool repair -hosts 127.0.0.1,127.0.0.2 keyspace
nodetool repair -dc datacenter1
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The existing repair code always streamed the entire content of the
database. In this overhaul, we send "repair_checksum_range" messages to
the other nodes to verify whether they have exactly the same data as
this node, and if they do, we avoid streaming the identical code.
We make an attempt to split the token ranges up to contain an estimated
100 keys each, and send these ranges' checksums. Future versions of this
code will need to improve this estimation (and make this "100" a parameter)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a function sync_range() for synchronizing all partitions
in a given token range between a set of replicas (this node and a list of
neighbors).
Repair will call this function once it has decided that the data the
replicas hold in this range is not identical.
The implementation streams all the data in the given range, from each of
the neighbors to this node - so now this node contains the most up-to-date
data. It then streams the resulting data back to all the neighbors.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>