7930 Commits

Author SHA1 Message Date
Tomasz Grabiec
f59ec59abc mutation: Implement upgrade()
Converts mutation to a new schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
0edfe138f8 mutation_partition_view: Make visitable also with column_mapping 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
2cfdfe261d Introduce converting_mutation_partition_applier 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
b17cbc23ab schema: Introduce column_mapping
Encapsulates information needed to convert mutation representations
between schema versions.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
9a3db10b85 db/serializer: Implement skip() for bytes and sstring 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
13974234a4 db/serializer: Spread serializers to relax header dependencies 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d13c6d7008 types: Introduce is_atomic()
Matches column_definition::is_atomic()
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f3556ebfc2 schema: Introduce column_count_type
Right now in some places we use column_id, and in some places
size_t. Solve it by using column_count_type whose meaning is "an
integer sufficiently large for indexing columns". Note that we cannot
use column_id because it has more meaning to it than that.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f58c2dec1e schema: Make schema objects versioned
The version must change not only on structural changes but also on
temporal ones. Nodes need this to detect whether they have already
synchronized with the version they see, even if it has the same
structure as a past version. We also need to end up with the same
version on all nodes when schema changes are commuted.

For regular mutable schemas, the version will be calculated from the
underlying mutations when the schema is announced. For static schemas of
the system keyspace, it is calculated by hashing the scylla version and
column id, because we don't have mutations at the time of building the
schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
13295563e0 schema_builder: Move compact_storage setting outside build()
Properties of the schema are set using methods of schema_builder and
different variants of build() are for different forms of the final
schema object.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
dbb7b7ebe3 db: Move system keyspace initialization to init_system_keyspace() 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
fdb9e01eb4 schema_tables: Use schema_mutations for schema_ptr translations
We will be able to reuse the code in frozen_schema. We need to read
data in mutation form so that we can construct the correct
schema_table_version, and attach the mutations to schema_ptr.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d07e32bc32 schema_tables: Simplify schema building invocation chain 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
3c3ea20640 schema_tables: Drop pkey parameter from add_table_to_schema_mutation()
It simplifies add_table_to_schema_mutation() interface.

The current code is also a bit confusing: partition_key is created
with the keyspaces() schema and used in mutations destined for the
columnfamilies() schema. It works because the types are the same, but it
looks a bit scary.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
22254e94cc query::result_set: Add constructor from mutation 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
a861b74b7e Introduce schema_mutations 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
a6084ee007 mutation: Make hashable
The computed hash is independent of any internal representation thus
can be used as a digest across nodes and versions.
2016-01-08 21:10:26 +01:00
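The idea behind a6084ee007, a hash that does not depend on internal representation, can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the code base: `fnv1a_hasher` stands in for whatever digest the real code uses, and `feed_hash` and `digest_cells` are illustration-only names. The point shown is only that feeding values in a canonical order, with explicit byte order, yields the same digest regardless of how the data is laid out in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical hasher: 64-bit FNV-1a, used here only as a stand-in digest.
struct fnv1a_hasher {
    std::uint64_t state = 14695981039346656037ull;  // FNV offset basis
    void update(const unsigned char* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            state ^= data[i];
            state *= 1099511628211ull;  // FNV prime
        }
    }
};

// Feed a length-prefixed string. The length is written byte by byte in
// little-endian order, so the result does not depend on host endianness.
template <typename Hasher>
void feed_hash(Hasher& h, const std::string& s) {
    unsigned char len[8];
    const std::uint64_t n = s.size();
    for (int i = 0; i < 8; ++i) {
        len[i] = static_cast<unsigned char>(n >> (8 * i));
    }
    h.update(len, sizeof(len));
    h.update(reinterpret_cast<const unsigned char*>(s.data()), s.size());
}

// Digest of a set of (column name, value) cells. Cells are visited in name
// order, so the container type and insertion order do not affect the result.
inline std::uint64_t digest_cells(
        const std::unordered_map<std::string, std::string>& cells) {
    const std::map<std::string, std::string> ordered(cells.begin(), cells.end());
    fnv1a_hasher h;
    for (const auto& [name, value] : ordered) {
        feed_hash(h, name);
        feed_hash(h, value);
    }
    return h.state;
}
```

Two maps holding the same cells, inserted in different orders, produce the same digest under this sketch, while changing any value changes it.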
Tomasz Grabiec
c009fe5991 keys: Add missing clustering_key_prefix_view::get_compound_type() 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
ade5cf1b4b mutation_partition: Make visitable with mutation_partition_visitor 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
bc9ee083dd db: Move atomic_cell_or_collection to separate header
To break future cyclic dependency:

  atomic_cell.hh -> schema.hh (new) -> types.hh -> atomic_cell.hh
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
6f955e1290 mutation_partition: Make equal() work with different schemas 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
75caba5b8a schema: Guarantee that column id order matches name order
For static and regular (row) columns it is very convenient in some
cases to utilize the fact that columns ordered by ids are also ordered
by name. It currently holds, so make schema export this guarantee and
enable consumers to rely on it.

The static schema::row_column_ids_are_ordered_by_name field is about
allowing code external to schema to make it very explicit (via
static_assert) that it relies on this guarantee, and be easily
discoverable in case we would have to relax this.
2016-01-08 21:10:25 +01:00
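A minimal sketch of how external code might state its reliance on the guarantee from 75caba5b8a. Only the field name `row_column_ids_are_ordered_by_name` comes from the commit message; `schema_sketch`, its column list, and `names_sorted_by_id` are hypothetical stand-ins, not the real schema class.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical stand-in for the real schema class.
struct schema_sketch {
    // Exported guarantee: regular columns ordered by id are ordered by name.
    static constexpr bool row_column_ids_are_ordered_by_name = true;
    // Regular column names, indexed by column id (illustrative data).
    std::array<std::string_view, 3> regular_columns{"a", "b", "c"};
};

// A consumer makes its dependency explicit. If the guarantee is ever
// relaxed, this fails at compile time and is easy to find with a search.
static_assert(schema_sketch::row_column_ids_are_ordered_by_name,
              "this code assumes column id order matches name order");

// Checks the property the consumer relies on: walking the columns by id
// yields the names in sorted order.
inline bool names_sorted_by_id(const schema_sketch& s) {
    return std::is_sorted(s.regular_columns.begin(), s.regular_columns.end());
}
```

The `static_assert` costs nothing at run time; it exists to document the assumption at the point of use.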
Tomasz Grabiec
14d0482efa Introduce md5_hasher 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
eb1b21eb4b Introduce hashing helpers 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
ff3a2e1239 mutation_partition: Drop row tombstones in do_compact() 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
eb9b383531 service: migration_manager: Fix announce order to match C*
The current logic differs from C*: we first push to other nodes and then
initiate the sync locally, while C* does the opposite.
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
0768deba74 query_processor: Add trace-level logging of processed statements 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
dae531554a create_index_statement: Use textual column name in all messages
As pointed out by Pawel, we can rely on operator<<()
Message-Id: <1452243656-3376-1-git-send-email-tgrabiec@scylladb.com>
2016-01-08 11:06:09 +02:00
Tomasz Grabiec
5d6d039297 create_index_statement: Use textual representation of column name
Before:

  InvalidRequest: code=2200 [Invalid query] message="No column definition found for column 736368656d615f76657273696f6e"

After:

  InvalidRequest: code=2200 [Invalid query] message="No column definition found for column schema_version"
Message-Id: <1452243156-2923-1-git-send-email-tgrabiec@scylladb.com>
2016-01-08 10:53:37 +02:00
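The "Before" message in 5d6d039297 shows the column name as the hex encoding of its bytes. The sketch below decodes that string to show that it is the same name as in the "After" message. `hex_to_text` and `hex_digit_value` are hypothetical helpers written for this illustration; this is not how the commit itself produces the text.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: numeric value of one hex digit.
inline int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper: turn a hex dump of bytes back into text.
inline std::string hex_to_text(const std::string& hex) {
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("hex string has odd length");
    }
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int byte = hex_digit_value(hex[i]) * 16 + hex_digit_value(hex[i + 1]);
        out.push_back(static_cast<char>(byte));
    }
    return out;
}
```

Calling `hex_to_text("736368656d615f76657273696f6e")` returns `"schema_version"`.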
Avi Kivity
0c755d2c94 db: reduce log spam when ignoring an sstable
With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000
log messages about sstables being ignored.  This is not reasonable.

Reduce the log level to debug, and move the message to database.cc,
because at its original location, the containing function has nothing to
do with the message itself.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
2016-01-07 19:23:25 +02:00
Avi Kivity
3377739fa3 main: wait for API http server to start
Wait for the future returned by the http server start process to resolve,
so we know it is started.  If it doesn't, we'll hit the or_terminate()
further down the line and exit with an error code.
Message-Id: <1452092806-11508-3-git-send-email-avi@scylladb.com>
2016-01-07 16:44:07 +02:00
Avi Kivity
fbe3283816 snitch: intentionally leak snitch singleton
Because our shutdown process is crippled (refs #293), we won't shut down the
snitch correctly, and the sharded<> instance can assert during shutdown.
This interferes with the next patch, which adds orderly shutdown if the http
server fails to start.

Leak it intentionally to work around the problem.
Message-Id: <1452092806-11508-2-git-send-email-avi@scylladb.com>
2016-01-07 16:43:37 +02:00
Pekka Enberg
973c62a486 gms/gossiper: Fix compilation error
Commit 02b04e5 ("gossip: Add is_safe_for_bootstrap") needs an extra
curly bracket to compile.
Message-Id: <1452177529-13555-1-git-send-email-penberg@scylladb.com>
2016-01-07 16:42:55 +02:00
Vlad Zolotarov
07f8549683 database: filter out manifest.json files
Filter out manifest.json files when reading sstables during
bootup and when loading new sstables ('nodetool refresh').

Fixes issue #529

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-3-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:02 +02:00
Vlad Zolotarov
c5aa2d6f1a database: lister: add a filtering option
Add the option to pass a filter functor that receives the full path of a
directory entry and returns a boolean: true if the entry should be
enumerated, false if it should be filtered out.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-2-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:01 +02:00
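The filtering option from c5aa2d6f1a can be sketched as a predicate over full paths, with the manifest.json case from 07f8549683 as the example filter. All names here (`entry_filter`, `base_name`, `skip_manifest`, `enumerate`) are hypothetical. The real lister walks a directory asynchronously, whereas this sketch filters an in-memory list to stay self-contained.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical filter type: receives the full path of a directory entry and
// returns true to enumerate it, false to filter it out.
using entry_filter = std::function<bool(const std::string& full_path)>;

// Last path component, assuming '/' as the separator.
inline std::string base_name(const std::string& full_path) {
    const auto pos = full_path.find_last_of('/');
    return pos == std::string::npos ? full_path : full_path.substr(pos + 1);
}

// Example filter: keep everything except manifest.json files.
inline bool skip_manifest(const std::string& full_path) {
    return base_name(full_path) != "manifest.json";
}

// Return only the entries accepted by the filter, preserving their order.
inline std::vector<std::string> enumerate(const std::vector<std::string>& entries,
                                          const entry_filter& keep) {
    std::vector<std::string> out;
    for (const auto& e : entries) {
        if (keep(e)) {
            out.push_back(e);
        }
    }
    return out;
}
```

Passing the full path rather than just the file name lets a filter also make decisions based on the containing directory.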
Asias He
02b04e5907 gossip: Add is_safe_for_bootstrap
Make the following tests pass:

bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
bootstrap_test.py:TestBootstrap.killed_wiped_node_cannot_join_test

    1) start node2
    2) wait until the CQL connection with node2 is ready
    3) stop node2
    4) delete data and commitlog directory for node2
    5) start node2

In step 5), node2 will go through the bootstrap process since its data,
including the system table, is wiped. It will consider itself a completely
new node and can possibly stream from the wrong node and violate
consistency.

To fix this, we reject the boot if we find that the node was in SHUTDOWN or
STATUS_NORMAL.

CASSANDRA-9765
Message-Id: <47bc23f4ce1487a60c5b4fbe5bfe9514337480a8.1452158975.git.asias@scylladb.com>
2016-01-07 15:55:01 +02:00
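The rejection rule in 02b04e5907 can be written as a small predicate. This is a sketch under stated assumptions: the `gossip_status` enum, its members, and the function name `is_safe_for_bootstrap_sketch` are hypothetical. The commit message only says that the boot is rejected when the node was previously seen in SHUTDOWN or STATUS_NORMAL.

```cpp
#include <cassert>
#include <optional>

// Hypothetical status values; the real gossiper tracks more states.
enum class gossip_status { bootstrapping, normal, shutdown };

// last_seen is the status the rest of the cluster reports for this node's
// address, or empty if the cluster has never heard of it.
inline bool is_safe_for_bootstrap_sketch(std::optional<gossip_status> last_seen) {
    if (!last_seen) {
        return true;  // genuinely new node
    }
    // A node previously seen as NORMAL or SHUTDOWN was a cluster member;
    // bootstrapping it again with wiped data is rejected.
    return *last_seen != gossip_status::normal &&
           *last_seen != gossip_status::shutdown;
}
```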
Asias He
933614bdf9 main: Change API server starting message
It comes from the Seastar HTTP server and is inaccurate.

Message-Id: <6a634437d2bd4368400010e25969e215894c2df9.1452162686.git.asias@scylladb.com>
2016-01-07 15:53:28 +02:00
Asias He
6439f4d808 storage_service: Fix load_broadcaster in get_load_map
If get_load_map is called from the api while load_broadcaster is not
set yet, we dereference nullptr.

Fixes #763.
Message-Id: <6f8d554f4976aea85d5cec5a76a3848234138b0a.1452152148.git.asias@scylladb.com>
2016-01-07 10:36:36 +02:00
Asias He
2345cda42f messaging_service: Rename shard_id to msg_addr
Using shard_id as the destination of the messaging_service is confusing,
since shard_id is used in the context of cpu id.
Message-Id: <8c9ef193dc000ef06f8879e6a01df65cf24635d8.1452155241.git.asias@scylladb.com>
2016-01-07 10:36:35 +02:00
Asias He
8c909122a6 gossip: Add wait_for_gossip_to_settle
Implement the wait for gossip to settle logic in the bootup process.

CASSANDRA-4288

Fixes:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test

1) start node2
2) wait until the CQL connection with node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2

In step 5), the shadow round of node2 sometimes gets node2's status from
other nodes in the cluster as BOOT instead of NORMAL. The problem is that
we do not wait for gossip to settle before we start the CQL server. As a
result, when we stop node2 in step 3), the other nodes in the cluster have
not yet received node2's status update to NORMAL.
2016-01-07 10:09:25 +02:00
Benoît Canet
8f725256e1 config: Mark ssl_storage_port as Used
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1452082041-6117-1-git-send-email-benoit@scylladb.com>
2016-01-06 17:34:53 +02:00
Benoît Canet
e80c8b6130 config: Mark previously unused SSL client/server options as used
The previous SSL enablement patches make use of these
options, but they are still marked as Unused.
Change this and also update the db/config.hh documentation
accordingly.

Syntax is now:

client_encryption_options:
   enabled: true
   certificate: <path-to-PEM-x509-cert> (default conf/scylla.crt)
   keyfile: <path-to-PEM-x509-key> (default conf/scylla.key)

Fixes: #756.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1452032073-6933-1-git-send-email-benoit@scylladb.com>
2016-01-06 10:32:53 +02:00
Tomasz Grabiec
9d71e4a7eb Merge branch 'fix_to_issue_676_v4' from git@github.com:raphaelsc/scylla.git
Compaction fixes from Raphael:

There were two problems causing issue 676:
1) max_purgeable was being miscalculated (fixed by b7d36af).
2) empty row not being removed by mutation_partition::do_compact
Testcase is added to make sure that a tombstone will be purged under
certain conditions.
2016-01-05 15:19:22 +01:00
Raphael S. Carvalho
a81b660c0d tests: check that tombstone is purged under certain conditions
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-05 15:19:21 +01:00
Raphael S. Carvalho
03eee06784 remove empty rows in mutation_partition::do_compact
do_compact() wasn't removing an empty row that is covered by a
tombstone. As a result, an empty partition could be written to an
sstable. To solve this problem, let's make trim_rows remove a
row that is considered to be empty. A row is empty if it has no
tombstone, no marker and no cells.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-05 15:19:21 +01:00
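The emptiness rule stated in 03eee06784 (no tombstone, no marker, no cells) can be sketched as follows. `row_sketch` and `trim_empty_rows` are hypothetical simplifications: timestamps stand in for the real tombstone and marker types, and the real trim_rows does more than this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified row.
struct row_sketch {
    std::optional<std::int64_t> tombstone_timestamp;  // row tombstone, if any
    std::optional<std::int64_t> marker_timestamp;     // row marker, if any
    std::vector<int> cells;                           // live cells

    // A row is empty if it has no tombstone, no marker and no cells.
    bool empty() const {
        return !tombstone_timestamp && !marker_timestamp && cells.empty();
    }
};

// Remove rows that are empty after compaction, keeping the rest in order.
inline void trim_empty_rows(std::vector<row_sketch>& rows) {
    rows.erase(std::remove_if(rows.begin(), rows.end(),
                              [](const row_sketch& r) { return r.empty(); }),
               rows.end());
}
```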
Pekka Enberg
800ed6376a Merge "Repair overhaul" from Nadav
"This is another version of the repair overhaul, to avoid streaming *all* the
 data between nodes by sending checksums of token ranges and only streaming
 ranges which contain differing data."
2016-01-05 16:05:44 +02:00
Pekka Enberg
f4bdec4d09 Merge "Support for deleting all snapshots" from Vlad
"Add support for deleting all snapshots of all keyspaces."

Fixes #639.
2016-01-05 15:42:44 +02:00
Nadav Har'El
f90e1c1548 repair: support "hosts" and "dataCenters" parameters
Support the "hosts" and "dataCenters" parameters of repair. The first
specifies the known good hosts to repair this host from (plus this host),
and the second asks to restrict the repair to the local data center (you
must issue the repair to a node in the data center you want to repair -
issuing the command to a data center other than the named one returns
an error).

For example these options are used by nodetool commands like:
nodetool repair -hosts 127.0.0.1,127.0.0.2 keyspace
nodetool repair -dc datacenter1

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
Nadav Har'El
ac4e86d861 repair: use repair_checksum_range
The existing repair code always streamed the entire content of the
database. In this overhaul, we send "repair_checksum_range" messages to
the other nodes to verify whether they have exactly the same data as
this node, and if they do, we avoid streaming the identical data.

We make an attempt to split the token ranges up to contain an estimated
100 keys each, and send these ranges' checksums. Future versions of this
code will need to improve this estimation (and make this "100" a parameter).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00
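The approach described in ac4e86d861 (split a token range into pieces of roughly 100 keys, then stream only the pieces whose checksums differ) can be sketched as below. This is an illustration under assumptions, not the repair code: `token`, `token_range`, `split_range` and `mismatched` are hypothetical names, tokens are modeled as 64-bit integers, and keys are assumed to be spread uniformly over the range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using token = std::int64_t;                   // hypothetical token type
using token_range = std::pair<token, token>;  // half-open [first, second)

// Split [start, end) into pieces expected to hold about target_keys keys
// each, assuming keys are spread uniformly over the range.
inline std::vector<token_range> split_range(token start, token end,
                                            std::uint64_t estimated_keys,
                                            std::uint64_t target_keys = 100) {
    assert(start < end && target_keys > 0);
    const std::uint64_t width =
        static_cast<std::uint64_t>(end) - static_cast<std::uint64_t>(start);
    std::uint64_t pieces = (estimated_keys + target_keys - 1) / target_keys;
    if (pieces == 0) pieces = 1;
    if (pieces > width) pieces = width;  // at least one token per piece
    std::vector<token_range> out;
    out.reserve(pieces);
    token lo = start;
    for (std::uint64_t i = 1; i <= pieces; ++i) {
        // The last piece absorbs the remainder so the pieces cover the range.
        const token hi = (i == pieces)
            ? end
            : static_cast<token>(static_cast<std::uint64_t>(start) + width / pieces * i);
        out.emplace_back(lo, hi);
        lo = hi;
    }
    return out;
}

// Indices of the pieces whose checksums differ between two replicas; only
// these would need to be streamed.
inline std::vector<std::size_t> mismatched(const std::vector<std::uint64_t>& local,
                                           const std::vector<std::uint64_t>& remote) {
    assert(local.size() == remote.size());
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < local.size(); ++i) {
        if (local[i] != remote[i]) out.push_back(i);
    }
    return out;
}
```

With an estimate of 250 keys over `[0, 1000)`, the sketch yields three contiguous pieces. If only the middle piece's checksum differs, only that piece is reported for streaming.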
Nadav Har'El
9e65ecf983 repair: convenience function for syncing a range
This patch adds a function sync_range() for synchronizing all partitions
in a given token range between a set of replicas (this node and a list of
neighbors).
Repair will call this function once it has decided that the data the
replicas hold in this range is not identical.

The implementation streams all the data in the given range, from each of
the neighbors to this node - so now this node contains the most up-to-date
data. It then streams the resulting data back to all the neighbors.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-01-05 15:38:40 +02:00