We are currently failing the sstable test. The reason is that we use the store()
function for test purposes, and that function does not store the TOC component.
It was removed by accident in 3a5e3c88.
Because that function is only used for testing purposes, it does not need to write
the Index and Data components, so we can remove them from the list.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
When probing for the type, I made the classic mistake of passing as a
parameter a piece of a structure that is moved into the lambda capture. That
is what broke our tests.
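The capture mistake can be illustrated with a minimal sketch (`entry` and `probe_safe` are hypothetical names, not the actual code):

```cpp
#include <cassert>
#include <string>
#include <utility>

struct entry {
    std::string name;
};

// Broken pattern (sketch): probe(e.name, [e = std::move(e)] { ... });
// whether e.name or the move-capture is evaluated first is unspecified,
// so e.name may read a moved-from string.

// Safe pattern: take what you need from the structure *before* moving it
// into the capture.
std::string probe_safe(entry e) {
    auto name = e.name;                 // copy the field out first
    auto task = [e = std::move(e)] {};  // now the move is harmless
    task();
    return name;                        // still valid
}
```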
But also, when stat'ing, de.name gives us only the component relative to
the current path; we need to prepend the directory for the stat to succeed.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Our directory scanner currently requires a type to be passed, and we have a
FIXME saying that we should stat when there is none. On some filesystems,
XFS in particular, getdents won't return a type, so we need to probe it
manually.
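A sketch of the manual probe, assuming a (directory, dirent) pair like the one our scanner sees (`probe_type` and `entry_type` are hypothetical names). Note that the full path must include the directory, since d_name is relative to it:

```cpp
#include <cassert>
#include <dirent.h>
#include <sys/stat.h>
#include <string>

enum class entry_type { file, directory, other };

inline entry_type probe_type(const std::string& dir, const dirent& de) {
    switch (de.d_type) {
    case DT_REG: return entry_type::file;
    case DT_DIR: return entry_type::directory;
    case DT_UNKNOWN: {
        // e.g. XFS: no type from getdents, fall back to stat().
        struct stat st;
        // d_name alone would fail -- prepend the directory first.
        if (::stat((dir + "/" + de.d_name).c_str(), &st) != 0) {
            return entry_type::other;
        }
        if (S_ISREG(st.st_mode)) return entry_type::file;
        if (S_ISDIR(st.st_mode)) return entry_type::directory;
        return entry_type::other;
    }
    default: return entry_type::other;
    }
}
```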
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
The code in merge_tables() is a twisted maze of tricks that is hard to
restructure so that event notification can be done cleanly, as with
keyspaces.
The problem there is that we need to run a bunch of database operations
for the merging that really need to happen on all the shards. To fix
the issue, let's cheat a little and run the CQL event notification
only on cpu zero.
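The trick reduces to a guard on the shard id. A toy model, with a plain loop standing in for Seastar's per-cpu execution (`merge_on_shard` and `run_all_shards` are made-up names for illustration):

```cpp
#include <cassert>

static int merges = 0;
static int notifications = 0;

void merge_on_shard(unsigned cpu_id) {
    ++merges;             // the database merge runs on every shard
    if (cpu_id == 0) {
        ++notifications;  // CQL event notification: cpu zero only
    }
}

void run_all_shards(unsigned smp_count) {
    for (unsigned cpu = 0; cpu < smp_count; ++cpu) {
        merge_on_shard(cpu);
    }
}
```

With this guard, clients see exactly one event per schema change instead of one per shard.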
This seems to fix cluster schema propagation issues in urchin-dtest. I
can now run TestSimpleCluster.simple_create_insert_select_test without
any additional delays inserted into the test code.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
* seastar c09488e...6f1dd3c (3):
> net: make udp send more robust wrt. errors
> net: remove packet constructors with template Deleter parameter
> memory: Support for discovering allocator's address range
We also need to encode the event type in the response message. Fixes the
following dtest breakage:
cassandra.connection: ERROR: Error decoding response from Cassandra. opcode: 000c; message contents: '\x83\x00\xff\xff\x0c\x00\x00\x00\x17\x00\x07CREATED\x00\x08KEYSPACE\x00\x02ks'
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/cassandra/connection.py", line 431, in process_msg
    flags, opcode, body, self.decompressor)
  File "/usr/lib64/python2.7/site-packages/cassandra/protocol.py", line 123, in decode_response
    msg = msg_class.recv_body(body, protocol_version, user_type_map)
  File "/usr/lib64/python2.7/site-packages/cassandra/protocol.py", line 803, in recv_body
    raise NotSupportedError('Unknown event type %r' % event_type)
NotSupportedError: Unknown event type u'CREATED'
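The dump shows the body starting with "CREATED", so the client parses that as the event type and fails. A toy sketch of the fixed layout (encoder names are hypothetical, not our actual serializer): the EVENT body must lead with the event type string ("SCHEMA_CHANGE") before the change kind, target and keyspace:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// [string] in the CQL binary protocol: a 2-byte big-endian length
// followed by the bytes.
static void write_string(std::vector<uint8_t>& out, const std::string& s) {
    out.push_back(uint8_t(s.size() >> 8));
    out.push_back(uint8_t(s.size() & 0xff));
    out.insert(out.end(), s.begin(), s.end());
}

std::vector<uint8_t> encode_schema_change(const std::string& change,
                                          const std::string& target,
                                          const std::string& keyspace) {
    std::vector<uint8_t> body;
    write_string(body, "SCHEMA_CHANGE"); // the missing event type
    write_string(body, change);          // e.g. "CREATED"
    write_string(body, target);          // e.g. "KEYSPACE"
    write_string(body, keyspace);        // e.g. "ks"
    return body;
}
```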
Reported-by: Shlomi Livne <shlomi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
When forcing a compaction on a column family with no sstables, an
assert fails because there are no sstables to compact.
Fix this by ignoring a compaction request when no
sstables are provided.
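The shape of the fix is just an early return on an empty set (a minimal sketch; `compact_sstables` and its signature are assumptions, not the actual API):

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct sstable {};

// Rather than asserting on a non-empty set, an empty compaction
// request is simply ignored.
bool compact_sstables(const std::vector<std::shared_ptr<sstable>>& ssts) {
    if (ssts.empty()) {
        return false;  // nothing to compact: ignore the request
    }
    // ... actual compaction would run here ...
    return true;
}
```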
Fixes #61.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
We already have the all_tables() function converted and there's really no
use for compile() unless we switch to using CQL to create the schema
tables.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
There's nothing legacy about it so rename legacy_schema_tables to
schema_tables. The naming comes from a Cassandra 3.x development branch
which is not relevant for us in the near future.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Remove the commented-out isReadyForBoostrap. We don't have a StageManager,
nor will we, so drop the function.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Rename "MIGRATION_DELAY_IN_MSEC" to "migration_delay" as the unit of
time is already clear from the type.
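The rationale can be shown with a small std::chrono sketch (the value 60000 is illustrative only): with the unit in the type, the name needs no _IN_MSEC suffix and call sites cannot confuse milliseconds with seconds:

```cpp
#include <cassert>
#include <chrono>

// The unit lives in the type, not in the name.
static constexpr std::chrono::milliseconds migration_delay{60000};

// Conversions are explicit and checked by the type system:
static constexpr auto migration_delay_s =
    std::chrono::duration_cast<std::chrono::seconds>(migration_delay);
```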
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Use get_storage_proxy() and get_local_storage_proxy() helpers under the
hood to simplify migration manager API users.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
This patch adds the beginning of node repair support. Repair is initiated
on a node using the REST API, for example to repair all the column families
in the "try1" keyspace, you can use:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
I tested that the repair already works (exchanges mutations with all other
replicas, and successfully repairs them), so I think it can be committed,
but more work will be needed to complete it:
1. Repair options are not yet supported (range repair, sequential/parallel
repair, choice of hosts, datacenters and column families, etc.).
2. *All* the data of the keyspace is exchanged - Merkle Trees (or an
alternative optimization) and partial data exchange haven't been
implemented yet.
3. Full repair for nodes with multiple separate ranges is not yet
implemented correctly. E.g., consider 10 nodes with vnodes and RF=2:
each vnode's range has a different host as a replica, so we need
to exchange each key range separately with a different remote host.
4. Our repair operation returns a numeric operation id (like Origin),
but we don't yet provide any means to use this id to check on ongoing
repairs like Origin allows.
5. Error handling, logging, etc., need to be improved.
6. SMP nodes (with multiple shards) should work correctly (thanks to
Asias's latest patch for SMP mutation streaming) but haven't been
tested.
7. Incremental repair is not supported (see
http://www.datastax.com/dev/blog/more-efficient-repairs)
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
"This series implements initial support for CQL events. We introduce
migration_listener hook in migration manager as well as event notifier
in the CQL server that's built on top of it to send out the events via
CQL binary protocol. We also wire up create keyspace events to the
system so subscribed clients are notified when a new keyspace is
created.
There's still more work to be done to support all the events. That
requires some work to restructure existing code so it's better to merge
this initial series now and avoid future code conflicts."
Add a create_keyspace_on_all() helper, which is needed to send just
one event notification per created keyspace rather than one per shard.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
* seastar 947619e...6de00be (4):
> net: prevent tcp from fragmenting packet headers
> net: use malloc() in internal packet allocations
> core/memory: Fix compilation of debug-mode version of stats()
> memory: Expose more statistics over collectd
We should pass inet_address.addr().
With this, tokens in system.peers are updated correctly.
cqlsh> SELECT tokens from system.peers;
tokens
------------------------------------------------------------------------
{'-5463187748725106974', '8051017138680641610', '8833112506891013468'}
(1 rows)
I got this error if I pass inet_address to it:
boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_any_cast>
> (boost::bad_any_cast: failed conversion using boost::any_cast)
Assume we have 3 tokens,
{ee 36 d0 3e e8 6c 35 b1 , c5 5b 00 4a 1d 77 4e 50 , b9 b2 a1 0a 16 0d 76 8e }
With this:

    for (auto t : tokens) {
        _token_metadata.update_normal_token(t, get_broadcast_address());
    }

only the last token is inserted. With this:

    _token_metadata.update_normal_tokens(tokens, get_broadcast_address());

all 3 tokens are inserted correctly.
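A toy model of that behaviour (names mirror the real API but the implementation here is invented for illustration): the single-token update replaces the endpoint's previous tokens, so repeated calls keep only the last one, while the batch variant installs the whole set:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

struct token_metadata {
    std::map<std::string, std::set<long>> tokens_by_endpoint;

    void update_normal_token(long t, const std::string& ep) {
        tokens_by_endpoint[ep] = {t};   // wipes earlier tokens for ep
    }
    void update_normal_tokens(const std::set<long>& ts,
                              const std::string& ep) {
        tokens_by_endpoint[ep] = ts;    // installs all of them at once
    }
};
```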
The reason is that the reader may think these fields store statistics
about an sstable that was just loaded, but they are only used when
writing a new sstable.
Now I'm starting to see the value of having one sstable class for
a loaded sstable and another for an sstable being created
(that's what Origin does).
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
From Pawel:
This series fixes SELECT DISTINCT statements. Previously, we relied on the
existence of a static row to get proper results. That obviously doesn't work
when there is no static row in the partition. The solution is to
introduce a new partition_slice option, distinct, which indicates that
the only important information is the static row and whether the partition
exists.
(or rather, improve them in the future when they use make_local_reader)
Since shard data is now disjoint, read shards in order rather than
concurrently.
Instead of merging shard data using make_combined_reader(), take advantage
of the fact that shard data is disjoint, and use make_joining_reader().
This removes the need to sort the partitions as they are being read.