"As previously said, there were some unidentified bugs that prevented update_tokens from
working properly. The first one was sent alongside the series, the second one took me more
time, but it is fixed here."
At this point, users of the interface are futurized already, so we
just need to make sure they call the right function.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Soon that will involve a query. The idiom make_ready_future<>().then()
is a bit unusual to say the least, but it will soon be replace by an
actual future.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Most of the time we're querying internal, we are going for the system.
Defaulting to it just makes it easier, and callers can still change it
with set_keyspace if needed.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Announce schema mutations in a cluster via the DEFINITIONS_UPDATE verb
and pass them to merge_schema() at endpoints.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
We always operate on the local storage proxy so pass it by reference.
This simplifies DEFINITIONS_UPDATE message handler where all we have is
a "this" pointer to the local storage proxy.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Forwarding lambda is reused, so we cannot move captures out of it
and we cannot pass references to them either since lambda can be
destroyed before send completes.
Works only if all replicas (participating in CL) has the same live
data. Does not detects mismatch in tombstones (no infrastructure yet).
Does not report timeout yet.
If local mutation write takes longer then write timeout mutation will
be deleted while it is processed by database engine. Fix this by storing
mutation in shared pointer and hold to the pointer until mutation is
locally processed.
This reverts commit 52aa0a3f91.
After c9909dd183 this is no longer needed since reference to a
handler is not used in abstract_write_response_handler::wait() continuation.
Conflicts:
service/storage_proxy.cc
Currently mutation clustering uses two timers, one expires when wait for
cl timeouts and is canceled when cl is achieved, another expires if some
endpoints do not answer for a long time (cl may be already achieved at
this point and first timer will be canceled). This is too complicated
especially since both timers can expire simultaneously. Simplify it by
having only one timer and checking in a callback whether cl was achieved.
Current timeout is 100ms. cassandra-stress is failing for me often
because of this, with "Mutation write timeout" message.
The comment says that the timeout value is based on
DatabaseDescriptor.getWriteRpcTimeout(), which in Origin is equal to 2
seconds by default, so bump it up.
Code pointers:
DatabaseDescriptor:L844
public static long getWriteRpcTimeout()
{
return conf.write_request_timeout_in_ms;
}
Config:L74
public volatile Long write_request_timeout_in_ms = 2000L;
If last response comes after write timeout is triggered, but before
continuation, that suppose to handle it runs the handler can be removed
to earlier and be access from the continuation after deletion. Fix it by
making response handler to be shared pointer instead of unique and
holding to it in timeout continuation.
Current model was not really correct because Origin doesn't support
querying of partition ranges by their value. We can query slices
according to dht::decorated_key ordering, which orders partitions
first by token then by key value.
ring_position encapsulates range constraint. Key value is optional, in
which case only token is constrained.
partitions_ranges will be manipulated upon to be split for different
destination, so provide it separately from read_command to not copy the
later for each destination.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v2:
- storage_service: add a non-const version of get_token_metadata().
- get_broadcast_address(): check if net::get_messaging_service().local_is_initialized()
before calling net::get_local_messaging_service().listen_address().
- get_broadcast_address(): return an inet_address by value.
- system_keyspace: introduce db::system_keyspace::endpoint_dc_rack
- fb_utilities: use listen_address as broadcast_address for now
We want query_local() to actually respect the key we pass to it. Fixes
an issue in keyspace merging code where we returned multiple rows for a
keyspace.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
This reverts commit a19d2171eb.
This commit breaks cql_query_test.
[asias@hjpc urchin]$ ./cql_query_test
Running 1 test case...
WARNING: Not implemented: COMPACT_TABLES
WARNING: Not implemented: METRICS
WARNING: Not implemented: PERMISSIONS
cql_query_test: core/distributed.hh:290: Service&
distributed<Service>::local() [with Service =
service::storage_service]: Assertion `local_is_initialized()' failed.
unknown location(0): fatal error in "test_create_keyspace_statement":
signal: SIGABRT (application abort requested)
tests/test-utils.cc(31): last checkpoint
*** 1 failure detected in test suite "tests/urchin/cql_query_test.cc"
(gdb) bt
#0 0x00000032930348d7 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:55
#1 0x000000329303653a in __GI_abort () at abort.c:89
#2 0x000000329302d47d in __assert_fail_base (fmt=0x3293186cb8
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x8ec10a "local_is_initialized()",
file=file@entry=0x92508d "core/distributed.hh",
line=line@entry=290, function=function@entry=0x8ed440
<distributed<service::storage_service>::local()::__PRETTY_FUNCTION__>
"Service& distributed<Service>::local() [with Service =
service::storage_service]")
at assert.c:92
#3 0x000000329302d532 in __GI___assert_fail (assertion=0x8ec10a
"local_is_initialized()", file=0x92508d "core/distributed.hh",
line=290,
function=0x8ed440
<distributed<service::storage_service>::local()::__PRETTY_FUNCTION__>
"Service& distributed<Service>::local() [with Service =
service::storage_service]") at assert.c:101
#4 0x0000000000430f19 in local (this=<optimized out>) at
core/distributed.hh:290
#5 get_local_storage_service () at service/storage_service.hh:3326
#6 keyspace::create_replication_strategy (this=0x7ffff6bf8350) at
database.cc:690
#7 0x000000000061537a in
_ZZZN2db20legacy_schema_tables15merge_keyspacesERN7service13storage_proxyEOSt3mapI13basic_sstringIcjLj15EE13lw_shared_ptrIN5query10result_setEESt4lessIS6_ESaISt4pairIKS6_SA_EEESI_ENKUlRT_E0_clISt6ve
ctorISF_SG_EEEDaSK_ENKUlR8databaseE_clESQ_ () at
db/legacy_schema_tables.cc:584
#8 0x0000000000617d19 in operator() (__closure=0x7ffff6bf8650) at
./core/distributed.hh:284
In the test, storage_service and other services are not stared.
Let's revert it and figure out a way to run cql_query_test with the
needed services started properly and then bring the "storage_service:
Remove ad-hoc token_metadata creation" change back.