The forwarding lambda is reused, so we cannot move captures out of it,
and we cannot pass references to them either, since the lambda can be
destroyed before the send completes.
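A minimal sketch of the constraint above, using only the standard library. `pending_sends` and `make_forwarder` are hypothetical stand-ins for the real messaging code; the point is that the reused lambda copies its capture into every send, so each send stays valid even after the lambda is destroyed.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous send: the queued work may run
// after the lambda that issued it is gone, so it must own what it touches.
struct pending_sends {
    std::vector<std::function<std::string()>> queue;
    void send(std::function<std::string()> work) { queue.push_back(std::move(work)); }
};

// Hypothetical forwarding lambda, invoked once per destination.
// Moving payload out would leave the second call with an empty string, and
// capturing it by reference would dangle once the forwarder is destroyed,
// so each send gets its own copy.
inline std::function<void(const std::string&)>
make_forwarder(pending_sends& out, std::string payload) {
    return [&out, payload = std::move(payload)](const std::string& dest) {
        out.send([dest, payload] { return dest + ":" + payload; });  // copies
    };
}
```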
Works only if all replicas (participating in CL) have the same live
data. Does not detect mismatches in tombstones (no infrastructure yet).
Does not report timeouts yet.
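A rough sketch of the limitation described above, assuming a hypothetical `replica_reply` type: only live data is compared across the replicas that took part in CL, and tombstones are carried along but deliberately ignored.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical replica reply: live cells keyed by column name, plus
// tombstoned column names that this sketch does not compare.
struct replica_reply {
    std::map<std::string, std::string> live;
    std::vector<std::string> tombstones;
};

// Returns true only if every replica that took part in CL returned the same
// live data. A tombstone mismatch goes unnoticed, as in the commit.
inline bool live_data_matches(const std::vector<replica_reply>& replies) {
    for (const auto& r : replies) {
        if (r.live != replies.front().live) {
            return false;
        }
    }
    return true;
}
```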
If a local mutation write takes longer than the write timeout, the mutation
will be deleted while it is still being processed by the database engine.
Fix this by storing the mutation in a shared pointer and holding on to the
pointer until the mutation is locally processed.
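A minimal sketch of the fix, with `std::shared_ptr` standing in for whatever smart pointer the real code uses; `mutation`, `write_handler` and `start_local_write` are hypothetical names. The local write keeps its own reference, so deleting the handler on timeout no longer frees the mutation underneath the database engine.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

struct mutation { std::string data; };  // hypothetical stand-in

// Hypothetical write handler; the timeout path may destroy it while the
// local write is still in progress.
struct write_handler {
    std::shared_ptr<const mutation> m;
};

// Starts a (simulated) local write. The continuation captures its own copy
// of the shared pointer, keeping the mutation alive until it has run.
inline std::function<std::string()> start_local_write(const write_handler& h) {
    return [m = h.m] { return m->data; };
}
```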
This reverts commit 52aa0a3f91.
After c9909dd183 this is no longer needed, since a reference to the
handler is not used in the abstract_write_response_handler::wait() continuation.
Conflicts:
service/storage_proxy.cc
Currently mutation clustering uses two timers: one expires when waiting for
CL times out and is canceled when CL is achieved; the other expires if some
endpoints do not answer for a long time (CL may already be achieved at
this point, and the first timer canceled). This is too complicated,
especially since both timers can expire simultaneously. Simplify it by
having only one timer and checking in its callback whether CL was achieved.
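A sketch of the single-timer callback, assuming a hypothetical `response_handler` that only counts acknowledgements. The real code arms an actual timer; here `on_timeout()` is simply the function such a timer would call, and the returned strings only label the two outcomes the old timers used to handle separately.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical per-mutation state; one timer is armed, and its callback
// decides what happened by inspecting this state.
struct response_handler {
    std::size_t cl_required;
    std::size_t total_targets;
    std::size_t acked = 0;

    bool cl_achieved() const { return acked >= cl_required; }

    // Single timer callback replacing the separate "CL timeout" and
    // "slow endpoints" timers.
    std::string on_timeout() const {
        if (!cl_achieved()) {
            return "write timeout";  // CL not reached: report failure
        }
        if (acked < total_targets) {
            return "cl achieved, some endpoints did not answer";
        }
        return "done";
    }
};
```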
The current timeout is 100ms. cassandra-stress often fails for me
because of this, with a "Mutation write timeout" message.
The comment says that the timeout value is based on
DatabaseDescriptor.getWriteRpcTimeout(), which in Origin defaults to 2
seconds, so bump it up.
Code pointers:
DatabaseDescriptor:L844
public static long getWriteRpcTimeout()
{
    return conf.write_request_timeout_in_ms;
}
Config:L74
public volatile Long write_request_timeout_in_ms = 2000L;
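One way to express the bumped value on the C++ side, as a sketch: `write_timeout` is a hypothetical name, and using `std::chrono` lets the compiler check that it matches Origin's 2000 ms default.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical constant name. Mirrors Origin's default
// write_request_timeout_in_ms = 2000L instead of the old 100 ms.
constexpr std::chrono::milliseconds write_timeout{2000};

static_assert(write_timeout == std::chrono::seconds(2),
              "matches Origin's default write timeout of 2 seconds");
```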
If the last response comes after the write timeout is triggered, but before
the continuation that is supposed to handle the timeout runs, the handler can
be removed too early and then accessed from the continuation after deletion.
Fix it by making the response handler a shared pointer instead of a unique
pointer and holding on to it in the timeout continuation.
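A sketch of the race and the fix, using `std::shared_ptr` in place of the real pointer type; `handler_registry`, `schedule_timeout_continuation` and `on_last_response` are hypothetical. The timeout continuation takes its own reference when it is scheduled, so a late last response that removes the handler from the registry cannot leave the continuation with a dangling pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>

// Hypothetical response handler; only records whether the last response came.
struct response_handler {
    bool got_last_response = false;
};

// Hypothetical registry of in-flight writes keyed by response id.
struct handler_registry {
    std::unordered_map<std::uint64_t, std::shared_ptr<response_handler>> handlers;

    // Write timeout fired: schedule a continuation that runs later. It holds
    // a copy of the shared_ptr, not a reference into the map.
    std::function<bool()> schedule_timeout_continuation(std::uint64_t id) {
        auto h = handlers.at(id);
        return [h] { return h->got_last_response; };
    }

    // Last response arrived: mark the handler and remove it from the map.
    void on_last_response(std::uint64_t id) {
        handlers.at(id)->got_last_response = true;
        handlers.erase(id);
    }
};
```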
The current model was not really correct, because Origin doesn't support
querying partition ranges by key value. We can query slices
according to dht::decorated_key ordering, which orders partitions
first by token and then by key value.
ring_position encapsulates a range constraint. The key value is optional, in
which case only the token is constrained.
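A simplified sketch of the ordering described above. Tokens are plain `int64_t` here and both structs are reduced stand-ins for `dht::decorated_key` and `ring_position`; `tri_compare` is a hypothetical helper showing that a token-only position compares equal to every key with the same token.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <tuple>

// Simplified stand-in for dht::decorated_key.
struct decorated_key {
    std::int64_t token;
    std::string key;
};

// Partitions are ordered first by token, then by key value.
inline bool operator<(const decorated_key& a, const decorated_key& b) {
    return std::tie(a.token, a.key) < std::tie(b.token, b.key);
}

// Simplified ring_position: when key is absent, only the token is constrained.
struct ring_position {
    std::int64_t token;
    std::optional<std::string> key;
};

// Hypothetical three-way compare of a position against a partition key.
inline int tri_compare(const ring_position& p, const decorated_key& k) {
    if (p.token != k.token) {
        return p.token < k.token ? -1 : 1;
    }
    if (!p.key) {
        return 0;  // token-only: matches any key with this token
    }
    int c = p.key->compare(k.key);
    return c < 0 ? -1 : (c > 0 ? 1 : 0);
}
```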
partitions_ranges will be manipulated in order to be split for different
destinations, so provide it separately from read_command to avoid copying
the latter for each destination.
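A sketch of why the ranges travel separately, with hypothetical simplified types: each destination gets its own subset of ranges, while all of them share one immutable `read_command`. The `owner` callback is an assumption standing in for real token ownership lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical simplified types.
struct partition_range { std::int64_t start, end; };
struct read_command { std::string table; std::uint32_t row_limit; };

// Per-destination request: the command is shared, only the ranges differ.
struct destination_request {
    std::shared_ptr<const read_command> cmd;
    std::vector<partition_range> ranges;
};

// Splits ranges across destinations using a caller-supplied owner function,
// sharing read_command instead of copying it for every destination.
template <typename OwnerFn>
std::map<std::string, destination_request>
split_by_destination(std::shared_ptr<const read_command> cmd,
                     const std::vector<partition_range>& ranges, OwnerFn owner) {
    std::map<std::string, destination_request> out;
    for (const auto& r : ranges) {
        auto& req = out[owner(r)];
        if (!req.cmd) {
            req.cmd = cmd;  // share, do not copy
        }
        req.ranges.push_back(r);
    }
    return out;
}
```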
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v2:
- storage_service: add a non-const version of get_token_metadata().
- get_broadcast_address(): check if net::get_messaging_service().local_is_initialized()
before calling net::get_local_messaging_service().listen_address().
- get_broadcast_address(): return an inet_address by value.
- system_keyspace: introduce db::system_keyspace::endpoint_dc_rack
- fb_utilities: use listen_address as broadcast_address for now
We want query_local() to actually respect the key we pass to it. Fixes
an issue in keyspace merging code where we returned multiple rows for a
keyspace.
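A sketch of the intended behaviour, not the real `query_local()` signature: the hypothetical `row` type and free function below only show that the result is filtered by the requested partition key, so merging sees a single keyspace's rows.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical row of a schema table, keyed by keyspace name.
struct row {
    std::string partition_key;
    std::string value;
};

// Hypothetical query: returns only rows of the requested partition key,
// instead of every row in the table.
inline std::vector<row> query_local(const std::vector<row>& table, const std::string& key) {
    std::vector<row> result;
    for (const auto& r : table) {
        if (r.partition_key == key) {
            result.push_back(r);
        }
    }
    return result;
}
```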
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
This reverts commit a19d2171eb.
This commit breaks cql_query_test.
[asias@hjpc urchin]$ ./cql_query_test
Running 1 test case...
WARNING: Not implemented: COMPACT_TABLES
WARNING: Not implemented: METRICS
WARNING: Not implemented: PERMISSIONS
cql_query_test: core/distributed.hh:290: Service&
distributed<Service>::local() [with Service =
service::storage_service]: Assertion `local_is_initialized()' failed.
unknown location(0): fatal error in "test_create_keyspace_statement":
signal: SIGABRT (application abort requested)
tests/test-utils.cc(31): last checkpoint
*** 1 failure detected in test suite "tests/urchin/cql_query_test.cc"
(gdb) bt
#0 0x00000032930348d7 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:55
#1 0x000000329303653a in __GI_abort () at abort.c:89
#2 0x000000329302d47d in __assert_fail_base (fmt=0x3293186cb8
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x8ec10a "local_is_initialized()",
file=file@entry=0x92508d "core/distributed.hh",
line=line@entry=290, function=function@entry=0x8ed440
<distributed<service::storage_service>::local()::__PRETTY_FUNCTION__>
"Service& distributed<Service>::local() [with Service =
service::storage_service]")
at assert.c:92
#3 0x000000329302d532 in __GI___assert_fail (assertion=0x8ec10a
"local_is_initialized()", file=0x92508d "core/distributed.hh",
line=290,
function=0x8ed440
<distributed<service::storage_service>::local()::__PRETTY_FUNCTION__>
"Service& distributed<Service>::local() [with Service =
service::storage_service]") at assert.c:101
#4 0x0000000000430f19 in local (this=<optimized out>) at
core/distributed.hh:290
#5 get_local_storage_service () at service/storage_service.hh:3326
#6 keyspace::create_replication_strategy (this=0x7ffff6bf8350) at
database.cc:690
#7 0x000000000061537a in
_ZZZN2db20legacy_schema_tables15merge_keyspacesERN7service13storage_proxyEOSt3mapI13basic_sstringIcjLj15EE13lw_shared_ptrIN5query10result_setEESt4lessIS6_ESaISt4pairIKS6_SA_EEESI_ENKUlRT_E0_clISt6vectorISF_SG_EEEDaSK_ENKUlR8databaseE_clESQ_ () at
db/legacy_schema_tables.cc:584
#8 0x0000000000617d19 in operator() (__closure=0x7ffff6bf8650) at
./core/distributed.hh:284
In the test, storage_service and other services are not started.
Let's revert it and figure out a way to run cql_query_test with the
needed services started properly and then bring the "storage_service:
Remove ad-hoc token_metadata creation" change back.
_token_metadata is needed by the replication strategy code on all cpus.
Changes to _token_metadata are done on cpu 0, so replicate it to all cpus.
We only need to copy if _token_metadata actually changes; as a starter, we
always copy in the gossip modification callbacks.
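A single-threaded sketch of the replication scheme, using a `std::vector` with one slot per cpu as a hypothetical stand-in for per-shard copies (in Seastar this would be a cross-shard operation, for example through `invoke_on_all`). Only slot 0 is modified directly, and the callback unconditionally copies it everywhere, matching the "always copy as a starter" approach.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified token_metadata.
struct token_metadata {
    std::unordered_set<std::int64_t> tokens;
    bool operator==(const token_metadata& o) const { return tokens == o.tokens; }
};

// One copy per cpu; index 0 is the only one changed directly.
struct sharded_token_metadata {
    std::vector<token_metadata> per_cpu;
    explicit sharded_token_metadata(std::size_t ncpus) : per_cpu(ncpus) {}

    // Called from the gossip modification callbacks after cpu 0 changes.
    // Always copies; skipping unchanged metadata is left as a refinement.
    void replicate_from_cpu0() {
        for (std::size_t i = 1; i < per_cpu.size(); ++i) {
            per_cpu[i] = per_cpu[0];
        }
    }
};
```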
We do not care about the order of the tokens.
Also, token_metadata already uses unordered_set for tokens, e.g. in
update_normal_tokens. Unify the usage.
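A sketch of what the unified usage could look like, with `token` and `endpoint` reduced to hypothetical aliases. Because the order of incoming tokens does not matter, `update_normal_tokens` takes an `std::unordered_set`; in this sketch the `std::map` is what keeps tokens in ring order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_set>

using token = std::int64_t;    // simplified
using endpoint = std::string;  // simplified

// Hypothetical, simplified token -> endpoint bookkeeping.
struct token_metadata {
    std::map<token, endpoint> token_to_endpoint;

    // Input order is irrelevant, so an unordered_set is enough here.
    void update_normal_tokens(const std::unordered_set<token>& tokens, const endpoint& ep) {
        for (token t : tokens) {
            token_to_endpoint[t] = ep;
        }
    }
};
```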