In 10dd08c9 ("messaging_service: supply and interpret rpc isolation_cookies",
4.2), we added a mechanism to perform rpc calls in remote scheduling groups
based on the connection identity (rather than the verb), so that
connection processing itself can run in the correct group (not just
verb processing), and so that one verb can run in different groups
according to need.

In 16d8cdadc ("messaging_service: introduce the tenant concept", 4.2), we
changed the way isolation cookies are sent:

     scheduling_group
     messaging_service::scheduling_group_for_verb(messaging_verb verb) const {
         return _scheduling_info_for_connection_index[get_rpc_client_idx(verb)].sched_group;
    @@ -665,11 +694,14 @@ shared_ptr<messaging_service::rpc_protocol_client_wrapper> messaging_service::ge
             if (must_compress) {
                 opts.compressor_factory = &compressor_factory;
             }
             opts.tcp_nodelay = must_tcp_nodelay;
             opts.reuseaddr = true;
    -        opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie;
    +        // We send cookies only for non-default statement tenant clients.
    +        if (idx > 3) {
    +            opts.isolation_cookie = _scheduling_info_for_connection_index[idx].isolation_cookie;
    +        }

This effectively disables the mechanism for the default tenant. As a
result, some verbs are executed in whatever group the messaging
service listener was started in. This used to be the main group,
but in 554ab03 ("main: Run init_server and join_cluster inside
maintenance scheduling group", 4.5), this was changed to the maintenance
group. As a result, normal reads and writes now compete with maintenance
operations, raising their latency significantly.

Fix by sending the isolation cookie for all connections. With this,
a 2-node cassandra-stress load sees its 99th percentile latency increase
by just 3ms during repair, compared to 10ms+ before.

Fixes #9505.

Closes #10673

(cherry picked from commit c83393e819)
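
The effect of the change can be illustrated with a small standalone model. This is only a sketch, not the messaging_service code: the types `scheduling_info` and `client_options`, the three functions, and the "statement" group name are hypothetical stand-ins, and the receiver-side rule (an empty cookie falls back to the listener's group, otherwise the cookie names the group) is a simplifying assumption based on the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of _scheduling_info_for_connection_index.
struct scheduling_info {
    std::string sched_group;
    std::string isolation_cookie;
};

// Hypothetical stand-in for the rpc client options.
// An empty cookie models "no cookie sent".
struct client_options {
    std::string isolation_cookie;
};

// Behaviour introduced by 16d8cdadc: only connections with idx > 3 send a cookie.
inline client_options options_before_fix(const std::vector<scheduling_info>& infos,
                                         std::size_t idx) {
    client_options opts;
    if (idx > 3) {
        opts.isolation_cookie = infos.at(idx).isolation_cookie;
    }
    return opts;
}

// Behaviour after this fix: every connection sends its cookie.
inline client_options options_after_fix(const std::vector<scheduling_info>& infos,
                                        std::size_t idx) {
    client_options opts;
    opts.isolation_cookie = infos.at(idx).isolation_cookie;
    return opts;
}

// Assumed receiver-side rule: without a cookie the connection runs in the
// listener's group. In this sketch the cookie is simply the group name.
inline std::string receiving_group(const client_options& opts,
                                   const std::string& listener_group) {
    return opts.isolation_cookie.empty() ? listener_group : opts.isolation_cookie;
}

// Small self-check: a default-tenant connection (idx 0) on a listener that
// was started in the maintenance group.
inline void example() {
    std::vector<scheduling_info> infos(5, scheduling_info{"statement", "statement"});
    assert(receiving_group(options_before_fix(infos, 0), "maintenance") == "maintenance");
    assert(receiving_group(options_after_fix(infos, 0), "maintenance") == "statement");
}
```

In this model, a connection with index 0 lands in the listener's maintenance group before the fix and in its own group afterwards, while a connection with index 4 is unaffected.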