Maintenance socket path used for PGO is in the node workdir.
When the node workdir path is too long, the maintenance socket path
(workdir/cql.m) can exceed the Unix domain socket sun_path limit
and failing the PGO training pipeline.
To prevent this:
- pass an explicit --maintenance-socket override
pointing to a short determinitic path in /tmp derived from the MD5
hash of the workdir maintenance socket path
- update maintenance_socket_path to return the matching short path
so that exec_cql.py connects to the right socket
The short path socket files are cleaned up after the cluster stops.
The path is using MD5 hash of the workdir path, so it is deterministic.
Fixes SCYLLADB-1070
Closesscylladb/scylladb#29149
The default 'cassandra' superuser was removed from ScyllaDB, which
broke PGO training. exec_cql.py relied on username/password auth
('cassandra'/'cassandra') to execute setup CQL scripts like auth.cql
and counters.cql.
Switch exec_cql.py to connect via the Unix domain maintenance socket
instead. The maintenance socket bypasses authentication, no credentials
are needed. Additionally, create the 'cassandra' superuser via the
maintenance socket during the populate phase, so that cassandra-stress
keeps working. cassandra-stress hardcodes user=cassandra password=cassandra.
Changes:
- exec_cql.py: replace host/port/username/password arguments with a
single --socket argument; add connect_maintenance_socket() with
wait ready logic
- pgo.py: add maintenance_socket_path() helper; update
populate_auth_conns() and populate_counters() to pass the socket
path to exec_cql.py
Fixes SCYLLADB-1070
Closesscylladb/scylladb#29081
It is considered a dangerous practice with possible unintended
side-effects, affecting later calls to the same function.
Found by CodeQL "Modification of parameter with default".
It was not enabled due to some cqlsh dependency missing.
After 3 years it's hard to say if the thing is fixed or not,
but anyway we don't need another big dependecy while we already
have python driver used exstensively in tests. We use simple
wrapper file exec_cql.py, shared with auth_conns workload to
conveniently read needed preparation statements from the file.
Additionally we switch tablets off as counters don't support
it yet.
It uses some derived roles and permissions
to exercise auth code paths and also creates new
connection with each stress request to exercise
also transport/server.cc connection handling code.
Since a1d7722 tablet keyspaces are not allowed to be repaired via the
old /storage_service/repair_async/{keyspace} API, instead the new
/storage_service/tablets/repair API has to be used. Adjust the repair
code and also add await_completion=true: the script just waits
for the repair to finish immediately after starting it.
Closesscylladb/scylladb#25455