# Debugging with GDB
|
|
|
|
## Introduction
|
|
|
|
GDB is a source-level debugger for C, C++ and other languages. It allows
inspecting the internal state of a program as it is running, as well as the
post-mortem inspection of crashed programs.
|
|
|
|
You can attach GDB to a running process, run a process inside GDB or
|
|
examine a coredump.
|
|
|
|
### Starting GDB
|
|
|
|
One of the most common uses of GDB with Scylla is running a process inside
it (e.g. a unit test):
|
|
|
|
gdb /path/to/executable
|
|
|
|
You can specify command-line arguments that gdb will forward to the
|
|
executable:
|
|
|
|
    gdb --args /path/to/executable arg1 arg2 arg3
|
|
|
|
Another prevalent usage is to examine coredumps:
|
|
|
|
gdb --core=/path/to/coredump /path/to/executable
|
|
|
|
You can also attach it to an already running process:
|
|
|
|
gdb -p $pid
|
|
|
|
Where `$pid` is the PID of the running process you wish to attach GDB
|
|
to.
|
|
|
|
### Using GDB
|
|
|
|
GDB has excellent online documentation that you can find
|
|
[here](https://sourceware.org/gdb/onlinedocs/gdb/index.html).
|
|
|
|
Some of the more important topics:
|
|
* [Starting GDB](https://sourceware.org/gdb/onlinedocs/gdb/Invocation.html#Invocation)
|
|
* [Setting breakpoints](https://sourceware.org/gdb/onlinedocs/gdb/Set-Breaks.html#Set-Breaks)
|
|
* [Setting catchpoints](https://sourceware.org/gdb/onlinedocs/gdb/Set-Catchpoints.html#Set-Catchpoints)
|
|
* [Stepping through the code](https://sourceware.org/gdb/onlinedocs/gdb/Continuing-and-Stepping.html#Continuing-and-Stepping)
|
|
* [Examining the stack](https://sourceware.org/gdb/onlinedocs/gdb/Stack.html#Stack)
|
|
* [Examining data](https://sourceware.org/gdb/onlinedocs/gdb/Data.html#Data)
|
|
|
|
## Debugging Scylla with GDB
|
|
|
|
In general Scylla is quite hard to debug in GDB due to its asynchronous
|
|
nature. You will soon find that backtraces always lead to the reactor's
|
|
event loop and stepping through the code will not work as you expect as
|
|
soon as you leave or enter an asynchronous function.
|
|
That said, GDB is an indispensable tool for debugging coredumps and, when
used right, can be of great help.
|
|
|
|
Over the years we have collected a set of tools for helping with debugging
|
|
scylla. These are collected in [``scylla-gdb.py``](https://github.com/scylladb/scylla/blob/master/scylla-gdb.py) and are in
|
|
the form of [commands](https://sourceware.org/gdb/onlinedocs/gdb/Commands.html#Commands),
|
|
[convenience functions](https://sourceware.org/gdb/onlinedocs/gdb/Convenience-Funs.html#Convenience-Funs)
|
|
and [pretty printers](https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html#Pretty-Printing).
|
|
To load the file issue the following command (inside gdb):
|
|
|
|
(gdb) source /path/to/scylla-gdb.py
|
|
|
|
You should now be ready to use all of the tools contained therein. To
list all available commands, run:
|
|
|
|
(gdb) help scylla
|
|
|
|
To read the documentation of an individual command do:
|
|
|
|
(gdb) help scylla $commandname
|
|
|
|
Some commands have self explanatory names, some have documentation, and
|
|
some have neither :( (contributions are welcome).
|
|
|
|
To get the list of the available convenience functions do:
|
|
|
|
(gdb) help function
|
|
|
|
Note that this will list GDB internal functions as well as those added
|
|
by `scylla-gdb.py`.
|
|
Again, just like before, to see the documentation of an individual
|
|
function do:
|
|
|
|
(gdb) help function $functionname
|
|
|
|
### Tips and tricks
|
|
|
|
#### Tell GDB to not stop on signals used by seastar
|
|
|
|
When running scylla (or any seastar application for that matter) inside
|
|
GDB it will get interrupted often due to catching some signals used by
|
|
seastar internally. This makes debugging almost impossible. To avoid
|
|
this, instruct GDB to not stop on these signals:
|
|
|
|
(gdb) handle SIG34 SIG35 SIGUSR1 nostop noprint pass
|
|
|
|
#### Avoid (some) symbol parsing related crashes
|
|
|
|
GDB is known to crash when parsing some of scylla's symbols (especially
|
|
those related to futures). Usually telling it to not print static
|
|
members of classes and structs helps:
|
|
|
|
(gdb) set print static-members no
|
|
|
|
#### Enable extended python diagnostics
|
|
|
|
When using the facilities from `scylla-gdb.py` it is very useful to know
|
|
the full stack of a failure in some of the provided tools, so that you
|
|
can fix it or report it. To enable this run:
|
|
|
|
(gdb) set python print-stack full
|
|
|
|
#### Helping GDB find the source code for the executable
|
|
|
|
Often you find yourself debugging an executable, whose internal source
|
|
paths don't match those where they can be found on your machine. There
|
|
is an easy workaround for this:
|
|
|
|
(gdb) set substitute-path /path/to/src/in/executable /path/to/src/on/your/machine
|
|
|
|
Note that the pattern that you supply to `set substitute-path` just has
|
|
to be a common prefix of the paths. Example: if the source location
|
|
inside the executable to some file is `/opt/src/scylla/database.hh` and
|
|
on your machine it is `/home/joe/work/scylla/database.hh`, you can make
|
|
GDB find the sources on your machine via:
|
|
|
|
(gdb) set substitute-path /opt/src/scylla /home/joe/work/scylla
|
|
|
|
This method might not work if the sources do not have a prefix, e.g.
|
|
they are relative to the source tree root directory. In this case you can use the
|
|
`set directories` command to set the search path of sources for gdb:
|
|
|
|
(gdb) set directories /path/to/scylla/source/tree
|
|
|
|
Multiple directories can be listed, separated with `:`.
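
The prefix rewriting done by `set substitute-path` can be pictured with a small
Python model. This is only an illustrative sketch of the rule, not GDB's
implementation, and the helper name `substitute_path` is hypothetical. It
assumes, as GDB's documentation describes, that a rule applies only when the
`from` part ends at a directory separator in the recorded path.

```python
def substitute_path(recorded, rule_from, rule_to):
    """Model of a single `set substitute-path FROM TO` rule (hypothetical helper)."""
    prefix = rule_from.rstrip("/")
    # The rule applies only if FROM matches whole leading path components.
    if recorded == prefix or recorded.startswith(prefix + "/"):
        return rule_to.rstrip("/") + recorded[len(prefix):]
    return recorded  # no match: the path is left untouched


print(substitute_path("/opt/src/scylla/database.hh",
                      "/opt/src/scylla", "/home/joe/work/scylla"))
```

With the example paths from this section, the recorded path is rewritten to the
one under `/home/joe/work/scylla`, while a sibling directory such as
`/opt/src/scylla-tools` would be left alone.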
|
|
|
|
#### .gdbinit
|
|
|
|
GDB supports writing arbitrary GDB commands in any file and sourcing it.
|
|
One can use this to place commands that one would have to issue every
|
|
time when debugging in a file, instead of typing them each time GDB is
|
|
started.
|
|
Conventionally this file is called `.gdbinit` and GDB will in fact look
for it in your current directory, in your `$HOME` directory and some other
places. You can always load it by hand if GDB refuses or fails to load it:
|
|
|
|
(gdb) source /path/to/your/.gdbinit
|
|
|
|
Scylla provides a `gdbinit` file helpful for debugging scylla
|
|
at the root of the source tree. You can `source` it from your
|
|
local `.gdbinit` file if you wish.
|
|
|
|
#### TUI
|
|
|
|
GDB has a terminal based GUI called
|
|
[TUI](https://sourceware.org/gdb/onlinedocs/gdb/TUI.html#TUI).
|
|
This is extremely useful when you wish to see the source code while you
|
|
are debugging. TUI mode can be activated by passing `-tui` to GDB
on the command line, or at any time by executing `tui enable`; use
`tui disable` to deactivate it.
|
|
By default the source window has the focus in TUI mode, meaning that command
completion, history search and line editing don't work, e.g. if you use
|
|
the up and down keys, you will scroll the source file up and down respectively,
|
|
instead of moving in the command history. To focus the command window, issue
|
|
`focus cmd`. To move the focus to the source window again, issue `focus src`.
|
|
|
|
#### Thread Local Storage (TLS) variables
|
|
|
|
Thread local variables are saved in a special area of memory, at a negative
|
|
offset from `$fs_base`. Let's look at an example TLS variable, given the
|
|
following C++ code from seastar:
|
|
|
|
    namespace seastar::internal {

    inline
    scheduling_group*
    current_scheduling_group_ptr() noexcept {
        // Slow unless constructor is constexpr
        static thread_local scheduling_group sg;
        return &sg;
    }

    }
|
|
|
|
Let's have a look in GDB:
|
|
|
|
(gdb) p &'seastar::internal::current_scheduling_group_ptr()::sg'
|
|
$1 = (<thread local variable, no debug info> *) 0x7fc1f11e7c0c
|
|
(gdb) p/x $fs_base
|
|
$2 = 0x7fc1f11ff700
|
|
(gdb) p/x 0x7fc1f11e7c0c - $fs_base
|
|
$3 = 0xfffffffffffe850c
|
|
(gdb) p/x -0xfffffffffffe850c
|
|
$4 = 0x17af4
|
|
|
|
The variable `sg` is located at offset `0x17af4` beneath `$fs_base`. We
|
|
can also calculate the offset (and hence address) of a known TLS
|
|
variable in memory as follows:
|
|
|
|
$fs_offset = $tls_entry - $sizeof_TLS_header
|
|
|
|
`$sizeof_TLS_header` can be obtained by listing the program headers of the binary:
|
|
|
|
$ eu-readelf -l ./a.out
|
|
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
|
|
[...]
|
|
TLS 0x31ead40 0x00000000033ecd40 0x00000000033ecd40 0x000058 0x017bf0 R 0x40
|
|
[...]
|
|
|
|
We are interested in the size of the TLS header, which is in the
|
|
`MemSiz` column and is `0x017bf0` in this example. The value of the
|
|
`$tls_entry` can be found in the process' symbol table:
|
|
|
|
$ eu-readelf -s ./a.out
|
|
|
|
Symbol table [ 5] '.dynsym' contains 1288 entries:
|
|
1 local symbol String table: [ 9] '.dynstr'
|
|
Num: Value Size Type Bind Vis Ndx Name
|
|
[...]
|
|
1282: 000000000000010c 4 TLS LOCAL HIDDEN 23 _ZZN7seastar8internal28current_scheduling_group_ptrEvE2sg
|
|
[...]
|
|
|
|
If we substitute these values in we can verify our theory:
|
|
|
|
(gdb) set $tls_entry = 0x000000000000010c
|
|
(gdb) set $sizeof_TLS_header = 0x017bf0
|
|
(gdb) p/x $tls_entry - $sizeof_TLS_header
|
|
$5 = 0xfffe851c
|
|
(gdb) p/x -($tls_entry - $sizeof_TLS_header)
|
|
$6 = 0x17ae4
|
|
|
|
We can also identify a TLS variable based on its address. We know the
|
|
value of `$sizeof_TLS_header` and we can easily calculate `$fs_offset`.
|
|
To identify the variable we need to calculate its `$tls_entry` based on
|
|
which we can find the matching entry in the symbol table. Remaining with
|
|
the above example of the address being `0x7fc1f11e7c0c`, we can
|
|
calculate this as:
|
|
|
|
$tls_entry = $sizeof_TLS_header + $fs_offset
|
|
|
|
Do note however that `$fs_offset` is negative so this is in effect a
|
|
substitution:
|
|
|
|
$tls_entry = 0x017bf0 - 0x17ae4
|
|
|
|
This yields `0x10c` which is exactly the value of the `Value` column in
|
|
the matching symbol table entry. This should work also if you don't have
|
|
the address to the start of the object. In this case you have to locate
|
|
the entry in the symbol table, whose value range includes the
|
|
calculated value. This can be made easier by sorting the symbol table by
|
|
the `Value` column.
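
The arithmetic in this section can be replayed outside GDB with a few lines of
Python. Note that the offset observed in GDB (`0x17af4`) and the one computed
from the raw `MemSiz` (`0x17ae4`) differ by `0x10`. The sketch below assumes,
and this is an assumption not stated in the text above, that the size of the TLS
block is rounded up to the segment alignment (the `Align` column, `0x40` here)
before symbol offsets are applied. All constants are copied from the example
outputs; `round_up` is a hypothetical helper.

```python
FS_BASE = 0x7fc1f11ff700   # p/x $fs_base
SG_ADDR = 0x7fc1f11e7c0c   # address of `sg` as printed by GDB
TLS_MEMSIZ = 0x017bf0      # MemSiz of the TLS program header
TLS_ALIGN = 0x40           # Align of the TLS program header
TLS_ENTRY = 0x10c          # Value of the symbol in the symbol table


def round_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


# Distance of `sg` below $fs_base, as observed in the GDB session.
observed = FS_BASE - SG_ADDR
# The same distance computed from the raw MemSiz, as in the text.
from_memsiz = TLS_MEMSIZ - TLS_ENTRY
# Assumption: the TLS block size is rounded up to the segment alignment.
aligned_size = round_up(TLS_MEMSIZ, TLS_ALIGN)
from_aligned = aligned_size - TLS_ENTRY

print(hex(observed), hex(from_memsiz), hex(from_aligned))

# Reverse direction: recover the symbol table value from the address.
recovered_entry = aligned_size - observed
print(hex(recovered_entry))
```

Under that assumption the computed offset matches the observed one exactly, and
the reverse calculation lands on the `Value` of the symbol table entry. If the
numbers for your binary are off by a small amount, alignment is a likely
culprit.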
|
|
|
|
#### Optimized-out variables
|
|
|
|
In release builds one will find that a significant portion of variables and
function parameters are optimized out. This is very annoying, but often one can
find a way to inspect the desired variables anyway.
|
|
|
|
For non-local variables, there is a good chance that a few frames up one can
|
|
find another reference that wasn't optimized out. Or one can try to find another
|
|
object, which is not optimized out, and which is also known to hold a reference
|
|
to the variable.
|
|
|
|
If the variable is local, one can try to look at the registers (`info
|
|
registers`) and try to identify which one holds its value. This is relatively
|
|
easy for pointers to objects as heap pointers are easily identifiable (they
|
|
start with `0x6` and are composed of 12 digits, e.g.: `0x60f146e3fbc0`) and one
|
|
can check the size of the object they point to with `scylla ptr`. If the
|
|
pointed-to object is an instance of a class with virtual methods, the object can
|
|
be easily identified by looking at its vtable pointer (`x/1a $ptr`).
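
The "starts with `0x6` and has 12 hex digits" rule of thumb is easy to apply
mechanically when scanning a register dump. The following is a minimal sketch
of that heuristic only; the function name and all register values except
`0x60f146e3fbc0` are made up for illustration, and a match is merely a hint
that still has to be confirmed with `scylla ptr`.

```python
def looks_like_seastar_heap_pointer(value):
    """Heuristic from the text: 12 hex digits, the first of which is 6."""
    digits = format(value, "x")
    return len(digits) == 12 and digits[0] == "6"


# Hypothetical register contents, e.g. transcribed from `info registers`.
registers = {"rax": 0x60f146e3fbc0, "rbx": 0x7ffd3c1a9e40, "rcx": 0x2a}
candidates = [name for name, value in registers.items()
              if looks_like_seastar_heap_pointer(value)]
print(candidates)
```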
|
|
|
|
### Debugging coredumps
|
|
|
|
Up until release 3.0 we used to build and package Scylla separately for each
|
|
supported distribution. Starting with 3.1 we moved to relocatable binaries.
|
|
These are built with a common [frozen toolchain](https://github.com/scylladb/scylla/blob/master/tools/toolchain/README.md)
|
|
and packages are bundled with all dependencies. This means that post 3.1 there
|
|
is just one build across all supported distros and that the exact environment
|
|
the binaries were built with is available in the form of a Docker image. This
|
|
makes debugging cores generated from relocatable binaries much easier.
|
|
As of now, all releases except 2019.1 ship via relocatable packages, so in this
|
|
chapter we will focus on how to debug cores generated from relocatable binaries,
|
|
with a subsection later explaining how to debug cores generated by 2019.1
|
|
binaries.
|
|
|
|
#### open-coredump.sh
|
|
|
|
The most convenient way to open a coredump is [scripts/open-coredump.sh](https://github.com/scylladb/scylladb/blob/master/scripts/open-coredump.sh).
|
|
Just point it to a coredump and after some time you should get a shell inside the
|
|
appropriate dbuild container, with a suggested gdb invocation line to open the
|
|
coredump.
|
|
|
|
If you prefer to open the coredump manually or the script fails for you, continue
|
|
below.
|
|
|
|
#### Relocatable binaries
|
|
|
|
Cores produced by relocatable binaries can be simply opened in the
|
|
[dbuild](https://github.com/scylladb/scylla/blob/master/tools/toolchain/README.md) container they were built with. To do
|
|
that, two things (apart from the core itself of course) are needed:
|
|
1) The exact frozen toolchain (dbuild container).
|
|
2) The exact relocatable package the binary was part of.
|
|
|
|
##### Obtaining the frozen toolchain
|
|
|
|
The frozen toolchain is obtained based on the commit id of the version of the
|
|
scylla executable the core was produced with. The exact commit hash can be
|
|
obtained by running:
|
|
```
|
|
$ scylla --version
|
|
5.2.0~dev-0.20221210.e47794ed9832
|
|
```
|
|
The version can be divided into 4 parts:
|
|
* The version identifier, in this case: 5.2.0~dev; the ~dev suffix means this is
|
|
an uncut, development branch (master), for releases this suffix is missing.
|
|
* The build identifier, in this case: 0.
|
|
* The date, in this case: 20221210.
|
|
* The commit hash, in this case: e47794ed9832.
|
|
|
|
Based on the latter, you can obtain the right frozen toolchain:
|
|
|
|
$ cd /path/to/scylla
|
|
$ git checkout $commit_hash
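
If you need to extract the commit hash in a script, the version string
described above can be split mechanically. This is a minimal sketch that
assumes the four-part layout shown in this section; `ScyllaVersion` and
`parse_scylla_version` are hypothetical names, not part of any Scylla tooling.

```python
from typing import NamedTuple


class ScyllaVersion(NamedTuple):
    version: str   # e.g. 5.2.0~dev
    build: str     # e.g. 0
    date: str      # e.g. 20221210
    commit: str    # e.g. e47794ed9832


def parse_scylla_version(text):
    """Split `scylla --version` output into the four parts described above."""
    # The version identifier is everything before the last dash.
    version, _, rest = text.strip().rpartition("-")
    build, date, commit = rest.split(".")
    return ScyllaVersion(version, build, date, commit)


parsed = parse_scylla_version("5.2.0~dev-0.20221210.e47794ed9832")
print(parsed.commit)
```

The resulting `commit` field is what you would pass to `git checkout`.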
|
|
|
|
##### Obtaining the relocatable-package
|
|
|
|
Once we have the right toolchain, we have to obtain the relocatable package.
It is identified by its build-id, which can be extracted from the coredump
like this:
|
|
|
|
$ eu-unstrip -n --core $corefile
|
|
|
|
Or from the executable like this:
|
|
|
|
$ eu-unstrip -n --exec $executable
|
|
|
|
You can find the relocatable package using the
http://backtrace.scylladb.com/index.html search form,
searching by either the Scylla build-id or the release number (e.g. 5.0.0 or 2022.1).
|
|
|
|
The form can also be used to decode backtraces generated
|
|
by the corresponding scylla binary.
|
|
|
|
**NOTE**: Use the normal relocatable package, usually called
|
|
`scylla-package.tar.gz`, not the debuginfo one usually called
|
|
`scylla-debug-package.tar.gz`.
|
|
|
|
Build-ids for all official releases are listed on
|
|
http://backtrace.scylladb.com/releases.html.
|
|
|
|
##### Loading the core
|
|
|
|
Move the coredump and the unpacked relocatable package into some directory
`$dir` on your system, then:
|
|
|
|
```
|
|
(host)$ cd /path/to/scylla # with the right commit checked out
|
|
(host)$ ./tools/toolchain/dbuild -it -v $dir:/workdir -- bash -l
|
|
(dbuild)$ cd /workdir
|
|
(dbuild)$ ln -s /path/to/unpackaged-relocatable-package /opt/scylladb # symlink the scylla subdir if you have the unified tarball
|
|
(dbuild)$ gdb --core=$corefile /opt/scylladb/libexec/scylla
|
|
```
|
|
|
|
You might need to add
|
|
|
|
-ex 'set auto-load safe-path /opt/scylladb/libreloc'
|
|
|
|
to the command line, see [No thread debugging](#no-thread-debugging).
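
When opening many cores it can be handy to assemble the GDB command line
programmatically. The sketch below only combines options that appear in this
document (`--core`, `-ex`, `-iex`); the function name and its parameters are
hypothetical, and the core file name in the usage line is a placeholder.

```python
import shlex


def gdb_core_command(corefile, executable, safe_path=None, debug_thread_db=False):
    """Build an argv list for opening a coredump with GDB (hypothetical helper)."""
    argv = ["gdb"]
    if debug_thread_db:
        # -iex commands run before the executable and the core are loaded.
        argv += ["-iex", "set debug libthread-db 1"]
    if safe_path is not None:
        argv += ["-ex", f"set auto-load safe-path {safe_path}"]
    argv += [f"--core={corefile}", executable]
    return argv


argv = gdb_core_command("core.123", "/opt/scylladb/libexec/scylla",
                        safe_path="/opt/scylladb/libreloc")
print(shlex.join(argv))
```

Printing the command with `shlex.join` gives a line that can be pasted into the
shell inside the dbuild container; the list form can be passed to
`subprocess.run` directly.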
|
|
|
|
### Troubleshooting
|
|
|
|
#### Namespace issues
|
|
|
|
GDB complains that it can't find `namespace seastar` or some other Scylla
|
|
or Seastar symbol that you know exists. This usually happens when GDB is in
|
|
the wrong context i.e. a frame is selected which is not in the Scylla executable
|
|
but in some other library. A typical situation is opening a coredump and
|
|
attempting to access Scylla symbols when the initial frame is in libc.
|
|
Move up the stack, or select a frame which is a Scylla or Seastar function to
|
|
fix.
|
|
|
|
#### No thread debugging
|
|
|
|
Unable to access thread-local variables. Example:
|
|
|
|
(gdb) p seastar::local_engine
|
|
Cannot find thread-local storage for LWP 22604, executable file /usr/lib/debug/usr/bin/scylla.debug:
|
|
Cannot find thread-local variables on this target
|
|
|
|
The first step in finding out why thread debugging doesn't work is enabling
additional diagnostic output from GDB's thread-debugging layer:
|
|
|
|
(gdb) set debug libthread-db 1
|
|
|
|
This has to be done right after starting GDB, *before* the core and the
|
|
executable are loaded. You can do this by adding
|
|
`-iex "set debug libthread-db 1"` to your gdb command line.
|
|
|
|
The usual cause is that GDB failed to find some libraries, or that the
versions of the libraries GDB loaded don't match those the core was generated
with.
|
|
|
|
Of special note is the `libthread_db.so` library, which is crucial for
|
|
thread debugging to work. Common causes of failing to find or load this library
|
|
are discussed below.
|
|
|
|
##### Loading denied by auto-load safe-path
|
|
|
|
You might see a message like this:
|
|
|
|
warning: File "/opt/scylladb/libreloc/libthread_db.so.1" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load"
|
|
thread_db_load_search returning 0
|
|
|
|
To declare the directory this library is found at as safe to load from, do:
|
|
|
|
set auto-load safe-path /opt/scylladb/libreloc
|
|
|
|
Use the path that is appropriate for your setup. Alternatively you can use `/`
|
|
as the path to declare your entire file-system as safe to load stuff from.
|
|
Note that `libthread_db.so` is packaged together with `libc`. So if you have the
|
|
build-id appropriate `libc` package, you can be sure you have the correct
|
|
`libthread_db.so` too.
|
|
|
|
##### Missing debug symbols for glibc
|
|
|
|
If you see a message like this:
|
|
|
|
warning: Expected absolute pathname for libpthread in the inferior, but got .gnu_debugdata for /lib64/libpthread.so.0.
|
|
Trying host libthread_db library: /lib64/libthread_db.so.1.
|
|
td_ta_new failed: application not linked with libthread
|
|
thread_db_load_search returning 0
|
|
|
|
Installing debug symbols for glibc might solve this. This issue was seen on
|
|
Fedora 34, for more details see the
|
|
[bug report](https://bugzilla.redhat.com/show_bug.cgi?id=1960867).
|
|
Debug symbols can be installed with:
|
|
|
|
sudo dnf debuginfo-install glibc-2.32-6.fc33.x86_64
|
|
|
|
Adjust for the exact version you have installed.
|
|
|
|
##### Missing critical shared libraries
|
|
|
|
If you ensured `libthread_db.so` is present and is successfully loaded by GDB
|
|
but thread debugging still doesn't work, inspect the other libraries loaded by
|
|
GDB:
|
|
|
|
(gdb) info sharedlibrary
|
|
|
|
The listing will contain the path of the loaded libraries. If a library wasn't
|
|
found by GDB that will also be visible in the listing. You can then use the
|
|
`file` utility to obtain the build-id of the libraries:
|
|
|
|
file /path/to/libsomething.so
|
|
|
|
This build-id must match the one obtained from the core. The library build-ids
|
|
from the core can be obtained with:
|
|
|
|
eu-unstrip -n --core=/path/to/core
|
|
|
|
In general you can get away with some non-core libraries missing or having the wrong
version, but the core libraries like `libc.so`, `libgcc_s.so`, `librt.so` and
`ld.so` (often called something like `ld-linux-x86-64.so.2`) etc. must have the
correct version. It is best to ensure all libraries are correct to minimize the chance
of something not working. Also, make sure the build-id of the executable matches
the one the core was generated with. Again, you can use `file` to obtain the
build-id of the executable, then compare it with the build-id obtained from the
`eu-unstrip` listing.
|
|
For more information on how to obtain the correct version of libraries and how
|
|
to override the path GDB loads them from, see [Collecting libraries](#collecting-libraries)
|
|
and [Opening the core on another OS](#opening-the-core-on-another-os).
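
Comparing build-ids by eye is error-prone, so a small script can help. The
sketch below is a rough illustration under stated assumptions: it assumes that
`file` prints the build-id as `BuildID[sha1]=<hex>`, and that each line of
`eu-unstrip -n` output has the build-id (optionally followed by `@address`) as
its second whitespace-separated field and the module name as its last one.
Check both against the actual output on your system. The sample strings and the
build-id in them are made up.

```python
import re


def build_id_from_file_output(text):
    """Extract the build-id from the output of the `file` utility."""
    match = re.search(r"BuildID\[\w+\]=([0-9a-f]+)", text)
    return match.group(1) if match else None


def build_ids_from_unstrip(listing):
    """Map module name -> build-id from an `eu-unstrip -n` listing."""
    result = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a module line
        # Second field is assumed to be BUILDID or BUILDID@ADDRESS.
        result[fields[-1]] = fields[1].split("@")[0]
    return result


# Made-up sample outputs, for illustration only.
file_output = ("/opt/scylladb/libreloc/libc.so.6: ELF 64-bit LSB shared object, "
               "x86-64, BuildID[sha1]=0123456789abcdef0123456789abcdef01234567, stripped")
unstrip_output = ("0x7f0000000000+0x1c5000 "
                  "0123456789abcdef0123456789abcdef01234567@0x7f0000000308 "
                  "/opt/scylladb/libreloc/libc.so.6 - libc.so.6\n")

core_ids = build_ids_from_unstrip(unstrip_output)
print(core_ids["libc.so.6"] == build_id_from_file_output(file_output))
```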
|
|
|
|
##### Build IDs match but nothing works: can't backtrace, resolve symbols and no thread debugging
|
|
|
|
Make sure you are using the normal scylla package, not the debuginfo one, see
|
|
[Obtaining the relocatable package](#obtaining-the-relocatable-package).
|
|
|
|
#### GDB crashes when printing the backtrace or some variable
|
|
|
|
See [Avoid (some) symbol parsing related crashes](#avoid-some-symbol-parsing-related-crashes).
|
|
|
|
GDB has trouble with frames inlined into the outermost frame in a seastar thread,
|
|
or any green threads in general -- where the outermost frame is annotated with
|
|
`.cfi_undefined rip`. See
|
|
[GDB#26387](https://sourceware.org/bugzilla/show_bug.cgi?id=26387).
|
|
To work around this, pass a limit to `bt`, such that it excludes the problematic
|
|
frame. E.g. if `bt` prints 10 frames before GDB crashes, use `bt 9` to avoid the
crash.
|
|
|
|
#### GDB keeps stopping on some signals
|
|
|
|
See [Tell GDB to not stop on signals used by seastar](#tell-gdb-to-not-stop-on-signals-used-by-seastar).
|
|
|
|
### Debugging guides
|
|
|
|
Guides focusing on different aspects of debugging Scylla. These guides
|
|
assume a release build of Scylla.
|
|
|
|
#### The seastar memory allocator
|
|
|
|
Seastar has its own memory allocator optimized for Seastar's
|
|
thread-per-core architecture. Memory, just like CPU, is sharded among
|
|
threads, meaning that each shard has its equally-sized exclusive memory area.
|
|
|
|
The seastar allocator operates on three levels:
|
|
* pages (4KB)
|
|
* spans of pages (2^N pages, N ∈ [0, 31])
|
|
* objects managed by small pools
|
|
|
|
Small allocations (<= 16KB) are served by small memory pools. There is
|
|
exactly one pool per supported size, and there is a limited number of
|
|
sizes available between 1B (8B in practice) and 16KB. An allocation is served
by the pool with the smallest object size that is greater than or equal to the
requested size, although alignment requirements complicate this. Pools allocate spans
themselves for the space to allocate objects in.
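
The pool selection rule can be sketched in a few lines of Python. This is a
simplified model, not Seastar's code: it ignores alignment entirely, and the
size list is just a handful of values copied from the `objsz` column of a
`scylla memory` report rather than the full set of pools. `pool_size_for` is a
hypothetical name.

```python
import bisect

# A few pool object sizes, as seen in the `objsz` column of `scylla memory`.
POOL_SIZES = [8, 10, 12, 14, 16, 20, 24, 28, 32, 40, 48, 56, 64]


def pool_size_for(request):
    """Return the smallest listed pool size that can hold `request` bytes."""
    idx = bisect.bisect_left(POOL_SIZES, request)
    if idx == len(POOL_SIZES):
        raise ValueError("request is larger than the pools in this sketch")
    return POOL_SIZES[idx]


for request in (1, 17, 33, 64):
    print(request, "->", pool_size_for(request))
```

This is useful when reading allocator reports: an object of 33 bytes, for
example, would be counted under the 40-byte pool in this model.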
|
|
|
|
Large allocations are served by allocating an entire span.
|
|
|
|
The allocator keeps a description of all the pages and pools in memory
|
|
in thread-local variables. Using these, it is possible to arrive at the
|
|
metadata describing any allocation with a few steps:
|
|
|
|
address -> page -> span -> pool?
|
|
|
|
This is exploited by the `scylla ptr` command which, given an address,
|
|
prints the following information:
|
|
* The allocation (small or large) this address is part of.
|
|
* The offset of the address from the beginning of the allocation.
|
|
* Whether the object is dead or live.
|
|
|
|
Example:
|
|
|
|
(gdb) scylla ptr 0x6000000f3830
|
|
thread 1, small (size <= 512), live (0x6000000f3800 +48)
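
The numbers in this output can be reproduced by hand. The sketch below assumes
that the object lives in a pool of 512-byte objects whose span is a single 4KB
page starting at the page boundary below the address; both are assumptions
made for illustration, since the real span start comes from the allocator's
page metadata. `locate_in_small_pool` is a hypothetical helper.

```python
PAGE_SIZE = 4096


def locate_in_small_pool(address, object_size, span_start):
    """Return (object start, offset into object) for an address in a small pool."""
    offset_in_span = address - span_start
    index = offset_in_span // object_size        # which object slot in the span
    object_start = span_start + index * object_size
    return object_start, address - object_start


address = 0x6000000f3830
# Assumption: the span is one page and starts at the enclosing page boundary.
span_start = address & ~(PAGE_SIZE - 1)
start, offset = locate_in_small_pool(address, 512, span_start)
print(hex(start), offset)
```

Under these assumptions the result agrees with the `scylla ptr` output: the
object starts at `0x6000000f3800` and the address is 48 bytes into it.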
|
|
|
|
It is possible to dump the state of the seastar allocator with the
|
|
`scylla memory` command. This prints a report containing the state of
|
|
all small pools as well as the availability of spans. It also prints
|
|
other Scylla specific information.
|
|
|
|
#### Continuation chains
|
|
|
|
Continuation chains are everywhere in Scylla. Every execution flow takes the
|
|
form of a continuation chain. This makes debugging very hard because the normal
|
|
GDB commands for inspecting and controlling execution flow (`backtrace`, `up`,
|
|
`down`, `step`, `next`, `return`, etc.) quickly fail to fulfill their purpose.
|
|
One will quickly find that every backtrace just leads to the same event loop in
|
|
the seastar reactor.
|
|
|
|
Continuation chains are formed by tasks. The tasks form an intrusive forward
linked list in memory. Each task links to the task that depends on it. Example:
|
|
|
|
    future<> bar() {
        return sleep(std::chrono::seconds(10)).then([] {
        });
    }

    future<> foo() {
        return bar().then([] {
        });
    }
|
|
|
|
When foo() is called a continuation chain of 3 tasks will be created:
|
|
* T0: sleep()
|
|
* T1: bar()::lambda#1
|
|
* T2: foo()::lambda#1
|
|
|
|
T1 depends on T0, and T2 depends on T1. In memory they form a forward linked
|
|
list like:
|
|
|
|
T0 -> T1 -> T2
|
|
|
|
The links are provided by the promise-future pairs in these tasks. Each task
|
|
contains a future half of one such pair and a promise half of another one. The
|
|
future is for the value arriving from the previous task and the promise is for
|
|
the value calculated in the local task, that the next task waits on.
|
|
|
|
The `task` object has the following interface:
|
|
|
|
    class task {
        scheduling_group _sg;
    public:
        virtual void run_and_dispose() noexcept = 0;
        virtual task* waiting_task() noexcept = 0;
        scheduling_group group() const { return _sg; }
        shared_backtrace get_backtrace() const;
    };
|
|
|
|
The only thing stored at a known offset is the scheduling group. Each task
object also has an associated action that is executed when the task runs
(`run_and_dispose()`), as well as a pointer to the next task (returned by
`waiting_task()`). However, since `task` is polymorphic, the layout of the
different kinds of tasks is not known, so all we can say about them is that
somewhere they contain a promise object, which has a pointer to the future object
of the task that depends on them. Also note that continuations are just one kind
of task; there are other kinds as well. Many seastar primitives,
like `do_with()`, `repeat()`, `do_until()`, etc. have their own task
types.
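
The forward-linked shape of a continuation chain can be modelled with a toy
Python class. This is purely a mental model, not how Seastar stores tasks: the
`Task` class, its `then()` method and the task names are illustrative, and only
the `waiting_task()` link from the interface above is represented.

```python
class Task:
    """Toy stand-in for a seastar task; only the forward link is modelled."""

    def __init__(self, name):
        self.name = name
        self._waiting = None  # the task that depends on this one

    def then(self, name):
        """Attach a dependent task and return it, so calls can be chained."""
        self._waiting = Task(name)
        return self._waiting

    def waiting_task(self):
        return self._waiting


def walk_forward(task):
    """Follow waiting_task() links until the end of the chain."""
    names = []
    while task is not None:
        names.append(task.name)
        task = task.waiting_task()
    return names


t0 = Task("sleep()")
t0.then("bar()::lambda#1").then("foo()::lambda#1")
print(" -> ".join(walk_forward(t0)))
```

Note that the model only has forward links: starting from the middle of the
chain, nothing in a task points back to the task it is waiting on, which is
why walking in that direction requires searching memory for references.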
|
|
|
|
##### Traversing the continuation chain
|
|
|
|
In other words, finding out which continuations this one waits on, as
well as which ones are waiting on this one.
This involves searching for inbound and outbound references in the task and identifying
the one which is also a task. As this is quite a labour-intensive process, there is
a command in `scylla-gdb.py` which automates it, called `scylla fiber`. Example
usage:
|
|
|
|
(gdb) scylla fiber 0x0000600016217c80
|
|
#-1 (task*) 0x000060001a305910 0x0000000004aa5260 vtable for seastar::continuation<...> + 16
|
|
#0 (task*) 0x0000600016217c80 0x0000000004aa5288 vtable for seastar::continuation<...> + 16
|
|
#1 (task*) 0x000060000ac42940 0x0000000004aa2aa0 vtable for seastar::continuation<...> + 16
|
|
#2 (task*) 0x0000600023f59a50 0x0000000004ac1b30 vtable for seastar::continuation<...> + 16
|
|
|
|
This is somewhat similar to a backtrace, in that it shows tasks that are waited
|
|
on by this continuation and tasks that are waiting
|
|
for this continuation to finish, similar to how upstream functions are waiting
|
|
for the called function to finish before continuing their own execution.
|
|
See `help scylla fiber` and `scylla fiber --help` for more information on usage.
|
|
|
|
##### Seastar threads
|
|
|
|
Seastar threads are a special kind of continuation. Each seastar thread hosts a
|
|
stack but it can also be linked into a continuation chain. The stack of seastar
|
|
threads is a regular stack and all the normal stack related GDB commands can be
|
|
used in it. This can be used to inspect where exactly the seastar thread
|
|
stopped when it was suspended to wait on some future. Local variables can be
|
|
inspected too. The catch is how to make GDB switch context into the
stack of the seastar thread. Unfortunately there is no method that works
with stock GDB as of now: the `scylla thread` command crashes GDB and even if it
didn't, it would only work in live processes.
To get this working a patched GDB is needed, see
https://github.com/denesb/seastar-gdb for instructions on how to use it.
|
|
|
|
#### Debugging assert failures
|
|
|
|
Assert failures are the easiest (easiest but not easy) coredumps to debug
|
|
because we know the condition that failed, we know where and thus the
|
|
investigation has a clear scope -- to find out why. This is not always easy
|
|
though, especially if the root cause happened much earlier, and thus the state
|
|
the node was in at that time is not observable in the coredump. The root cause
|
|
might even be on another node altogether. In this case we try to gather as much
|
|
information as possible and write debug patches that hopefully catch the problem
|
|
earlier, and try to reproduce with them, hoping to get a new coredump that has
|
|
more information.
|
|
|
|
#### Debugging segmentation faults
|
|
|
|
Segmentation faults are usually caused by use-after-free,
|
|
use-after-move, dangling pointer/reference or memory corruption.
|
|
Unfortunately, coredumps often contain very little immediate information on what
exactly went wrong. It is rare to find something as obvious as an attempt to
dereference a null pointer. So one has to dig a little to find out what
exactly triggered the segmentation fault.
The most useful command for this is `scylla ptr`, as it allows
determining whether the address the current function is working with
belongs to a live object or not.
|
|
|
|
Once the immediate cause is found, only the "easy" part remains, finding
|
|
out how it happened.
|
|
In some cases this can be very difficult. For example in the case of a memory
|
|
corruption overwriting memory belonging to another object, the overwrite
|
|
could have happened much earlier, with no traces of what it was in the
|
|
coredump. In this case the same method has to be used that was mentioned
|
|
in the case of [debugging assert failures](#debugging-assert-failures):
|
|
adding additional debug code and trying to reproduce, hoping to catch
|
|
the perpetrator red-handed.
|
|
|
|
#### Debugging deadlocks
|
|
|
|
If the process that is stuck is known, start from there. Try to identify the
continuation chain that is stuck, then
[follow it](#traversing-the-continuation-chain)
to find the future that is blocking it all.
Unfortunately, there is no way to differentiate a stuck continuation chain from one
that is making progress, so there are no tried-and-proven
methods here either.
|
|
|
|
#### Debugging Out Of Memory (OOM) crashes
|
|
|
|
OOM crashes are usually the hardest issues to debug. Not only does one have to
determine the immediate cause, which is often already hard enough; as
usual, one also has to determine what led to this state and how it happened.
|
|
|
|
That said, finding the immediate cause has a pretty standard procedure.
|
|
The first step is always issuing a `scylla memory` command and
determining where the memory is. Let's look at a concrete example:
|
|
|
|
(gdb) scylla memory
|
|
Used memory: 7452069888
|
|
Free memory: 20082688
|
|
Total memory: 7472152576
|
|
|
|
LSA:
|
|
allocated: 1067712512
|
|
used: 1065353216
|
|
free: 2359296
|
|
|
|
Cache:
|
|
total: 393216
|
|
used: 160704
|
|
free: 232512
|
|
|
|
Memtables:
|
|
total: 1067319296
|
|
Regular:
|
|
real dirty: 1064566784
|
|
unspooled: 811568656
|
|
System:
|
|
real dirty: 393216
|
|
unspooled: 393216
|
|
Streaming:
|
|
real dirty: 0
|
|
unspooled: 0
|
|
|
|
Coordinator:
|
|
bg write bytes: 42133 B
|
|
hints: 0 B
|
|
view hints: 0 B
|
|
00 "main"
|
|
fg writes: 0
|
|
bg writes: 0
|
|
fg reads: 0
|
|
bg reads: -7
|
|
05 "statement"
|
|
fg writes: 14
|
|
bg writes: 5
|
|
fg reads: 94
|
|
bg reads: 2352
|
|
|
|
Replica:
|
|
Read Concurrency Semaphores:
|
|
user sstable reads: 84/100, remaining mem: 138033377 B, queued: 0
|
|
streaming sstable reads: 0/ 10, remaining mem: 149443051 B, queued: 0
|
|
system sstable reads: 0/ 10, remaining mem: 149443051 B, queued: 0
|
|
Execution Stages:
|
|
data query stage:
|
|
Total 0
|
|
mutation query stage:
|
|
Total 0
|
|
apply stage:
|
|
02 "streaming" 287
|
|
Total 287
|
|
Tables - Ongoing Operations:
|
|
pending writes phaser (top 10):
|
|
12 cqlstress_lwt_example.blogposts
|
|
2 system.paxos
|
|
14 Total (all)
|
|
pending reads phaser (top 10):
|
|
1863 system.paxos
|
|
809 cqlstress_lwt_example.blogposts
|
|
2672 Total (all)
|
|
pending streams phaser (top 10):
|
|
0 Total (all)
|
|
|
|
Small pools:
|
|
objsz spansz usedobj memory unused wst%
|
|
1 4096 0 0 0 0.0
|
|
1 4096 0 0 0 0.0
|
|
1 4096 0 0 0 0.0
|
|
1 4096 0 0 0 0.0
|
|
2 4096 0 0 0 0.0
|
|
2 4096 0 0 0 0.0
|
|
3 4096 0 0 0 0.0
|
|
3 4096 0 0 0 0.0
|
|
4 4096 0 0 0 0.0
|
|
5 4096 0 0 0 0.0
|
|
6 4096 0 0 0 0.0
|
|
7 4096 0 0 0 0.0
|
|
8 4096 15285 126976 4696 3.7
|
|
10 4096 0 8192 8192 99.9
|
|
12 4096 173 8192 6116 74.6
|
|
14 4096 0 8192 8192 99.8
|
|
16 4096 11151 184320 5904 1.0
|
|
20 4096 3570 77824 6424 7.9
|
|
24 4096 19131 462848 3704 0.4
|
|
28 4096 2572 77824 5808 7.3
|
|
32 4096 27021 868352 3680 0.4
|
|
40 4096 14680 593920 6720 0.7
|
|
48 4096 3318 163840 4576 2.4
|
|
56 4096 12077 692224 15912 0.9
|
|
64 4096 52719 3375104 1088 0.0
|
|
80 4096 16382 1323008 12448 0.6
|
|
96 4096 17045 1667072 30752 0.3
|
|
112 4096 3402 397312 16288 2.5
|
|
128 4096 17767 2281472 7296 0.3
|
|
160 4096 17722 2912256 76736 0.3
|
|
192 4096 8094 1585152 31104 0.4
|
|
224 4096 17087 3891200 63712 0.1
|
|
256 4096 77945 21274624 1320704 0.1
|
|
320 8192 13232 4366336 132096 0.7
|
|
384 8192 5571 2203648 64384 1.0
|
|
448 4096 4290 1986560 64640 1.7
|
|
512 4096 2830 1503232 54272 2.8
|
|
640 12288 960 655360 40960 3.9
|
|
768 12288 5751 4489216 72448 0.1
|
|
896 8192 326 311296 19200 4.6
|
|
1024 4096 4320 5677056 1253376 0.7
|
|
1280 20480 251 425984 104704 22.2
|
|
1536 12288 3818 6373376 508928 1.7
|
|
1792 16384 2711 4980736 122624 0.9
|
|
2048 8192 594 1343488 126976 9.5
|
|
2560 20480 122 458752 146432 25.7
|
|
3072 12288 6596 21823488 1560576 0.9
|
|
3584 28672 6 294912 273408 91.1
|
|
4096 16384 2039 8372224 20480 0.2
|
|
5120 20480 7885 43220992 2849792 0.3
|
|
6144 24576 8188 54099968 3792896 0.8
|
|
7168 28672 30 622592 407552 53.0
|
|
8192 32768 8091 66781184 499712 0.7
|
|
10240 40960 15058 165216256 11022336 0.4
|
|
12288 49152 7034 92471296 6037504 0.3
|
|
14336 57344 6815 111935488 14235648 0.2
|
|
16384 65536 14046 230555648 425984 0.2
|
|
Small allocations: 872148992 [B]
|
|
Page spans:
|
|
index size [B] free [B] large [B] [spans]
|
|
0 4096 1888256 0 0
|
|
1 8192 663552 0 0
|
|
2 16384 0 0 0
|
|
3 32768 32768 2320334848 70811
|
|
4 65536 65536 3031105536 46251
|
|
5 131072 131072 1161822208 8864
|
|
6 262144 3145728 0 0
|
|
7 524288 524288 0 0
|
|
8 1048576 1048576 0 0
|
|
9 2097152 0 2097152 1
|
|
10 4194304 12582912 0 0
|
|
11 8388608 0 0 0
|
|
12 16777216 0 0 0
|
|
13 33554432 0 0 0
|
|
14 67108864 0 67108864 1
|
|
15 134217728 0 0 0
|
|
16 268435456 0 0 0
|
|
17 536870912 0 0 0
|
|
18 1073741824 0 0 0
|
|
19 2147483648 0 0 0
|
|
20 4294967296 0 0 0
|
|
21 8589934592 0 0 0
|
|
22 17179869184 0 0 0
|
|
23 34359738368 0 0 0
|
|
24 68719476736 0 0 0
|
|
25 137438953472 0 0 0
|
|
26 274877906944 0 0 0
|
|
27 549755813888 0 0 0
|
|
28 1099511627776 0 0 0
|
|
29 2199023255552 0 0 0
|
|
30 4398046511104 0 0 0
|
|
31 8796093022208 0 0 0
|
|
Large allocations: 6582468608 [B]

We can see a couple of things at a glance here: free memory is very low and
the cache is fully evicted. These are sure signs of a real OOM. Note that
free memory doesn't have to be 0 in the case of an OOM. It is enough for
a size pool to not be able to allocate more memory spans and thus fail
a critical allocation that we cannot recover from. Also, the cache being fully
evicted doesn't mean it holds 0 memory: when it holds just a couple of
KB, it is considered fully evicted. The cache is evicted by the seastar memory
allocator's memory reclamation mechanism, which is hooked up to the
cache and starts trying to free up memory by evicting the cache
once memory runs low.

The cause of the OOM in this case is too many reads (1863) on
`system.paxos`. This can be seen in the Replica section of the report, under
the pending reads phaser.
The Coordinator and Replica sections contain high-level stats of the
state of the coordinator and replica respectively. These stats summarize
the usual suspects. Sometimes just looking at these is enough to
determine the cause of the OOM. If not, one has to look at the
last section: the dump of the state of the small pools and the page
spans. What we are looking for is a small pool or a span size that owns
an excessive amount of memory. Once found (there can be more than one), the
next task is to identify what the objects owning that memory are. Note
that in the case of smaller allocations, the memory is usually occupied
directly by some C++ object, while in the case of larger allocations, these are
usually (potentially fragmented) buffers, owned by some other object.
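When the report is long, ranking the span sizes by the memory they own can make the outliers easier to spot. The following is only a sketch: it assumes the `scylla memory` output was saved as text and that the "Page spans" table has the five whitespace-separated numeric columns shown in the example above. The function names `parse_page_spans` and `top_span_sizes` are made up for this illustration and are not part of `scylla-gdb.py`.

```python
import re

# Excerpt of the "Page spans" table from the example report above.
SAMPLE = """\
Page spans:
index size [B] free [B] large [B] [spans]
2 16384 0 0 0
3 32768 32768 2320334848 70811
4 65536 65536 3031105536 46251
5 131072 131072 1161822208 8864
9 2097152 0 2097152 1
14 67108864 0 67108864 1
Large allocations: 6582468608 [B]
"""

# A table row is exactly five unsigned integers (assumed layout).
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_page_spans(text):
    """Return the rows of the "Page spans" table as a list of dicts."""
    rows = []
    in_table = False
    for line in text.splitlines():
        if line.strip() == "Page spans:":
            in_table = True
            continue
        if not in_table:
            continue
        match = ROW.match(line)
        if match:
            index, size, free, large, spans = map(int, match.groups())
            rows.append({"index": index, "size": size, "free": free,
                         "large": large, "spans": spans})
        elif rows:
            # First non-row line after the data ends the table.
            break
    return rows


def top_span_sizes(rows, n=3):
    """Return the n span sizes owning the most memory in large allocations."""
    return sorted(rows, key=lambda row: row["large"], reverse=True)[:n]


if __name__ == "__main__":
    for row in top_span_sizes(parse_page_spans(SAMPLE)):
        print(f"span size {row['size']:>10} B: "
              f"{row['large']:>12} B in {row['spans']} spans")
```

In the example report this points at the 32 KiB, 64 KiB and 128 KiB spans, which together own almost all of the large allocations. The same approach could be applied to the "Small pools" table by adjusting the row pattern.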

If the `scylla memory` output alone is not enough to explain what
exactly is eating up all the memory, there are some further usual
suspects that should be examined.

##### Exploded task- and smp-queues and lots of objects

Look for exploded task- and smp-queues with:

    scylla task-queues # lists all task queues on the local shard
    scylla smp-queues # lists smp queues

Look for lots of objects with:

    scylla task_histogram -a # histogram of all objects with a vtable

A huge number in any of these reports can indicate problems of exploding
concurrency, or a shard not being able to keep up. This can easily lead to work
accumulating, in the form of tasks and associated objects, to the point of OOM.

##### Expensive reads

Another usual suspect is the number of sstables. This can be queried via
`scylla sstables`. A large number of sstables for a single table (in the
hundreds or more) can cause an otherwise non-problematic number of reads to use
an excessive amount of memory, potentially leading to OOM.

Reversed and unpaged reads (or both combined) can also consume a huge amount
of memory, to the point where a few such reads cause an OOM. The way to find
these is to inspect readers in memory, trying to locate their partition slice
and having a look at their respective options:
* `partition_slice::option::reversed` is set for a reversed query
* `partition_slice::option::allow_short_read` is cleared for an unpaged
  query

Note that Scylla has had protections against reversed queries since 4.0, and
against unpaged queries since 4.3.

##### Other reasons

If none of the usual suspects are present then all bets are off and one has to
try to identify what the objects of the exploded size class or span size belong
to. Unfortunately there are no proven methods here: some try to inspect
the memory patterns and make sense of them, some try to build an
object graph out of these objects and make sense of that. For the
latter, the following commands might be of help:

    scylla small-objects # lists objects from a small pool
    scylla generate-object-graph # generates and visualizes an object graph

Good luck, you are off the charted path here.

#### Use python for inspecting objects and performing computations
GDB is well integrated with Python, and besides using the predefined set of
commands from `scylla-gdb.py`, you can also run an interactive prompt by typing
`python-interactive` or `pi` in the gdb console. That allows you to benefit from
the rich set of helper classes and wrappers defined in `scylla-gdb.py`, e.g. in
order to iterate over one of the supported collection types (vectors, small
vectors, maps, tuples, etc.) and print only the interesting bits.

Example:
```
(gdb) source /path/to/scylla-gdb.py
(gdb) python-interactive
>>> db = gdb.parse_and_eval("*(seastar::sharded<replica::database>*)debug::the_database")
>>> instances = std_vector(db["_instances"])
>>> for i, instance in enumerate(instances):
...     print(i, instance["service"])
...
0 {
  _b = 0x60000403e000,
  _p = 0x60000403e010
}
1 {
  _b = 0x60100435e000,
  _p = 0x60100435e010
}
```

You can also easily evaluate single expressions by using the `python` (or `py`) command:
```
(gdb) py print("\n".join([f" I/O queue {key}: {value}" for key, value in std_unordered_map(next(reactors())["_io_queues"])]))
 I/O queue 0: std::unique_ptr<seastar::io_queue> = {
  get() = 0x60100012ae00
}
```

In order to define and use variables between gdb and the Python interface,
one can use the `gdb.convenience_variable()` and `gdb.set_convenience_variable()`
helpers. That can be clearer than relying on `gdb.parse_and_eval()` for
accessing objects.

Example:
```
(gdb) p debug::the_database
$1 = (seastar::sharded<replica::database> *) 0x7fffffffbdc0
(gdb) set $db=debug::the_database
(gdb) pi
>>> local_db = sharded(gdb.convenience_variable('db')).local()
>>> gdb.set_convenience_variable('local_db', local_db)
>>>
(gdb) p $local_db
$2 = (replica::database *) 0x60000575e010
```

#### Getting Linux process info

Some information is often placed in the notes of the ELF file. Those can be read
with the help of `eu-readelf --notes $core`. The information includes:

Process IDs
```
pid: 42359, ppid: 28633, pgrp: 42359, sid: 42359
uid: 1000, gid: 1000, pid: 42359, ppid: 28633, pgrp: 42359, sid: 42359
```

CLI parameters including name and arguments
```
fname: scylla
psargs: /home/xemul/src/scylla/build/dev/scylla --smp 2 -m 1G --collectd 0 --overprovis
```

Signal information
```
info.si_signo: 6, info.si_code: 0, info.si_errno: 0, cursig: 6
sigpend: <>
sighold: ~<1-4,6,8-9,11,14-15,18-22,32-33,35>
```

Runtime process information like times and states
```
utime: 1.207371, stime: 1.736005, cutime: 0.000000, cstime: 0.000000
state: 0, sname: R, zomb: 0, nice: 0, flag: 0x0000000000400600
```

File mappings
```
238 files:
7ff5815e2000-7ff5815f2000 00000000 65536 /[aio] (deleted)
7ff5815f2000-7ff581603000 00000000 69632 /[aio] (deleted)
7ff581603000-7ff581613000 00000000 65536 /[aio] (deleted)
7ff581613000-7ff58163b000 00000000 163840 /usr/lib64/libc.so.6
7ff58163b000-7ff5817a4000 00028000 1478656 /usr/lib64/libc.so.6
7ff5817a4000-7ff5817f2000 00191000 319488 /usr/lib64/libc.so.6
7ff5817f2000-7ff5817f6000 001de000 16384 /usr/lib64/libc.so.6
7ff5817f6000-7ff5817f8000 001e2000 8192 /usr/lib64/libc.so.6
7ff581800000-7ff58189d000 00000000 643072 /usr/lib64/libstdc++.so.6.0.33
7ff58189d000-7ff5819d5000 0009d000 1277952 /usr/lib64/libstdc++.so.6.0.33
7ff5819d5000-7ff581a51000 001d5000 507904 /usr/lib64/libstdc++.so.6.0.33
7ff581a51000-7ff581a5e000 00251000 53248 /usr/lib64/libstdc++.so.6.0.33
7ff581a5e000-7ff581a5f000 0025e000 4096 /usr/lib64/libstdc++.so.6.0.33
```
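If these fields need to be collected from many coredumps, the `key: value` lines can be turned into dictionaries with a few lines of Python. This is a rough sketch, not an established tool: it assumes `eu-readelf` is installed and that the note lines look like the excerpts above. The helper names `read_core_notes` and `parse_note_fields` are invented for this example.

```python
import subprocess


def read_core_notes(core_path):
    """Run `eu-readelf --notes` on a coredump and return its text output.

    Assumes eu-readelf (from elfutils) is available on PATH.
    """
    result = subprocess.run(
        ["eu-readelf", "--notes", core_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_note_fields(line):
    """Parse one 'key: value, key: value' note line into a dict.

    Values made up only of digits become ints, everything else stays a
    string. Fields are split on ", " (comma followed by a space), so
    values such as the sighold mask, which contain commas without
    spaces, are kept intact.
    """
    fields = {}
    for part in line.strip().split(", "):
        key, sep, value = part.partition(": ")
        if not sep:
            continue  # not a key: value pair, skip it
        value = value.strip()
        fields[key.strip()] = int(value) if value.isdigit() else value
    return fields


if __name__ == "__main__":
    # Lines copied from the excerpts above.
    print(parse_note_fields("pid: 42359, ppid: 28633, pgrp: 42359, sid: 42359"))
    print(parse_note_fields(
        "info.si_signo: 6, info.si_code: 0, info.si_errno: 0, cursig: 6"))
```

On a real coredump, one would feed each line of `read_core_notes(path).splitlines()` to `parse_note_fields()` and keep the dictionaries that contain the keys of interest, for example `cursig` to find the signal that killed the process.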
|