Disclaimer: this page is a side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13468286/

Date: 2020-08-06 17:50:13  Source: igfitidea

How to read, understand, analyze, and debug a Linux kernel panic?

Tags: c, linux, debugging, linux-kernel, panic

Asked by 0x90

Consider the following Linux kernel stack trace. (You can trigger a panic from the kernel source code by calling panic("debugging a linux kernel panic");.)

[<001360ac>] (unwind_backtrace+0x0/0xf8) from [<00147b7c>] (warn_slowpath_common+0x50/0x60)
[<00147b7c>] (warn_slowpath_common+0x50/0x60) from [<00147c40>] (warn_slowpath_null+0x1c/0x24)
[<00147c40>] (warn_slowpath_null+0x1c/0x24) from [<0014de44>] (local_bh_enable_ip+0xa0/0xac)
[<0014de44>] (local_bh_enable_ip+0xa0/0xac) from [<0019594c>] (bdi_register+0xec/0x150)
  • In unwind_backtrace+0x0/0xf8, what does the +0x0/0xf8 stand for?
  • How can I see the C code of unwind_backtrace+0x0/0xf8?
  • How do I interpret the panic's content?

Accepted answer by iabdalkader

It's just an ordinary backtrace. The functions are listed in reverse call order: each function was called by the one listed below it, so the most recent call is at the top:

unwind_backtrace+0x0/0xf8
warn_slowpath_common+0x50/0x60
warn_slowpath_null+0x1c/0x24
local_bh_enable_ip+0xa0/0xac
bdi_register+0xec/0x150
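
To make the ordering concrete, here is a rough sketch that pulls the frames out of a trace in the ARM-style format quoted in the question and prints the call chain from the outermost caller to the innermost function. The regular expression and the helper name parse_frames are my own assumptions for illustration, not part of any kernel tool, and other architectures print traces in different formats.

```python
import re

# One frame looks like "[<001360ac>] (unwind_backtrace+0x0/0xf8)".
# This pattern is an assumption based on the trace in the question.
FRAME_RE = re.compile(
    r"\[<([0-9a-fA-F]+)>\]\s+\(([\w.]+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)\)"
)

TRACE = """\
[<001360ac>] (unwind_backtrace+0x0/0xf8) from [<00147b7c>] (warn_slowpath_common+0x50/0x60)
[<00147b7c>] (warn_slowpath_common+0x50/0x60) from [<00147c40>] (warn_slowpath_null+0x1c/0x24)
[<00147c40>] (warn_slowpath_null+0x1c/0x24) from [<0014de44>] (local_bh_enable_ip+0xa0/0xac)
[<0014de44>] (local_bh_enable_ip+0xa0/0xac) from [<0019594c>] (bdi_register+0xec/0x150)
"""


def parse_frames(trace):
    """Return frames in the order printed (innermost first)."""
    frames = []
    for addr, symbol, offset, length in FRAME_RE.findall(trace):
        frame = {
            "address": int(addr, 16),
            "symbol": symbol,
            "offset": int(offset, 16),
            "length": int(length, 16),
        }
        # Each "from" frame is repeated at the start of the next line,
        # so skip a frame that is identical to the one just recorded.
        if frames and frames[-1] == frame:
            continue
        frames.append(frame)
    return frames


frames = parse_frames(TRACE)
# Reverse the printed order to get caller -> callee.
call_chain = [frame["symbol"] for frame in reversed(frames)]
print(" -> ".join(call_chain))
```

Reading the chain left to right gives the order in which the functions were actually entered, which is usually the easier direction to follow in the source.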

bdi_register+0xec/0x150 has the form symbol+offset/length. There is more information about this format in Understanding a Kernel Oops, along with how you can debug a kernel oops. There is also this excellent tutorial on Debugging the Kernel.

Note: as suggested below by Eugene, you may want to try addr2line first. It still needs an image with debugging symbols, though. For example:

addr2line -e vmlinux_with_debug_info 0019594c(+offset)
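
If you have many addresses to look up, the call can be scripted. The following is only a sketch: the helper names are made up, the vmlinux path is a placeholder, and for a cross-compiled kernel you would likely need the toolchain's own binary (something like a prefixed addr2line, which is an assumption about your setup). The -f flag asks addr2line to print the function name as well as the file and line.

```python
import os
import shutil
import subprocess


def addr2line_command(vmlinux, address, tool="addr2line"):
    """Build the argv list for one lookup; -f also prints the function name."""
    return [tool, "-f", "-e", vmlinux, f"{address:#x}"]


def resolve(vmlinux, address, tool="addr2line"):
    """Run addr2line if the tool and the image exist; return its output or None."""
    if shutil.which(tool) is None or not os.path.exists(vmlinux):
        return None
    result = subprocess.run(
        addr2line_command(vmlinux, address, tool),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() or None


# Placeholder image name taken from the example above.
print(" ".join(addr2line_command("vmlinux_with_debug_info", 0x0019594C)))
print(resolve("vmlinux_with_debug_info", 0x0019594C))
```

The second print shows None unless a matching image with debug info is actually present in the current directory.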


Answer by 0x90

Here are two alternatives to addr2line. Assuming you have the proper toolchain for the target, you can do one of the following:

Use objdump:


  1. Locate your vmlinux or the .ko file under the kernel root directory, then disassemble the object file:

    objdump -dS vmlinux > /tmp/kernel.s
    
  2. Open the generated assembly file, /tmp/kernel.s, with a text editor such as vim. Go to unwind_backtrace+0x0/0xf8, i.e. search for the address of unwind_backtrace plus the offset. You have now located the problematic part of your source code.

Use gdb:


IMO, an even more elegant option is to use the one and only gdb. Assuming you have a suitable toolchain on your host machine:

  1. Run gdb <path-to-vmlinux>.
  2. Execute in gdb's prompt: list *(unwind_backtrace+0x10).
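
The gdb expression only needs the symbol and the offset, not the length after the slash. As a small sketch (the helper names are hypothetical), this turns a frame as printed in the trace into the matching list command and into a non-interactive gdb invocation using gdb's -batch and -ex options:

```python
import re


def gdb_list_command(frame):
    """Turn 'symbol+0xoff/0xlen' into the gdb command 'list *(symbol+0xoff)'."""
    match = re.fullmatch(r"([\w.]+)\+(0x[0-9a-fA-F]+)/0x[0-9a-fA-F]+", frame)
    if match is None:
        raise ValueError(f"not in symbol+offset/length form: {frame!r}")
    symbol, offset = match.groups()
    return f"list *({symbol}+{offset})"


def gdb_batch_argv(vmlinux, frame, gdb="gdb"):
    """argv for running the list command once and exiting."""
    return [gdb, "-batch", "-ex", gdb_list_command(frame), vmlinux]


print(gdb_list_command("bdi_register+0xec/0x150"))
print(gdb_batch_argv("vmlinux", "bdi_register+0xec/0x150"))
```

As with the interactive session, this is only useful if vmlinux was built with debug info.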

For additional information, you may want to check out the following:

  1. Kernel Debugging Tricks.
  2. Debugging The Linux Kernel Using Gdb

Answer by mgalgs

In unwind_backtrace+0x0/0xf8, what does the +0x0/0xf8 stand for?

The first number (+0x0) is the offset from the beginning of the function (unwind_backtrace in this case). The second number (0xf8) is the total length of the function. Given these two pieces of information, if you already have a hunch about where the fault occurred, this might be enough to confirm your suspicion (you can tell roughly how far along in the function you were).
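
As a worked example of that arithmetic, take the bdi_register+0xec/0x150 frame from the question, whose bracketed address is 0019594c. Assuming the bracketed address is the one that symbol+offset refers to, subtracting the offset gives the function's start address, and offset divided by length gives a rough position inside the function. The helper name describe_frame is made up for this sketch.

```python
def describe_frame(address, offset, length):
    """Derive the function's address range and how far into it the address is."""
    start = address - offset
    return {
        "start": start,
        "end": start + length,
        "progress": offset / length,
    }


# Values taken from "[<0019594c>] (bdi_register+0xec/0x150)".
info = describe_frame(0x0019594C, 0xEC, 0x150)
print(f"bdi_register spans {info['start']:#010x} to {info['end']:#010x}")
print(f"the address is about {info['progress']:.0%} of the way through")
```

The start address is also what you would search for in a disassembly listing to find the beginning of the function.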

To get the exact source line of the corresponding instruction (generally better than hunches), use addr2line or the other methods in the other answers.