"grep" offset of an ASCII string in a binary file on Linux
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/14141008/
"grep" offset of ascii string from binary file
Asked by mgilson
I'm generating binary data files that are simply a series of records concatenated together. Each record consists of a (binary) header followed by binary data. Within the binary header is an ascii string 80 characters long. Somewhere along the way, my process of writing the files got a little messed up and I'm trying to debug this problem by inspecting how long each record actually is.
This seems extremely related, but I don't understand Perl, so I haven't been able to get the accepted answer there to work. The other answer points to bgrep, which I've compiled, but it wants me to feed it a hex string. I'd rather have a tool where I can give it the ASCII string and it will find it in the binary data, then print the string and the byte offset where it was found.
In other words, I'm looking for some tool which acts like this:
tool foobar filename
or
tool foobar < filename
and its output is something like this:
foobar:10
foobar:410
foobar:810
foobar:1210
...
i.e. the string which matched and the byte offset in the file where the match started. In this example, I can infer that each record is 400 bytes long.
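As a rough sketch of that inference (the helper name `record_lengths` is made up for illustration), the record length is just the gap between consecutive match offsets. A gap that differs from the others would point at the record that was written incorrectly:

```python
def record_lengths(offsets):
    """Return the gaps between consecutive match offsets."""
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


# Offsets taken from the example output above.
offsets = [10, 410, 810, 1210]
print(record_lengths(offsets))
```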
Other constraints:
- The ability to search by regex would be nice, but I don't need it for this problem.
- My binary files are big (3.5 GB), so I'd like to avoid reading the whole file into memory if possible.
Accepted answer by Thor
You could use strings for this:
strings -a -t x filename | grep foobar
Tested with GNU binutils.
For example, to find where --help occurs in /bin/ls:
strings -a -t x /bin/ls | grep -- --help
Output:
14938 Try `%s --help' for more information.
162f0 --help display this help and exit
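Note that with -t x the offsets are hexadecimal, and each one marks where the whole printable string starts rather than where the match itself begins. A minimal sketch for turning such lines into decimal offsets of the match is shown below. The function name `match_offsets` is hypothetical, and the parsing assumes each line is a right-aligned hex offset, one space, and then the string:

```python
def match_offsets(strings_output, needle):
    """Yield absolute byte offsets of needle, given `strings -a -t x` output."""
    for line in strings_output.splitlines():
        # Assumed line layout: right-aligned hex offset, one space, then the string.
        offset_hex, sep, text = line.lstrip().partition(" ")
        if not sep:
            continue
        try:
            start = int(offset_hex, 16)
        except ValueError:
            continue  # not an offset-prefixed line
        # The match may sit anywhere inside the printable string.
        pos = text.find(needle)
        while pos != -1:
            yield start + pos
            pos = text.find(needle, pos + 1)


sample = "  14938 Try `%s --help' for more information.\n"
for offset in match_offsets(sample, "--help"):
    print(f"--help:{offset}")
```

The output of the real command could be fed in by piping it to a script that reads sys.stdin and passes the text to this function.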
Answer by Hari Menon
grep --byte-offset --only-matching --text foobar filename
The --byte-offset option prints the byte offset of each matching line.
The --only-matching option makes it print the offset of each match itself instead of the offset of the line containing it.
The --text option makes grep treat the binary file as a text file.
You can shorten it to:
grep -oba foobar filename
It works in the GNU version of grep, which comes with Linux by default. It won't work in BSD grep (which comes with Mac by default).
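Where GNU grep is not available, roughly the same result can be sketched in Python with the standard mmap module. This is only an illustration, not a replacement for grep: the function name `find_offsets` is made up, it handles fixed strings only (no regex), and it assumes a 64-bit system so that a multi-gigabyte file can be mapped. Mapping the file lets the operating system page data in on demand instead of the script reading the whole file into memory.

```python
import mmap
import sys


def find_offsets(path, needle):
    """Yield the byte offset of every occurrence of needle (bytes) in the file."""
    if not needle:
        raise ValueError("needle must not be empty")
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file raises ValueError here.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                yield pos
                # Step by one byte so overlapping occurrences are reported too.
                pos = mm.find(needle, pos + 1)


if __name__ == "__main__" and len(sys.argv) == 3:
    pattern = sys.argv[1].encode("ascii")
    for offset in find_offsets(sys.argv[2], pattern):
        print(f"{sys.argv[1]}:{offset}")
```

Saved under a name such as find_offsets.py (also hypothetical), it would be run as `python find_offsets.py foobar filename` and print one `foobar:offset` line per match, in the format asked for in the question.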
Answer by caesun
I wanted to do the same task. Though strings | grep worked, I found gsar was the very tool I needed.
The output looks like:
>gsar.exe -bic -sfoobar filename.bin
filename.bin: 0x34b5: AAA foobar BBB
filename.bin: 0x56a0: foobar DDD
filename.bin: 2 matches found