How to extract substrings and numbers using only grep/sed
Source: http://stackoverflow.com/questions/15371450/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Hooloovoo
I have a text file containing both text and numbers, and I want to use grep to extract only the numbers I need. For example, given a file as follows:
miss rate 0.21
ipc 222
stalls n shdmem 112
Say I only want to extract the data for miss rate, which is 0.21. How do I do it with grep or sed? Also, I need more than one number, not only the one after miss rate. That is, I may want to get both 0.21 and 112. A sample output might look like this:
0.21 222 112
I need the data for plotting later.
Accepted answer by that other guy
Use awk instead:
awk '/^miss rate/ { print $3 }' yourfile
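Since the question asks for more than one value, the same awk approach can be extended with an alternation. This is a minimal sketch, assuming the labels of interest are miss rate and stalls n shdmem and that the value is always the last field on its line; the temporary file held in the variable sample is a hypothetical stand-in for yourfile.

```shell
# Hypothetical sample file reproducing the data from the question.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21' 'ipc 222' 'stalls n shdmem 112' > "$sample"

# For every line starting with one of the chosen labels,
# print the last whitespace-separated field ($NF).
values=$(awk '/^(miss rate|stalls n shdmem) / { print $NF }' "$sample")
echo "$values"

rm -f "$sample"
```

Using $NF rather than a fixed field number lets labels with different word counts share one rule.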
To do it with just grep, you need non-standard extensions, such as GNU grep's PCRE mode (-P) with a positive lookbehind (?<=..) and match-only output (-o):
grep -Po '(?<=miss rate ).*' yourfile
Answer by kamituel
You can use:
grep -P "miss rate \d+(\.\d+)?" file.txt
or:
grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt
Both of those commands will print miss rate 0.21. If you want to extract only the number, why not use Perl, sed or awk?
If you really want to avoid those, maybe this will work?
grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt | xargs -n 1 basename | tail -n 1
Answer by DanneJ
If you really want to use only grep for this, then you can try:
grep "miss rate" file | grep -oe '\([0-9.]*\)'
It first finds the matching line, and then outputs only the number.
Sed might be a bit more readable, though:
sed -n 's#miss rate ##p' file
Answer by Gilles Quenot
Answer by mariux
The grep-and-cut solution would look like this:
To get the 3rd field of every matching line, use:
grep "^miss rate " yourfile | cut -d ' ' -f 3
Or, to get the 3rd field and everything after it, use:
grep "^miss rate " yourfile | cut -d ' ' -f 3-
Or, if you use bash and "miss rate" occurs only once in your file, you can also just do:
a=( $(grep -m 1 "miss rate" yourfile) )
echo ${a[2]}
where ${a[2]} is your result.
If "miss rate" occurs more than once, you can loop over the grep output (in bash), reading only what you need.
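The loop mentioned above could look roughly like the following. This is only a sketch: the sample file with two miss rate lines and the variable names first, second and value are hypothetical, and the loop is written so that it also runs in shells other than bash.

```shell
# Hypothetical sample file in which "miss rate" occurs more than once.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21' 'ipc 222' 'miss rate 0.35' > "$sample"

# read splits each matching line on whitespace: the two label words go
# into first and second, and the remainder of the line goes into value.
values=$(grep '^miss rate ' "$sample" | while read -r first second value; do
    printf '%s\n' "$value"
done)
echo "$values"

rm -f "$sample"
```

Capturing the loop output with $(...) sidesteps the fact that variables assigned inside a pipeline's while loop are lost when the loop runs in a subshell.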
Answer by Daniel Williams
I believe
sed 's|[^0-9]*\([0-9.]*\)|\1 |g' filename
will do the trick. However, every entry will be on its own line, if that is OK. I am sure there is a way for sed to produce a comma- or space-delimited list, but I am not a super master of all things sed.
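One way to get the space-delimited list on a single line, swapping sed for grep -o plus paste, is sketched below. It assumes the only digits in the file are the values themselves, as in the sample data from the question; the -o option is an extension available in GNU and BSD grep, and the temporary file held in the variable sample is a hypothetical stand-in for the real file.

```shell
# Hypothetical sample file reproducing the data from the question.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21' 'ipc 222' 'stalls n shdmem 112' > "$sample"

# grep -o prints each matched number on its own line;
# paste -s joins those lines into one, separated by a single space.
line=$(grep -oE '[0-9]+(\.[0-9]+)?' "$sample" | paste -sd ' ' -)
echo "$line"

rm -f "$sample"
```

Changing the delimiter passed to paste -d from a space to a comma produces a comma-delimited list instead.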