How to extract substrings and numbers using only grep/sed on Linux

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/15371450/

Time: 2020-08-06 22:26:46  Source: igfitidea

how to extract substring and numbers only using grep/sed

regex linux bash sed

Asked by Hooloovoo

I have a text file containing both text and numbers, and I want to use grep to extract only the numbers I need. For example, given a file as follows:

miss rate 0.21  
ipc 222  
stalls n shdmem 112

So say I only want to extract the data for miss rate, which is 0.21. How do I do it with grep or sed? Plus, I need more than one number, not only the one after miss rate. That is, I may want to get both 0.21 and 112. A sample output might look like this:

0.21 222 112

Because I need the data for plotting later.

Accepted answer by that other guy

Use awk instead:

awk '/^miss rate/ { print $3 }' yourfile

To do it with just grep, you need non-standard extensions. For example, GNU grep supports PCRE (-P), which allows a positive lookbehind (?<=..), combined with only-matching output (-o):

grep -Po '(?<=miss rate ).*' yourfile
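
Since the question asks for more than one value, the awk approach can be widened to several labels. This is a minimal sketch, assuming the sample file from the question. The temporary file and the variable names `sample` and `picked` are illustrative and not part of the original answer:

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21  ' 'ipc 222  ' 'stalls n shdmem 112' > "$sample"

# $NF is the last whitespace-separated field, so one action works for
# labels of different lengths ("miss rate" vs. "stalls n shdmem").
picked=$(awk '/^miss rate/ || /^stalls n shdmem/ { print $NF }' "$sample")
printf '%s\n' "$picked"

rm -f "$sample"
```

Using `$NF` instead of a fixed field number avoids counting the words in each label, on the assumption that the number is always the last field on its line.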

Answer by kamituel

You can use:

grep -P "miss rate \d+(\.\d+)?" file.txt

or:

grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt

Both of those commands will print out miss rate 0.21. If you want to extract the number only, why not use Perl, Sed or Awk?

If you really want to avoid those, maybe this will work?

grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt | xargs -n 1 basename | tail -n 1

Answer by DanneJ

If you really want to use only grep for this, then you can try:

grep "miss rate" file | grep -oe '\([0-9.]*\)'

It will first find the line that matches, and then only output the digits.

Sed might be a bit more readable, though:

sed -n 's#miss rate ##p' file
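
Note that the substitution above removes only the label, so anything after the number (such as trailing spaces) stays in the output. A minimal sketch of a variant that captures just the number, assuming the sample file from the question; the temporary file and the variable names `sample` and `rate` are illustrative:

```shell
# Recreate the sample file from the question, including its trailing spaces.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21  ' 'ipc 222  ' 'stalls n shdmem 112' > "$sample"

# Capture the run of digits and dots after the label and discard
# whatever follows it on the line; -n plus the p flag prints matches only.
rate=$(sed -n 's/^miss rate \([0-9.][0-9.]*\).*/\1/p' "$sample")
printf '%s\n' "$rate"

rm -f "$sample"
```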

Answer by Gilles Quenot

Using the special look-around regex trick \K with grep's PCRE engine:

grep -oP 'miss rate \K.*' file.txt

or with perl:

perl -lne 'print $& if /miss rate \K.*/' file.txt

Answer by mariux

With grep and cut, to get the 3rd field of every matching line, use:

grep "^miss rate " yourfile | cut -d ' ' -f 3

Or, to get the 3rd field and everything after it, use:

grep "^miss rate " yourfile | cut -d ' ' -f 3-

Or, if you use bash and "miss rate" occurs only once in your file, you can also just do:

a=( $(grep -m 1 "miss rate" yourfile) )
echo ${a[2]}

where ${a[2]} is your result.

If "miss rate" occurs more than once, you can loop over the grep output, reading only what you need (in bash).
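
Such a loop might look like the following minimal sketch. It assumes a hypothetical file in which "miss rate" appears twice. The file contents and the variable names `label1`, `label2`, `value`, `rest`, and `rates` are illustrative and not from the original answer:

```shell
# Hypothetical input with two "miss rate" lines.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21' 'ipc 222' 'miss rate 0.35' > "$sample"

# read splits each matching line into words: the first two hold the
# label, the third holds the number, and anything left goes to "rest".
rates=$(grep '^miss rate ' "$sample" |
    while read -r label1 label2 value rest; do
        printf '%s\n' "$value"
    done)
printf '%s\n' "$rates"

rm -f "$sample"
```

Capturing the loop's output with `$(...)` sidesteps the fact that variables set inside a piped `while` loop are not visible after the loop ends.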

Answer by Daniel Williams

I believe

sed 's|[^0-9]*\([0-9.]*\)|\1 |g' filename

will do the trick. However, every entry will be on its own line, if that is OK. I am sure there is a way for sed to produce a comma- or space-delimited list, but I am not a super master of all things sed.
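
One way to get a space-delimited list is sketched below. It uses grep with paste rather than sed alone, and assumes the sample file from the question; the temporary file and the variable names `sample` and `numbers` are illustrative:

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21  ' 'ipc 222  ' 'stalls n shdmem 112' > "$sample"

# grep -o prints each number on its own line; paste -s then joins
# those lines into a single line, with a space as the delimiter.
numbers=$(grep -oE '[0-9]+(\.[0-9]+)?' "$sample" | paste -sd ' ' -)
printf '%s\n' "$numbers"

rm -f "$sample"
```

Changing the delimiter passed to `paste -d` (for example to `,`) would give a comma-separated list instead.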