How to remove a word prefix using grep? (Linux)

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/11673287/

Time: 2020-08-06 13:55:21  Source: igfitidea

Tags: regex, linux, shell, sed

Asked by Jury A

How can I remove the beginning of a word using grep? For example, I have a file that contains:

www.abc.com

I only need this part:

abc.com

Sorry for the basic question, but I have no experience with Linux.

Accepted answer by sastanin

You don't edit strings with grep in the Unix shell; grep is usually used to find or filter out lines of text. Use sed instead:

$ echo www.example.com | sed 's/^[^\.]\+\.//'
example.com

You'll need to learn regular expressions to use it effectively.

sed can also edit a file in place (modify the file itself) if you pass the -i argument. Be careful: you can easily lose data if you write the wrong sed command and use the -i flag.
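
One way to reduce the risk mentioned above is to give -i a backup suffix, so sed keeps a copy of the original file. The following is a minimal sketch, not part of the original answer: the helper name strip_prefix_in_place and the sample file hosts.txt are hypothetical, and it uses -E (extended regular expressions) so that + needs no backslash.

```shell
# Hypothetical helper: strip the first dot-separated label from every line
# of the file "$1", keeping the original content as "$1.bak".
strip_prefix_in_place() {
  sed -E -i.bak 's/^[^.]+\.//' "$1"
}

# Demo on a throwaway file (hosts.txt is an assumed sample name).
workdir=$(mktemp -d)
printf 'www.example.com\nwww.abc.com\n' > "$workdir/hosts.txt"
strip_prefix_in_place "$workdir/hosts.txt"

cat "$workdir/hosts.txt"      # edited content
cat "$workdir/hosts.txt.bak"  # untouched backup
```

If the result is wrong, the .bak file can simply be copied back over the edited file.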

An example

From your comments, I guess you have a TeX document and you want to remove the first part of all .com domain names. Suppose this is your document test.tex:

\documentclass{article}
\begin{document}
www.example.com
example.com www.another.domain.com
\end{document}

then you can transform it with this sed command (redirect the output to a file or edit in place with -i):

$ sed 's/\([a-z0-9-]\+\.\)\(\([a-z0-9-]\+\.\)\+com\)/\2/gi' test.tex
\documentclass{article}
\begin{document}
example.com
example.com another.domain.com
\end{document}

Please note that:

  • A common sequence of allowed symbols followed by a dot is matched by [a-z0-9-]\+\.
  • I used groups in the regular expression (the parts within \( and \)) to indicate the first and the second part of the URL, and I replace the entire match with its second group (\2 in the substitution pattern)
  • The domain must be at least a 3rd-level .com domain (every \+ repetition means at least one match)
  • The search is case insensitive (the i flag at the end)
  • It can make more than one match per line (the g flag at the end)
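
The same substitution can be sketched with extended regular expressions (-E), where groups and + need no backslashes. This is an illustrative variant, not the original answer's command; the function name strip_subdomain is hypothetical, and the i flag on the s command assumes GNU sed.

```shell
# Hypothetical helper: drop the first label of every .com name that has at
# least three levels. Group 1 is the first label plus its dot; group 2 is
# the rest of the name, which replaces the whole match (\2).
strip_subdomain() {
  sed -E 's/([a-z0-9-]+\.)(([a-z0-9-]+\.)+com)/\2/gi'
}

printf 'www.example.com\nexample.com www.another.domain.com\n' | strip_subdomain
# prints:
# example.com
# example.com another.domain.com
```

The bare example.com on the second line is left alone because it has only two levels, so the pattern does not match it.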

Answer by Daniel DiPaolo

grep is not used to manipulate or change text, only to search for text or patterns within text.

You should look into something like sed, awk or cut if you want a command-line tool to do it. Or write a script in Python/Perl/Ruby/whatever.
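
As a rough illustration of the awk option mentioned above (this example is not from the original answer, and the function name awk_strip_prefix is hypothetical), awk's sub() function can delete everything up to and including the first dot on each line:

```shell
# Hypothetical helper: remove the first dot-separated label from each line.
# sub() replaces the first match of the regex in the current line ($0).
awk_strip_prefix() {
  awk '{ sub(/^[^.]+\./, ""); print }'
}

echo www.abc.com | awk_strip_prefix
# prints: abc.com
```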

Answer by Thor

As the others have noted, grep is not well suited for this task. sed is a good option, or, if the text is well ordered, a simple cut might be easier to type:

echo www.abc.com | cut -d. -f2-
  • -d. tells cut to use . as the delimiter.
  • -f2- tells cut to return every field from field 2 through the end of the line.
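
One detail worth knowing when running this over a whole file: cut passes lines that contain no delimiter through unchanged, unless -s is given. A small sketch (the function name drop_first_field and the sample input are made up for illustration):

```shell
# Hypothetical helper wrapping the cut command from the answer.
drop_first_field() {
  cut -d. -f2-
}

# "localhost" has no dot, so cut prints it as-is.
printf 'www.abc.com\nlocalhost\n' | drop_first_field
# prints:
# abc.com
# localhost

# With -s, lines without the delimiter are suppressed instead.
printf 'www.abc.com\nlocalhost\n' | cut -s -d. -f2-
# prints:
# abc.com
```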

Answer by Igor Chubin

You can do this easily using grep:

$ echo www.google.com | grep -o '[^.]*\.com'
google.com

Instead of echo, you must supply your file:

$ grep -o '[^.]*\.com$' < file

Here I used the regular expression '[^.]*\.com'. It means: find a word without . in it ([^.]*), followed by .com (\.com in the regex). The -o option tells grep to show only the part that matched.
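
To see what this pattern keeps, it helps to try a name with more than three levels. This sketch is an added illustration (the function name last_label_and_com is hypothetical): because [^.]* cannot cross a dot, only the label directly before .com survives.

```shell
# Hypothetical helper wrapping the grep command from the answer.
last_label_and_com() {
  grep -o '[^.]*\.com'
}

echo www.google.com | last_label_and_com
# prints: google.com

echo www.another.domain.com | last_label_and_com
# prints: domain.com
```

So this removes every leading label, not just the first one, which differs from the sed and cut answers on deeper domain names.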

Answer by Neoh

Although sed, awk, cut and even grep can solve the problem, I think grep is not a good choice.

  • grep is a command-line utility for searching plain-text data sets for lines matching a regular expression.
  • Utilities like sed and awk, on the other hand, exist for processing strings line by line.

Answer by Fahd Ahmed

You can actually do this without invoking other programs, by using a builtin parameter expansion in bash:

while IFS= read -r line; do echo "${line#*.}"; done < file

Here, #*. tells the shell to remove the shortest prefix that consists of 0 or more characters followed by a dot (.).
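
The difference between the single # (shortest matching prefix) and the double ## (longest matching prefix) can be sketched as follows. The function names and the sample value are made up for illustration:

```shell
# Hypothetical helpers around bash parameter expansion.
strip_shortest_prefix() {
  local value=$1
  printf '%s\n' "${value#*.}"   # removes up to and including the FIRST dot
}

strip_longest_prefix() {
  local value=$1
  printf '%s\n' "${value##*.}"  # removes up to and including the LAST dot
}

strip_shortest_prefix www.abc.com   # prints: abc.com
strip_longest_prefix www.abc.com    # prints: com
```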

You can view a cheatsheet with the different parameter expansions for bash here:

https://devhints.io/bash

Answer by Matthias Braun

You can do this with a positive lookbehind and grep's --only-matching flag:

echo "www.abc.com" | grep --perl-regexp --only-matching '(?<=www\.).*'

which can be reduced to

echo "www.abc.com" | grep -Po '(?<=www\.).*'

Both produce

abc.com

with grep (GNU grep) 3.3.
