Linux: How to remove a word prefix using grep?

Question

Asked by Jury A

How can I remove the beginning of a word using grep? For example, I have a file that contains:

www.abc.com

I only need the part

abc.com

Sorry for the basic question, but I have no experience with Linux.

Answer 1

Accepted answer by sastanin

You don't edit strings with grep in the Unix shell; grep is usually used to find lines in a text or to filter lines out of it. Use sed instead:

$ echo www.example.com | sed 's/^[^\.]\+\.//'
example.com
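Note that \+ in the command above is a GNU extension to basic regular expressions. A minimal sketch of a more portable form follows; the function name strip_first_label is made up for illustration. It uses * instead of \+, so unlike the original it would also strip a lone leading dot.

```shell
# Hypothetical helper: remove everything up to and including the first dot.
strip_first_label() {
    # ^[^.]*  = a run of non-dot characters at the start of the line
    # \.      = the literal dot that follows that run
    printf '%s\n' "$1" | sed 's/^[^.]*\.//'
}

strip_first_label www.example.com   # prints example.com
```

A line without any dot is left unchanged, because the pattern does not match it.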

You'll need to learn regular expressions to use it effectively.

sed can also edit a file in place (modify the file itself) if you pass the -i argument. Be careful, though: you can easily lose data if you write the wrong sed command and use the -i flag.
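One way to limit that risk is to attach a backup suffix to -i, so that sed keeps a copy of the original file. A minimal sketch, assuming a sed that accepts a suffix directly after -i (GNU and BSD sed both do); the file name hosts.txt and the scratch directory are made up for illustration:

```shell
# Work in a scratch directory so no real file is overwritten.
workdir=$(mktemp -d)
printf 'www.example.com\n' > "$workdir/hosts.txt"

# -i.bak edits hosts.txt in place and saves the original as hosts.txt.bak.
sed -i.bak 's/^[^.]*\.//' "$workdir/hosts.txt"

cat "$workdir/hosts.txt"       # the edited file
cat "$workdir/hosts.txt.bak"   # the untouched original
```

If the command turns out to be wrong, the .bak file can simply be copied back.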

An example

From your comments I guess you have a TeX document and you want to remove the first part of all .com domain names. If this is your document test.tex:

\documentclass{article}
\begin{document}
www.example.com
example.com www.another.domain.com
\end{document}

then you can transform it with this sed command (redirect the output to a file, or edit in place with -i):

$ sed 's/\([a-z0-9-]\+\.\)\(\([a-z0-9-]\+\.\)\+com\)/\2/gi' test.tex
\documentclass{article}
\begin{document}
example.com
example.com another.domain.com
\end{document}

Please note that:

A sequence of allowed symbols followed by a dot is matched by [a-z0-9-]\+\.
I used groups in the regular expression (the parts within \( and \)) to mark the first and the second part of the URL, and I replace the entire match with its second group (\2 in the substitution pattern)
The domain must be at least a 3rd-level .com domain (every \+ repetition means at least one match)
The search is case insensitive (i flag at the end)
It can make more than one replacement per line (g flag at the end)
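The points above can be checked with an extended-regex form of the same substitution, where groups and + need no backslashes. This is only a sketch, assuming GNU sed (for -E together with the i flag); the function name strip_subdomain is made up for illustration:

```shell
# Hypothetical helper: drop the first label of every .com name that has
# at least three levels, reading from standard input.
strip_subdomain() {
    # group 1: first label plus its dot, e.g. "www."
    # group 2: one or more further labels, then "com"; this part is kept
    sed -E 's/([a-z0-9-]+\.)(([a-z0-9-]+\.)+com)/\2/gi'
}

printf 'example.com www.another.domain.com\n' | strip_subdomain
```

The 2nd-level name example.com is left alone, while www.another.domain.com loses only its first label.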

Answer 2

Answered by Daniel DiPaolo

grep is not used to manipulate or change text, only to search for text or patterns within text.

You should look into something like sed, awk or cut if you want a command-line tool to do it. Or write a script in Python, Perl, Ruby or whatever you prefer.
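For instance, an awk version might look like the sketch below. The function name drop_first_label is made up for illustration, and it assumes each line holds a single host name:

```shell
# Hypothetical helper: drop the first dot-separated label from each input line.
drop_first_label() {
    # sub() edits the current line ($0) in place; print then outputs it.
    awk '{ sub(/^[^.]*\./, ""); print }'
}

echo www.abc.com | drop_first_label   # prints abc.com
```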

Answer 3

Answered by Thor

As the others have noted, grep is not well suited for this task and sed is a good option. If the text is well ordered, a simple cut might be easier to type:

echo www.abc.com | cut -d. -f2-

-d. tells cut to use . as the delimiter.
-f2- tells cut to return every field from field 2 onward.
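One thing to watch with cut: lines that contain no delimiter are printed unchanged unless -s is given. A small sketch with made-up sample input:

```shell
# Two sample lines: one with dots, one without.
input='www.abc.com
localhost'

# Default: the line without a dot passes through untouched.
with_all=$(printf '%s\n' "$input" | cut -d. -f2-)

# -s: lines without the delimiter are suppressed.
only_delimited=$(printf '%s\n' "$input" | cut -s -d. -f2-)

printf '%s\n' "$with_all"
printf '%s\n' "$only_delimited"
```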

Answer 4

Answered by Igor Chubin

You can do this easily using grep:

$ echo www.google.com | grep -o '[^.]*\.com'
google.com

Instead of echo, give grep your file:

$ grep -o '[^.]*\.com$' < file

Here I used the regular expression '[^.]*\.com'. It means: find a word with no . in it ([^.]*), followed by .com (\.com in the regex). The -o option tells grep to print only the part that matched.
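Two consequences of this approach are worth seeing on sample data: lines that do not match are dropped entirely, and for deeper names only the last label before .com survives. A small sketch with made-up input:

```shell
# Made-up sample: a 3-level name, a 4-level name, and a non-.com name.
sample='www.google.com
www.another.domain.com
example.org'

# -o prints only the matched text; $ anchors the match to the end of the line.
matches=$(printf '%s\n' "$sample" | grep -o '[^.]*\.com$')

printf '%s\n' "$matches"
```

Here www.another.domain.com yields domain.com rather than another.domain.com, and example.org produces no output at all.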

Answer 5

Answered by Neoh

Although sed, awk, cut and even grep can solve the problem, I think grep is not a good choice.

grep is a command-line utility for searching plain-text data for lines matching a regular expression.
Utilities like sed and awk, on the other hand, exist for processing strings line by line.

Answer 6

Answered by Fahd Ahmed

You can actually do this without invoking other programs, by using a built-in parameter expansion in bash:

while IFS= read -r line; do echo "${line#*.}"; done < file

Here #*. tells the shell to remove the shortest prefix that consists of zero or more characters followed by a dot (.).
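The difference between a single # and a double ## can be seen in a short sketch; the variable names are made up for illustration:

```shell
host=www.abc.com

# A single # removes the shortest matching prefix, here "www."
shortest=${host#*.}

# A double ## removes the longest matching prefix, here "www.abc."
longest=${host##*.}

printf '%s\n' "$shortest" "$longest"
```

For this question the single # form is the one that leaves abc.com.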

You can view a cheat sheet with the different parameter expansions for bash here:

https://devhints.io/bash

Answer 7

Answered by Matthias Braun

You can do this with a positive lookbehind and grep's --only-matching flag:

echo "www.abc.com" | grep --perl-regexp --only-matching '(?<=www\.).*'

which can be shortened to

echo "www.abc.com" | grep -Po '(?<=www\.).*'

Both produce

abc.com

with grep (GNU grep) 3.3.
