How to remove a word prefix using grep?
Original source: http://stackoverflow.com/questions/11673287/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Jury A
How can I remove the beginning of a word using grep? For example, I have a file that contains:
www.abc.com
I only need the part
abc.com
Sorry for the basic question, but I have no experience with Linux.
Accepted answer by sastanin
You don't edit strings with grep in a Unix shell; grep is usually used to find or remove some lines from the text. You'd rather use sed instead:
$ echo www.example.com | sed 's/^[^\.]\+\.//'
example.com
You'll need to learn regular expressions to use it effectively.
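To make the pattern above less opaque, here is a minimal sketch that breaks it into its parts. It assumes a POSIX sed and uses `*` instead of `\+`, since `\+` is a GNU extension; the function name `strip_first_label` is only illustrative and does not come from the answer.

```shell
# strip_first_label: hypothetical helper, not part of the original answer.
# Pattern breakdown:
#   ^       anchor at the start of the line
#   [^.]*   any run of characters that are not a dot (may be empty)
#   \.      one literal dot
# The empty replacement deletes whatever matched.
strip_first_label() {
    sed 's/^[^.]*\.//'
}

# Lines without a dot are left unchanged.
printf '%s\n' www.example.com www.another.domain.com localhost | strip_first_label
```

Because the substitution is anchored with `^`, only the first label of each line is removed, however many dots follow.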
Sed can also edit a file in-place (modify the file) if you pass the -i argument, but be careful: you can easily lose data if you write the wrong sed command and use the -i flag.
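One way to reduce that risk is to give -i a backup suffix, so the original file is kept next to the edited one. The following is a minimal sketch, assuming a sed that accepts `-i` with an attached suffix (GNU and BSD sed both do); the temporary directory and the file name `domains.txt` are made up for the demonstration.

```shell
# Work in a throwaway directory so nothing real is modified.
workdir=$(mktemp -d)
printf '%s\n' www.example.com www.abc.com > "$workdir/domains.txt"

# -i.bak edits domains.txt in place and saves the original as domains.txt.bak
sed -i.bak 's/^[^.]*\.//' "$workdir/domains.txt"

cat "$workdir/domains.txt"       # edited file
cat "$workdir/domains.txt.bak"   # untouched original
```

If the result looks wrong, the `.bak` copy can simply be moved back over the edited file.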
An example
From your comments, I guess you have a TeX document and you want to remove the first part of all .com domain names. If this is your document test.tex:
\documentclass{article}
\begin{document}
www.example.com
example.com www.another.domain.com
\end{document}
then you can transform it with this sed command (redirect the output to a file or edit in-place with -i):
$ sed 's/\([a-z0-9-]\+\.\)\(\([a-z0-9-]\+\.\)\+com\)/\2/gi' test.tex
\documentclass{article}
\begin{document}
example.com
example.com another.domain.com
\end{document}
Please note that:
- A common sequence of allowed symbols followed by a dot is matched by [a-z0-9-]\+\.
- I used groups in the regular expression (the parts of it within \( and \)) to indicate the first and the second part of the URL, and I replace the entire match with its second group (\2 in the substitution pattern)
- The domain should be at least a 3rd level .com domain (every \+ repetition means at least one match)
- The search is case insensitive (i flag at the end)
- It can do more than one match per line (g flag at the end)
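The same substitution may be easier to read with extended regular expressions, where groups and `+` need no backslashes. This is only a sketch and assumes GNU sed, since `-E` combined with the case-insensitive `i` flag is not guaranteed elsewhere; the function name `strip_com_prefix` is invented for illustration.

```shell
# strip_com_prefix: hypothetical wrapper around the substitution discussed above.
#   ([a-z0-9-]+\.)          group 1: the first label and its dot (dropped)
#   (([a-z0-9-]+\.)+com)    group 2: one or more further labels, then "com" (kept)
strip_com_prefix() {
    sed -E 's/([a-z0-9-]+\.)(([a-z0-9-]+\.)+com)/\2/gi'
}

# A bare second-level name such as example.com is left alone,
# because group 2 needs at least one label before "com".
echo 'example.com www.another.domain.com' | strip_com_prefix
```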
Answer by Daniel DiPaolo
grep is not used to manipulate/change text, only to search for text/patterns within text.
You should look into something like sed or awk or cut if you want a command line tool to do it. Or write a script in Python/Perl/Ruby/whatever.
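As one illustration of the awk route, the sketch below deletes everything up to and including the first dot on each line. It assumes a POSIX awk; the function name `drop_first_label` is hypothetical and the sample input is made up.

```shell
# drop_first_label: hypothetical helper showing one possible awk approach.
drop_first_label() {
    # sub() edits the current line ($0) in place;
    # lines without a dot are printed unchanged.
    awk '{ sub(/^[^.]*\./, ""); print }'
}

printf '%s\n' www.abc.com localhost | drop_first_label
```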
Answer by Thor
As the others have noted, grep is not well suited for this task. sed is a good option, or, if the text is well ordered, a simple cut might be easier to type:
echo www.abc.com | cut -d. -f2-
-d. tells cut to use . as the delimiter. -f2- tells cut to return everything from field 2 through the last field.
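A small sketch of how this behaves on several lines at once. The sample input and the function name `strip_with_cut` are made up; note that cut prints a line unchanged when it contains no delimiter, unless -s is given.

```shell
# strip_with_cut: hypothetical wrapper around the cut command above.
strip_with_cut() {
    cut -d. -f2-
}

# "localhost" has no "." and is passed through as-is.
printf '%s\n' www.abc.com www.another.domain.com localhost | strip_with_cut

# Add -s to suppress lines that contain no "." at all.
printf '%s\n' www.abc.com localhost | cut -s -d. -f2-
```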
Answer by Igor Chubin
You can do this using grep easily:
$ echo www.google.com | grep -o '[^.]*\.com'
google.com
Instead of echo, you must supply your file:
$ grep -o '[^.]*\.com$' < file
Here I used the regular expression '[^.]*\.com'. That means: find a word without . in it ([^.]*), followed by .com (\.com in the regular expression). The -o option tells grep to show only the part that was matched.
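Note that this pattern keeps only the last label before .com, which is not always the same as dropping the first label. A quick sketch with made-up inputs shows the difference; it assumes a grep that supports -o (GNU and BSD grep do), and the function name is hypothetical.

```shell
# last_label_dot_com: hypothetical wrapper around the grep command above.
last_label_dot_com() {
    grep -o '[^.]*\.com$'
}

echo www.google.com | last_label_dot_com           # prints google.com
echo www.another.domain.com | last_label_dot_com   # prints domain.com, not another.domain.com
```

For names with more than three labels, the sed or cut approaches from the other answers may be closer to what the question asks for.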
Answer by Neoh
Although sed, awk, cut and even grep can solve the problem, I think grep is not a good choice.
- grep is a command-line utility for searching plain-text data sets for lines matching a regular expression.
- But utilities like sed and awk exist for processing strings line by line.
Answer by Fahd Ahmed
You can actually do this without invoking other programs, by using a builtin parameter expansion in bash:
while read line; do echo ${line#*.}; done < file
where #*. tells the shell to remove the shortest prefix that looks like 0 or more characters followed by a "." character.
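A slightly more defensive variant of the loop, as a sketch: `IFS=` and `-r` keep whitespace and backslashes intact, and quoting the expansion avoids word splitting and globbing. The function name `strip_prefix_lines` and the variable `name` are hypothetical. The last two lines contrast `#` (shortest match) with `##` (longest match).

```shell
# strip_prefix_lines: hypothetical helper that reads lines from standard input.
strip_prefix_lines() {
    while IFS= read -r line; do
        printf '%s\n' "${line#*.}"
    done
}

printf '%s\n' www.abc.com www.another.domain.com | strip_prefix_lines

name=www.another.domain.com
echo "${name#*.}"    # shortest matching prefix removed: another.domain.com
echo "${name##*.}"   # longest matching prefix removed: com
```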
You can view a cheatsheet with the different parameter expansions for bash here:
Answer by Matthias Braun
You can do this with a positive lookbehind and grep's --only-matching flag:
echo "www.abc.com" | grep --perl-regexp --only-matching '(?<=www\.).*'
which can be reduced to
echo "www.abc.com" | grep -Po '(?<=www\.).*'
Both produce
abc.com
with grep (GNU grep) 3.3.
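A closely related form uses \K, which discards everything matched so far from the reported match. This is a sketch that assumes a GNU grep built with PCRE support (-P is not available in every grep); the function name `strip_www` is invented, and unlike the lookbehind above it anchors the prefix to the start of the line.

```shell
# strip_www: hypothetical helper, an alternative to the lookbehind form.
#   ^www\.  must match at the start of the line
#   \K      drops what was matched so far from the output
#   .*      the remainder, which -o prints
strip_www() {
    grep -Po '^www\.\K.*'
}

echo "www.abc.com" | strip_www
```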