How to generate list of unique lines in text file using a Linux shell script?
Original question: http://stackoverflow.com/questions/16840910/
Note: this content comes from StackOverFlow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors (not me): StackOverFlow
Asked by I Z
Suppose I have a file that contains a bunch of lines, some repeating:
line1
line1
line1
line2
line3
line3
line3
What Linux command(s) should I use to generate a list of unique lines:
line1
line2
line3
Does this change if the file is unsorted, i.e. if repeating lines may not be in blocks?
Accepted answer by parkydr
If you don't mind the output being sorted, use
sort -u
This sorts the input and removes duplicate lines.
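As a minimal sketch, this is how the command behaves on the sample input from the question. The temporary file and the variable names `sample` and `unique_lines` are assumptions made for illustration only.

```shell
# Recreate the sample input from the question in a temporary file
# (the variable name "sample" is an assumption for this sketch).
sample=$(mktemp)
printf '%s\n' line1 line1 line1 line2 line3 line3 line3 > "$sample"

# Sort the lines and keep a single copy of each.
unique_lines=$(sort -u "$sample")
printf '%s\n' "$unique_lines"

rm -f "$sample"
```

Because the input is sorted as part of the same step, this does not depend on the repeated lines being grouped together.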
Answer by go-oleg
Use cat to output the contents, piped to sort to sort them, piped to uniq to print out the unique values:
cat test1.txt | sort | uniq
You don't need the sort part if the file contents are already sorted.
Answer by Kevin Sabbe
Create a new sorted file with unique lines:
sort -u file > unique_file
Create a new file with unique lines without sorting (this only removes duplicates that are adjacent, because uniq compares neighboring lines):
cat file | uniq >> unique_file
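A hedged sketch of what happens when the repeats are not adjacent. The sample data and variable names are hypothetical, and the awk one-liner is a commonly used alternative that is not part of this answer: it prints each line only the first time it appears, so the original order is preserved.

```shell
# Unsorted sample input with non-adjacent repeats (hypothetical data).
input=$(mktemp)
printf '%s\n' line1 line2 line1 line3 line2 > "$input"

# uniq compares neighboring lines only, so nothing is removed here.
adjacent_only=$(uniq "$input")

# awk keeps a count per line and prints a line only the first time it is seen,
# which preserves the original order.
first_seen=$(awk '!seen[$0]++' "$input")

printf 'uniq:\n%s\n\nawk:\n%s\n' "$adjacent_only" "$first_seen"
rm -f "$input"
```

The awk approach holds every distinct line in memory, so for very large files sorting first may be the more practical choice.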
Answer by simhumileco
If we do not care about the order, then the best solution is actually:
sort -u file
If we also want to ignore letter case, we can use the following (lowercase letters are folded to uppercase for the comparison only; the lines themselves are output unchanged):
sort -fu file
It would seem that an even better idea would be to use the command:
uniq file
and if we also want to ignore letter case (as a result, the first line of each run of adjacent duplicates is returned, with its case unchanged):
uniq -i file
However, in this case the result may be completely different from what the sort command returns, because the uniq command does not detect repeated lines unless they are adjacent.
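To illustrate the difference, here is a small sketch that counts how many lines each case-insensitive command keeps. The mixed-case sample data and the variable names are assumptions made for this example.

```shell
# Mixed-case sample input whose repeats are not adjacent (hypothetical data).
words=$(mktemp)
printf '%s\n' Apple banana apple BANANA > "$words"

# Count the lines each command keeps.
sort_count=$(sort -fu "$words" | wc -l)
uniq_count=$(uniq -i "$words" | wc -l)

echo "sort -fu keeps $((sort_count)) lines, uniq -i keeps $((uniq_count)) lines"
rm -f "$words"
```

On this input, sort -fu should keep 2 lines, while uniq -i keeps all 4 because no two neighboring lines match.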