How to generate a list of unique lines in a text file using a Linux shell script?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16840910/

Date: 2020-08-06 23:05:12  Source: igfitidea

Tags: linux, unique, lines

Asked by I Z

Suppose I have a file that contains a bunch of lines, some repeating:

line1
line1
line1
line2
line3
line3
line3

What Linux command(s) should I use to generate a list of unique lines:

line1
line2
line3

Does this change if the file is unsorted, i.e. repeating lines may not be in blocks?

Accepted answer by parkydr

If you don't mind the output being sorted, use

sort -u

This sorts the input and removes duplicate lines.
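As a quick check of the command above, here is a minimal sketch that runs sort -u on the sample from the question. The temporary file and the variable names (sample, unique_sorted) are only for illustration and are not part of the original answer.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' line1 line1 line1 line2 line3 line3 line3 > "$sample"

# sort -u sorts the lines and keeps one copy of each distinct line.
unique_sorted=$(sort -u "$sample")
printf '%s\n' "$unique_sorted"

# Clean up the temporary file.
rm -f "$sample"
```

Because sort -u sorts before removing duplicates, it gives the same set of lines whether or not the duplicates were adjacent in the input.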

Answer by go-oleg

cat to output the contents, piped to sort to sort them, piped to uniq to print out the unique values:

cat test1.txt | sort | uniq

You don't need the sort part if the file contents are already sorted.

Answer by Kevin Sabbe

Create a new sorted file with unique lines:

sort -u file > unique_file

Create a new file with unique lines (unsorted; note that uniq only removes duplicates that are adjacent):

uniq file > unique_file
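Since uniq only collapses adjacent duplicates, an unsorted file with scattered repeats needs something else if the original order should be kept. One commonly used alternative, which is not part of this answer, is an awk one-liner that remembers the lines it has already printed. A minimal sketch, with a made-up sample and illustrative variable names (input, ordered_unique):

```shell
# Hypothetical sample in which the duplicates are NOT adjacent.
input=$(mktemp)
printf '%s\n' line3 line1 line3 line2 line1 > "$input"

# seen[$0]++ evaluates to 0 the first time a line appears, so
# !seen[$0]++ is true only for first occurrences; awk prints those.
ordered_unique=$(awk '!seen[$0]++' "$input")
printf '%s\n' "$ordered_unique"

# Clean up the temporary file.
rm -f "$input"
```

This keeps the first occurrence of each line in its original position, at the cost of holding every distinct line in memory.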

Answer by simhumileco

If we do not care about the order, then the best solution is actually:

sort -u file

If we also want to ignore letter case, we can add the -f option (lowercase letters are folded to uppercase for the comparison only; the lines that are printed keep their original case):

sort -fu file
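To see what sort -fu does with mixed-case input, here is a small sketch using made-up words. As far as I can tell, -f only affects how lines are compared, so each printed line is one of the input lines unchanged; which spelling of a duplicate survives may depend on the sort implementation, so the example does not rely on that.

```shell
# Made-up input: two spellings of "apple" and two of "banana".
folded=$(printf '%s\n' Apple banana apple Banana | sort -fu)
printf '%s\n' "$folded"

# Count the surviving lines: one per case-insensitive group.
folded_count=$(printf '%s\n' "$folded" | grep -c '')
```

Here folded_count should be 2, since the four input lines fall into two case-insensitive groups.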

It might seem that an even better idea would be to use the command:

uniq file

and, if we also want to ignore letter case (the first line of each run of duplicates is returned, with its case unchanged):

uniq -i file

However, this may return a completely different result than the sort command, because uniq does not detect repeated lines unless they are adjacent.
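The difference is easy to reproduce. The sketch below uses a made-up unsorted sample (the file and variable names are illustrative only) and compares plain uniq with sort -u:

```shell
# Hypothetical unsorted input: "line1" repeats, but not on adjacent lines.
unsorted=$(mktemp)
printf '%s\n' line1 line2 line1 line3 line3 > "$unsorted"

# uniq only collapses the adjacent "line3" pair; the second "line1" survives.
uniq_only=$(uniq "$unsorted")

# sort -u brings equal lines together first, so every duplicate is removed.
sorted_unique=$(sort -u "$unsorted")

printf 'uniq:\n%s\n' "$uniq_only"
printf 'sort -u:\n%s\n' "$sorted_unique"

# Clean up the temporary file.
rm -f "$unsorted"
```

With this input, uniq prints four lines (line1, line2, line1, line3) while sort -u prints three.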