Linux: Find unique lines

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/13778273/

Time: 2020-08-06 17:59:49  Source: igfitidea

Find unique lines

Tags: linux, sorting, unique, uniq

Asked by amprantino

How can I find the unique lines and remove all duplicates from a file? My input file is

1
1
2
3
5
5
7
7

I would like the result to be:

2
3

sort file | uniq will not do the job: it shows every value once, including the duplicated ones.

Answered by Shiplu Mokaddim

uniq -u < file will do the job.

Answered by Lev Levitsky

uniq has the option you need:

   -u, --unique
          only print unique lines

$ cat file.txt
1
1
2
3
5
5
7
7
$ uniq -u file.txt
2
3

Answered by kasavbere

Use as follows (this keeps one copy of every line; add -u to uniq to keep only the lines that occur exactly once):

sort < filea | uniq > fileb

Answered by amprantino

This was the first thing I tried:

skilla:~# uniq -u all.sorted  

76679787
76679787 
76794979
76794979 
76869286
76869286 
......

After running cat -e all.sorted:

skilla:~# cat -e all.sorted 
$
76679787$
76679787 $
76701427$
76701427$
76794979$
76794979 $
76869286$
76869286 $

Every second line has a trailing space :( After removing all trailing spaces, it worked!

Thank you.
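
A minimal sketch of that cleanup, assuming a POSIX sed is available. The file name sample.sorted and the value 99999999 are invented for illustration; the other values mimic all.sorted above.

```shell
# Hypothetical sample in the spirit of all.sorted: some duplicates differ
# only by a trailing space, and 99999999 is an invented unique value.
printf '%s\n' '76679787' '76679787 ' '76794979' '76794979 ' '99999999' > sample.sorted

# Without cleanup, uniq -u treats "76679787" and "76679787 " as different
# lines, so every line is reported as unique.
uniq -u sample.sorted

# Strip trailing whitespace first, then sort and keep lines that occur once.
sed 's/[[:space:]]*$//' sample.sorted | sort | uniq -u
```

With the whitespace stripped, only 99999999 should remain in the output.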

Answered by ashmew2

uniq -u has been driving me crazy because it did not work.

So instead, if you have Python (most Linux distros and servers already have it), and assuming the data is in notUnique.txt:

# Python 3
# Assumes the file has one value per line;
# otherwise adjust split() accordingly.

uniqueData = []
fileData = open('notUnique.txt').read().split('\n')

for i in fileData:
  # Skip blank lines and values that were already collected
  if i.strip() != '' and i not in uniqueData:
    uniqueData.append(i)

print(uniqueData)

### Another option (fewer keystrokes):
print(set(open('notUnique.txt').read().split('\n')))

Note that due to empty lines, the final set may contain '' or only-space strings. You can remove that later. Or just get away with copying from the terminal ;)

Just FYI, from the uniq man page:

"Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'."

One correct way to invoke it: sort notUnique.txt | uniq

Example run:

$ cat x
3
1
2
2
2
3
1
3

$ uniq x
3
1
2
3
1
3

$ uniq -u x
3
1
3
1
3

$ sort x | uniq
1
2
3

Spaces might be printed, so be prepared!
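
To make the difference between these uniq modes concrete, here is a small sketch that reuses the example file x from the run above. It relies only on the standard -c, -u and -d options; nothing else is assumed.

```shell
# Recreate the example file x from the run above.
printf '%s\n' 3 1 2 2 2 3 1 3 > x

# Count how often each value occurs (sorting makes duplicates adjacent).
sort x | uniq -c

# Lines that occur exactly once: nothing is printed for this file,
# because 1, 2 and 3 all occur more than once.
sort x | uniq -u

# Lines that occur more than once, listed a single time each.
sort x | uniq -d
```

For the asker's goal (only the lines that are never repeated), sort followed by uniq -u is the variant to use.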

Answered by ashmew2

uniq should do fine if your file is or can be sorted. If you can't sort the file for some reason, you can use awk:

awk '{a[$0]++}END{for(i in a)if(a[i]<2)print i}'
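
One caveat: awk does not specify the order in which for (i in a) visits the array, so the one-liner above may print the surviving lines in any order. If the original order matters, a two-pass variant is one possible sketch. It assumes the input is a regular file that can be read twice; input.txt and its contents are invented for illustration.

```shell
# Hypothetical unsorted input file.
printf '%s\n' 5 1 2 5 3 1 7 7 > input.txt

# First pass (NR==FNR) counts every line; second pass prints the lines
# whose count is exactly 1, in their original order.
awk 'NR==FNR { count[$0]++; next } count[$0] == 1' input.txt input.txt
```

For this input the command prints 2 and then 3, each on its own line.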

Answered by a_rookie_seeking_answers

sort -d "file name" | uniq -u

This worked for me in a similar case. Use it if the file is not sorted; you can drop the sort if it already is.

Answered by hychou

Since sort takes O(n log(n)) time, I prefer using

awk '!seen[$0]++'

awk '!seen[$0]++' is an abbreviation for awk '!seen[$0]++ {print}': it prints the line (= $0) if seen[$0] is zero, that is, the first time the line appears. It takes more space but only O(n) time.
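
For comparison, a short sketch running this one-liner on the input from the question, next to sort | uniq -u. The file name file is taken from the question; the data is the question's sample.

```shell
# The input from the question.
printf '%s\n' 1 1 2 3 5 5 7 7 > file

# Prints the first occurrence of every line: 1, 2, 3, 5, 7.
awk '!seen[$0]++' file

# Prints only the lines that occur exactly once: 2, 3.
sort file | uniq -u
```

So the awk one-liner removes repeats in a single pass and keeps the input order, but it still keeps one copy of the repeated values, which is not the result the question asks for.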

Answered by skywardcode

You could also print the unique values in "file" using the cat command, piping to sort and uniq:

cat file | sort | uniq -u

Answered by blacker

You can use:

sort data.txt | uniq -u

This sorts the data and keeps only the unique values.