Linux: Find unique lines
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) on StackOverflow and link the original address.
Original URL: http://stackoverflow.com/questions/13778273/
Find unique lines
Asked by amprantino
How can I find the unique lines and remove all duplicates from a file? My input file is:
1
1
2
3
5
5
7
7
I would like the result to be:
2
3
sort file | uniq
will not do the job; it shows every value once.
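For reference, this is what plain sort file | uniq produces on the input above (every value exactly once, which is not what is wanted):
$ sort file | uniq
1
2
3
5
7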
Answer by Shiplu Mokaddim
uniq -u < file
will do the job.
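Since uniq only compares adjacent lines (see the man-page excerpt quoted in a later answer), this works here because the example file is already sorted. For unsorted input, a minimal sketch would be:
$ sort file | uniq -u
2
3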
Answer by Lev Levitsky
uniq has the option you need:
-u, --unique
only print unique lines
$ cat file.txt
1
1
2
3
5
5
7
7
$ uniq -u file.txt
2
3
Answer by kasavbere
Use as follows:
sort < filea | uniq > fileb
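Note that, as the question points out, plain uniq leaves one copy of every value; to keep only the lines that occur exactly once, the -u flag would be added. A sketch with the same filea/fileb names:
$ sort < filea | uniq -u > fileb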
Answer by amprantino
This was the first thing I tried:
skilla:~# uniq -u all.sorted
76679787
76679787
76794979
76794979
76869286
76869286
......
After doing a cat -e all.sorted:
skilla:~# cat -e all.sorted
$
76679787$
76679787 $
76701427$
76701427$
76794979$
76794979 $
76869286$
76869286 $
Every second line has a trailing space :( After removing all trailing spaces, it worked!
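A sketch of one way to strip the trailing spaces before running uniq (the sed expression is an assumption about how to remove the stray whitespace; all.sorted is the file from above):
$ sed 's/[[:space:]]*$//' all.sorted | uniq -u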
thank you
Answer by ashmew2
uniq -u has been driving me crazy because it did not work.
So instead, if you have Python (most Linux distros and servers already have it):
Assuming you have the data file in notUnique.txt:
# Python
# Assuming the file has one value per line;
# otherwise adjust split() accordingly.
uniqueData = []
fileData = open('notUnique.txt').read().split('\n')
for i in fileData:
    # keep non-empty lines, and only the first copy of each value
    if i.strip() != '' and i not in uniqueData:
        uniqueData.append(i)
print(uniqueData)
# Another option (fewer keystrokes):
set(open('notUnique.txt').read().split('\n'))
Note that due to empty lines, the final set may contain '' or only-space strings. You can remove that later. Or just get away with copying from the terminal ;)
Just FYI, from the uniq man page:
"Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'."
One of the correct ways to invoke it: # sort notUnique.txt | uniq
Example run:
$ cat x
3
1
2
2
2
3
1
3
$ uniq x
3
1
2
3
1
3
$ uniq -u x
3
1
3
1
3
$ sort x | uniq
1
2
3
Spaces might be printed, so be prepared!
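As the man-page excerpt above mentions, sort -u alone gives the same deduplicated result; a quick sketch on the same x file:
$ sort -u x
1
2
3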
Answer by ashmew2
uniq should do fine if your file is or can be sorted; if you can't sort the file for some reason, you can use awk:
awk '{a[$0]++}END{for(i in a)if(a[i]<2)print i}'
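A quick sketch of running this on the question's input file; note that awk's for (i in a) loop does not guarantee any particular output order:
$ awk '{a[$0]++} END{for(i in a) if(a[i]<2) print i}' file
2
3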
Answer by a_rookie_seeking_answers
sort -d "file name" | uniq -u
This worked for me for a similar problem. Use it if the file is not already sorted; you can drop the sort if it is.
Answer by hychou
While sort takes O(n log(n)) time, I prefer using
awk '!seen[$0]++'
awk '!seen[$0]++' is an abbreviation for awk '!seen[$0]++ {print}': it prints the line (= $0) when seen[$0] is still zero, that is, when the line has not been seen before. It takes more space but only O(n) time.
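Note that !seen[$0]++ keeps the first copy of every line (deduplication, much like sort -u) rather than printing only the lines that occur exactly once. A sketch comparing it with the counting approach from the earlier awk answer, on the question's input file (the second command's output order may vary):
$ awk '!seen[$0]++' file
1
2
3
5
7
$ awk '{a[$0]++} END{for(i in a) if(a[i]<2) print i}' file
2
3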
Answer by skywardcode
You could also print out the unique values in "file" by piping cat to sort and uniq:
cat file | sort | uniq -u
Answer by blacker
You can use the approach below: it sorts the data and filters it down to the unique values.
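The answer's original code block was lost from the page; based on that description, it was presumably something along the lines of:
$ sort file | uniq -u
2
3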