Comparing two files in linux terminal

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14500787/

Time: 2020-08-06 18:47:43  Source: igfitidea

linux terminal diff file-comparison

Asked by Ali Imran

There are two files called "a.txt" and "b.txt", both containing a list of words. Now I want to check which words are extra in "a.txt" and are not in "b.txt".

I need an efficient algorithm as I need to compare two dictionaries.

Accepted answer by Ali Imran

Here is my solution for this:

# create the working directories where the later commands expect them
mkdir -p ~/temp ~/results
cp /usr/share/dict/american-english ~/temp/american-english-dictionary
cp /usr/share/dict/british-english ~/temp/british-english-dictionary
# count the words in each dictionary
wc -l < ~/temp/american-english-dictionary > ~/results/count-american-english-dictionary
wc -l < ~/temp/british-english-dictionary > ~/results/count-british-english-dictionary
# words present in both dictionaries
grep -Fxf ~/temp/american-english-dictionary ~/temp/british-english-dictionary > ~/results/common-english
# words present in only one of the dictionaries
grep -Fxvf ~/results/common-english ~/temp/american-english-dictionary > ~/results/unique-american-english
grep -Fxvf ~/results/common-english ~/temp/british-english-dictionary > ~/results/unique-british-english
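To see what the grep calls above do on a small scale, here is a minimal sketch using two made-up word lists (american.txt and british.txt are hypothetical stand-ins for the dictionary files, and the temporary directory is only for illustration). -F treats each pattern as a fixed string, -x requires a whole-line match, -f reads the patterns from a file, and -v inverts the match.

```shell
# Hypothetical sample lists standing in for the two dictionaries.
workdir=$(mktemp -d)
printf '%s\n' color center apple > "$workdir/american.txt"
printf '%s\n' colour centre apple > "$workdir/british.txt"

# Words present in both lists (whole-line, fixed-string matches).
grep -Fxf "$workdir/american.txt" "$workdir/british.txt" > "$workdir/common.txt"

# Words that appear only in the American list.
grep -Fxvf "$workdir/common.txt" "$workdir/american.txt" > "$workdir/unique-american.txt"

cat "$workdir/unique-american.txt"
```

The intermediate common file is not strictly needed for this question: grep -Fxvf british.txt american.txt should list the same words directly.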

Answer by Anders Johansson

Sort them and use comm:

comm -23 <(sort a.txt) <(sort b.txt)

comm compares (sorted) input files and by default outputs three columns: lines that are unique to a, lines that are unique to b, and lines that are present in both. By specifying -1, -2 and/or -3 you can suppress the corresponding output. Therefore comm -23 a b lists only the entries that are unique to a. I use the <(...) syntax to sort the files on the fly; if they are already sorted you don't need this.
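As a small illustration of the column behaviour, the following sketch runs comm on two made-up, already sorted files (the file contents and the temporary directory are assumptions for the example, not part of the original answer).

```shell
# Hypothetical sample files; comm expects both inputs to be sorted.
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/a.txt"
printf '%s\n' banana cherry date > "$workdir/b.txt"

# No flags: three tab-indented columns (only in a, only in b, in both).
comm "$workdir/a.txt" "$workdir/b.txt"

# -23 suppresses columns 2 and 3, leaving the lines unique to a.txt.
only_in_a=$(comm -23 "$workdir/a.txt" "$workdir/b.txt")
echo "$only_in_a"
```

Because the inputs here are written to plain files in sorted order, no process substitution is needed; with unsorted input, the <(sort ...) form from the answer (a bash feature) does the sorting on the fly.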

Answer by Manjula

You can use the diff tool in Linux to compare two files. You can use the --changed-group-format and --unchanged-group-format options to filter the required data.

The following three values can be used to select the relevant group for each option:

  • '%<' get lines from FILE1

  • '%>' get lines from FILE2

  • '' (empty string) for removing lines from both files.

E.g.: diff --changed-group-format="%<" --unchanged-group-format="" file1.txt file2.txt

[root@vmoracle11 tmp]# cat file1.txt 
test one
test two
test three
test four
test eight
[root@vmoracle11 tmp]# cat file2.txt 
test one
test three
test nine
[root@vmoracle11 tmp]# diff --changed-group-format='%<' --unchanged-group-format='' file1.txt file2.txt 
test two
test four
test eight
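The same options can be pointed the other way. The sketch below recreates the two sample files from the transcript in a temporary directory (an assumption for illustration) and uses '%>' to keep only the lines that come from FILE2. It assumes GNU diff, since the group-format options may not exist in other diff implementations.

```shell
# Recreate the sample files from the transcript above.
workdir=$(mktemp -d)
printf '%s\n' 'test one' 'test two' 'test three' 'test four' 'test eight' > "$workdir/file1.txt"
printf '%s\n' 'test one' 'test three' 'test nine' > "$workdir/file2.txt"

# diff exits with status 1 when the files differ, so that is not treated as a failure here.
only_in_file2=$(diff --changed-group-format='%>' --unchanged-group-format='' \
    "$workdir/file1.txt" "$workdir/file2.txt" || true)
echo "$only_in_file2"
```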

Answer by Chris Seymour

Use comm -13 (requires sorted files):

$ cat file1
one
two
three

$ cat file2
one
two
three
four

$ comm -13 <(sort file1) <(sort file2)
four
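One thing to watch with the sorted-files requirement is that comm expects the sort order of the current locale, so sort and comm should agree on it. A minimal sketch, assuming made-up unsorted inputs and forcing the C locale for both commands (the file names and the sort -u de-duplication are choices for this example, not part of the answer):

```shell
# Hypothetical unsorted inputs with a duplicate line.
workdir=$(mktemp -d)
printf '%s\n' two one three one > "$workdir/file1"
printf '%s\n' three four two one > "$workdir/file2"

# Sort (and de-duplicate) each list under the C locale, then compare under the same locale.
LC_ALL=C sort -u "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort -u "$workdir/file2" > "$workdir/file2.sorted"
only_in_file2=$(LC_ALL=C comm -13 "$workdir/file1.sorted" "$workdir/file2.sorted")
echo "$only_in_file2"
```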

Answer by Fengya Li

If you have vim installed, try this:

vimdiff file1 file2

or

vim -d file1 file2

You will find it fantastic.

Answer by mudrii

Try sdiff (man sdiff):

sdiff -s file1 file2

Answer by FindlinuxOne

You can also use colordiff, which displays the output of diff with colors.

About vimdiff: it allows you to compare files via SSH, for example:

vimdiff /var/log/secure scp://192.168.1.25/var/log/secure

Extracted from: http://www.sysadmit.com/2016/05/linux-diferencias-entre-dos-archivos.html

Answer by joelostblom

If you prefer the diff output style from git diff, you can use it with the --no-index flag to compare files not in a git repository:

git diff --no-index a.txt b.txt

Using a couple of files with around 200k file name strings in each, I benchmarked (with the built-in time command) this approach vs some of the other answers here:

git diff --no-index a.txt b.txt
# ~1.2s

comm -23 <(sort a.txt) <(sort b.txt)
# ~0.2s

diff a.txt b.txt
# ~2.6s

sdiff a.txt b.txt
# ~2.7s

vimdiff a.txt b.txt
# ~3.2s

comm seems to be the fastest by far, while git diff --no-index appears to be the fastest approach for diff-style output.

Update 2018-03-25: You can actually omit the --no-index flag unless you are inside a git repository and want to compare untracked files within that repository. From the man pages:

This form is to compare the given two paths on the filesystem. You can omit the --no-index option when running the command in a working tree controlled by Git and at least one of the paths points outside the working tree, or when running the command outside a working tree controlled by Git.

Answer by Iurii Golskyi

Also, do not forget about mcdiff, the internal diff viewer of GNU Midnight Commander.

For example:

mcdiff file1 file2

Enjoy!

Answer by James Brown

Using awk for it. Test files:

$ cat a.txt
one
two
three
four
four
$ cat b.txt
three
two
one

The awk:

$ awk '
NR==FNR {                    # process b.txt  or the first file
    seen[$0]                 # hash words to hash seen
    next                     # next word in b.txt
}                            # process a.txt  or all files after the first
!($0 in seen)' b.txt a.txt   # if word is not hashed to seen, output it

Duplicates are output:

four
four

To avoid duplicates, add each newly met word in a.txt to the seen hash:

$ awk '
NR==FNR {
    seen[$0]
    next
}
!($0 in seen) {              # if word is not hashed to seen
    seen[$0]                 # hash unseen a.txt words to seen to avoid duplicates
    print                    # and output it
}' b.txt a.txt

Output:

four

If the word lists are comma-separated, like:

$ cat a.txt
four,four,three,three,two,one
five,six
$ cat b.txt
one,two,three

you have to do a couple of extra laps (for loops):

awk -F, '                    # comma-separated input
NR==FNR {
    for(i=1;i<=NF;i++)       # loop all comma-separated fields
        seen[$i]
    next
}
{
    for(i=1;i<=NF;i++)
        if(!($i in seen)) {
             seen[$i]        # this time we buffer output (below):
             buffer=buffer (buffer==""?"":",") $i
        }
    if(buffer!="") {         # output unempty buffers after each record in a.txt
        print buffer
        buffer=""
    }
}' b.txt a.txt

Output this time:

four
five,six