Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/17137269/

Date: 2020-08-06 23:14:09  Source: igfitidea

Take nth column in a text file

linux bash

Question by mnrl

I have a text file:

1 Q0 1657 1 19.6117 Exp
1 Q0 1410 2 18.8302 Exp
2 Q0 3078 1 18.6695 Exp
2 Q0 2434 2 14.0508 Exp
2 Q0 3129 3 13.5495 Exp

I want to take the 3rd and 5th words of every line (the 2nd and 4th if you count from zero), like this:

1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495

I'm using this code:

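 nol=$(cat "/path/of/my/text" | wc -l)
 x=1
 while [ "$x" -le "$nol" ]
 do
     # sed re-reads the whole file on every iteration
     line=($(sed -n "${x}p" /path/of/my/text))
     # bash arrays are zero-based: [2] and [4] are the 3rd and 5th words
     echo "${line[2]} ${line[4]}" >> out.txt
     x=$(( x + 1 ))
 done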

It works, but it is very complicated and takes a long time to process long text files.

Is there a simpler way to do this?

Accepted answer by Tom van der Woerdt

IIRC:

cat filename.txt | awk '{ print $3, $5 }'

or, as mentioned in the comments:

awk '{ print $3, $5 }' filename.txt
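
Since the question title asks for the nth column in general, awk can also take the column number as a variable passed with `-v` instead of hard-coding it. A minimal sketch, assuming whitespace-separated input; the helper name `nth_col` is hypothetical:

```shell
# nth_col N [FILE]: print the Nth whitespace-separated field of every line.
# Hypothetical helper; reads standard input when FILE is omitted ("-").
nth_col() {
    awk -v n="$1" '{ print $n }' "${2:--}"
}

# Sample line from the question.
sample='1 Q0 1657 1 19.6117 Exp'

printf '%s\n' "$sample" | nth_col 3                 # prints: 1657
# The comma inserts the output field separator (a single space by default).
printf '%s\n' "$sample" | awk '{ print $3, $5 }'    # prints: 1657 19.6117
```

Without the comma, `print $3 $5` concatenates the two fields with nothing between them (`165719.6117`).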

Answer by jm666

You can use the cut command:

cut -d' ' -f3,5 < datafile.txt

prints

1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495

where:

  • -d' ' means: use a space as the delimiter
  • -f3,5 means: take and print the 3rd and 5th columns

The cut command is much faster for large files than a pure shell solution. If your file is delimited by runs of several spaces or tabs, you can squeeze them first, like:

sed 's/[\t ][\t ]*/ /g' < datafile.txt | cut -d' ' -f3,5

where (GNU) sed replaces every run of tab or space characters with a single space.
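
To see why the squeezing step matters: with `-d' '`, cut treats every single space as a separator, so two adjacent spaces create an empty field and shift the column numbers, whereas awk's default field splitting treats any run of blanks as one separator. A small demonstration on a made-up line with a doubled space:

```shell
# Hypothetical input: two spaces after the first column.
line='1  Q0 1657 1 19.6117 Exp'

# cut sees an empty 2nd field, so -f3,5 selects the wrong columns.
printf '%s\n' "$line" | cut -d' ' -f3,5                            # prints: Q0 1
# Squeezing runs of blanks first restores the expected numbering.
printf '%s\n' "$line" | sed 's/[\t ][\t ]*/ /g' | cut -d' ' -f3,5  # prints: 1657 19.6117
# awk splits on runs of blanks by default, so it needs no pre-processing.
printf '%s\n' "$line" | awk '{ print $3, $5 }'                     # prints: 1657 19.6117
```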

For a variant, here is a perl solution too:

perl -lanE 'say "$F[2] $F[4]"' < datafile.txt

Answer by ruakh

If your file contains n lines, then your script has to read the file n times; so if you double the length of the file, you quadruple the amount of work your script does, and almost all of that work is simply thrown away, since all you want to do is loop over the lines in order.
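
To put numbers on this: count one unit of work per line read, ignore the single wc -l pass, and note that sed -n "${x}p" keeps reading to the end of the file even after it has printed line x. The loop in the question then does

$$W(n) = \sum_{x=1}^{n} n = n^2, \qquad W(2n) = (2n)^2 = 4n^2 = 4\,W(n),$$

whereas a single pass (a while read loop, cut, or awk) reads each line once, so its work grows linearly: doubling the file only doubles the work.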

Instead, the best way to loop over the lines of a file is to use a while loop, with the condition command being the read builtin:

while IFS= read -r line ; do
    # $line is a single line of the file, as a single string
    : ... commands that use $line ...
done < input_file.txt
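
The `IFS=` and `-r` in that loop are what keep each line intact: `-r` stops `read` from treating backslashes as escape characters, and an empty `IFS` stops it from trimming leading and trailing whitespace. A quick comparison on a made-up line, with here-documents standing in for the input file:

```shell
# Plain read: leading blanks are trimmed and the backslash is consumed.
read line <<'EOF'
  a\b
EOF
printf '[%s]\n' "$line"    # prints: [ab]

# IFS= read -r: the line is kept exactly as written.
IFS= read -r line <<'EOF'
  a\b
EOF
printf '[%s]\n' "$line"    # prints: [  a\b]
```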

In your case, you want to split each line into an array, and the read builtin has special support for populating an array variable (-a), so you can write:

while read -r -a line ; do
    echo "${line[2]} ${line[4]}" >> out.txt
done < /path/of/my/text

or better yet:

while read -r -a line ; do
    echo "${line[2]} ${line[4]}"
done < /path/of/my/text > out.txt

However, for what you're doing you can just use the cut utility:

cut -d' ' -f3,5 < /path/of/my/text > out.txt

(or awk, as Tom van der Woerdt suggests, or perl, or even sed).

Answer by Johannes Weiss

For the sake of completeness:

while read _ _ one _ two _; do
    echo "$one $two"
done < file.txt

Instead of _, an arbitrary variable name (such as junk) can be used as well. The point is just to extract the columns.

Demo:

$ while read _ _ one _ two _; do echo "$one $two"; done < /tmp/file.txt
1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495
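
The final _ matters because `read` assigns whatever is left of the line to its last variable. Ending the list with a throw-away name keeps `two` to a single field; a quick check, with here-documents standing in for the file:

```shell
# With a trailing throw-away variable, two holds only the 5th field.
read -r _ _ one _ two _ <<'EOF'
1 Q0 1657 1 19.6117 Exp
EOF
printf '%s %s\n' "$one" "$two"    # prints: 1657 19.6117

# Without it, the last named variable swallows the rest of the line.
read -r _ _ one _ two <<'EOF'
1 Q0 1657 1 19.6117 Exp
EOF
printf '%s\n' "$two"              # prints: 19.6117 Exp
```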

Answer by AKA11

One more simple variant:

$ while read -r line
  do
      set -- $line       # assigns the words in line to positional parameters $1, $2, ...
      echo "$3 $5"
  done < file

Answer by ingyhere

If you are using structured data, this has the added benefit of not invoking an extra process to run tr and/or cut or something similar. ...

(Of course, you will want to guard against bad inputs with conditionals and sane alternatives.)

...
while IFS= read -r line ; do
    lineCols=( $line )    # word-split the line into an array (also subject to globbing)
    echo "${lineCols[0]}"
    echo "${lineCols[1]}"
done < "$myFQFileToRead"
...
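
The answer leaves input validation to the reader. One minimal way to do it is sketched below with positional parameters instead of an array, so it also runs in a plain POSIX shell. It assumes that lines with fewer than five fields should be skipped with a warning on standard error; the function name `pick_cols_3_5` and that policy are illustrative choices, not part of the original answer:

```shell
# pick_cols_3_5: read lines on stdin, print fields 3 and 5, skip short lines.
# Hypothetical helper for illustration.
pick_cols_3_5() {
    while IFS= read -r line; do
        set -f              # disable pathname expansion during word splitting
        set -- $line        # split the line on whitespace into $1, $2, ...
        set +f
        if [ "$#" -lt 5 ]; then
            printf 'skipping short line: %s\n' "$line" >&2
            continue
        fi
        printf '%s %s\n' "$3" "$5"
    done
}

# Usage: pick_cols_3_5 < file.txt > out.txt
```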