Linux: get the nth column of a text file

Question

Asked by mnrl

I have a text file:

1 Q0 1657 1 19.6117 Exp
1 Q0 1410 2 18.8302 Exp
2 Q0 3078 1 18.6695 Exp
2 Q0 2434 2 14.0508 Exp
2 Q0 3129 3 13.5495 Exp

I want to take the 2nd and 4th word of every line like this:

1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495

I'm using this code:

 nol=$(cat "/path/of/my/text" | wc -l)
 x=1
 while  [ $x -le "$nol" ]
 do
     line=($(sed -n "$x"p /path/of/my/text))
     echo ""${line[1]}" "${line[3]}""  >> out.txt
     x=$(( $x + 1 ))
 done

It works, but it is very complicated and takes a long time to process long text files.

Is there a simpler way to do this?

Answer 1

Accepted answer by Tom van der Woerdt

IIRC:

cat filename.txt | awk '{ print $2, $4 }'

or, as mentioned in the comments:

awk '{ print $2, $4 }' filename.txt
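
A small sketch of how awk field selection behaves, assuming the goal is the sample output from the question (which corresponds to fields 3 and 5). The variable names `data`, `cols_3_5`, `glued`, `nth` and the awk variable `n` are made up for illustration.

```shell
# Two sample rows copied from the question.
data='1 Q0 1657 1 19.6117 Exp
1 Q0 1410 2 18.8302 Exp'

# The comma makes awk join the fields with its output separator (a space by default).
cols_3_5=$(printf '%s\n' "$data" | awk '{ print $3, $5 }')

# Without the comma the two fields are concatenated with nothing between them.
glued=$(printf '%s\n' "$data" | awk '{ print $3 $5 }')

# n is a hypothetical variable used to pick a column at run time.
nth=$(printf '%s\n' "$data" | awk -v n=3 '{ print $n }')

printf '%s\n' "$cols_3_5"
```

awk splits on runs of spaces and tabs by default, so this also copes with irregular spacing between columns.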

Answer 2

Answered by jm666

You can use the cut command:

cut -d' ' -f3,5 < datafile.txt

prints

1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495

where

-d' ' means use a space as the delimiter
-f3,5 means take and print the 3rd and 5th columns

cut is much faster on large files than a pure shell solution. If your file is delimited by multiple whitespace characters, you can squeeze them first, like:

sed 's/[\t ][\t ]*/ /g' < datafile.txt | cut -d' ' -f3,5

where (GNU) sed replaces each run of tab or space characters with a single space.
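
To see why the squeezing step matters, here is a rough sketch on a made-up line with irregular spacing. It swaps in the POSIX class `[[:blank:]]` for `[\t ]` (an assumption made for portability, not what the answer uses); the variable names are illustrative only.

```shell
# One sample line with two spaces, a tab, and three spaces between some fields.
data=$(printf '1  Q0\t1657 1   19.6117 Exp\n')

# Without squeezing, cut treats every single space as a field boundary,
# so empty fields appear and the column numbers shift.
naive=$(printf '%s\n' "$data" | cut -d' ' -f3,5)

# Squeeze each run of blanks to one space first, then cut.
squeezed=$(printf '%s\n' "$data" | sed 's/[[:blank:]][[:blank:]]*/ /g' | cut -d' ' -f3,5)

printf 'naive:    [%s]\n' "$naive"
printf 'squeezed: [%s]\n' "$squeezed"
```

Only the squeezed pipeline yields the intended 3rd and 5th columns here.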

As a variant, here is a Perl solution too:

perl -lanE 'say "$F[2] $F[4]"' < datafile.txt

Answer 3

Answered by ruakh

If your file contains n lines, then your script has to read the file n times; so if you double the length of the file, you quadruple the amount of work your script does, and almost all of that work is simply thrown away, since all you want to do is loop over the lines in order.

Instead, the best way to loop over the lines of a file is to use a while loop, with the condition-command being the read builtin:

while IFS= read -r line ; do
    # $line is a single line of the file, as a single string
    : ... commands that use $line ...
done < input_file.txt

In your case, since you want to split the line into an array, and the read builtin has special support for populating an array variable, you can write:

while read -r -a line ; do
    echo ""${line[1]}" "${line[3]}"" >> out.txt
done < /path/of/my/text

or better yet:

while read -r -a line ; do
    echo "${line[1]} ${line[3]}"
done < /path/of/my/text > out.txt

However, for what you're doing you can just use the cut utility:

cut -d' ' -f2,4 < /path/of/my/text > out.txt

(or awk, as Tom van der Woerdt suggests, or perl, or even sed).

Answer 4

Answered by Johannes Weiss

For the sake of completeness:

while read _ _ one _ two _; do
    echo "$one $two"
done < file.txt

Instead of _, an arbitrary variable name (such as junk) can be used as well. The point is just to extract the columns.

Demo:

$ while read _ _ one _ two _; do echo "$one $two"; done < /tmp/file.txt
1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495
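
A quick sketch of why the trailing placeholder is there: read assigns whatever is left of the line to the last variable. The names `line`, `with_tail` and `without_tail` are made up for this illustration, and `junk` stands in for `_`.

```shell
line='1 Q0 1657 1 19.6117 Exp'

# With a trailing placeholder, the fifth field is captured on its own.
read -r junk junk one junk two junk <<EOF
$line
EOF
with_tail="$one $two"

# Without it, the last variable also receives the remaining fields.
read -r junk junk one junk two <<EOF
$line
EOF
without_tail="$one $two"

printf '%s\n' "$with_tail" "$without_tail"
```

So the final placeholder matters whenever the wanted column is not the last one on the line.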

Answer 5

Answered by AKA11

One more simple variant:

$ while read line
  do
      set $line          # assigns words in line to positional parameters
      echo "$3 $5"
  done < file
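
A slightly more defensive take on the same idea, as a sketch only: `set -f` and `set --` are added here on the assumption that input lines might contain glob characters or start with a dash. The second sample line and the variable `out` are invented for illustration.

```shell
out=$(
  printf '%s\n' '1 Q0 1657 1 19.6117 Exp' '-n * 3078 1 18.6695 Exp' |
  while read -r line; do
      set -f            # turn off pathname expansion so * stays literal
      set -- $line      # -- keeps a leading "-n" from being read as an option
      set +f
      echo "$3 $5"
  done
)

printf '%s\n' "$out"
```

Without `set -f`, the unquoted `$line` would expand `*` to the file names in the current directory and shift the columns.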

Answer 6

Answered by ingyhere

If you are using structured data, this has the added benefit of not invoking an extra shell process to run tr and/or cut or something. ...

(Of course, you will want to guard against bad inputs with conditionals and sane alternatives.)

...
while read line ; 
do 
    lineCols=( $line ) ;
    echo "${lineCols[0]}"
    echo "${lineCols[1]}"
done < $myFQFileToRead ; 
...
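
One possible shape for such a guard, as a sketch assuming bash (arrays and `read -a` are not available in plain sh). The five-field threshold, the sample lines and the names `cols` and `out` are assumptions for illustration.

```shell
#!/usr/bin/env bash
out=$(
  printf '%s\n' '1 Q0 1657 1 19.6117 Exp' 'short line' '2 Q0 3078 1 18.6695 Exp' |
  while read -r -a cols; do
      # Skip lines that have fewer than five fields.
      if [ "${#cols[@]}" -lt 5 ]; then
          continue
      fi
      echo "${cols[2]} ${cols[4]}"
  done
)

printf '%s\n' "$out"
```

Using `read -a` instead of `lineCols=( $line )` also avoids pathname expansion of the line's contents.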
