Linux: the concepts of "hold space" and "pattern space" in sed

Question

Asked by ChenQi

I'm confused by the two concepts in sed: hold space and pattern space. Can someone help explain them?

Here's a snippet of the manual:

h H    Copy/append pattern space to hold space.
g G    Copy/append hold space to pattern space.

n N    Read/append the next line of input into the pattern space.

These six commands really confuse me.

Answer 1

Accepted answer by January

When sed reads a file line by line, the line it has just read is placed in the pattern buffer (pattern space). The pattern buffer is a temporary scratchpad where the current line is stored. When you tell sed to print, it prints the pattern buffer.

The hold buffer (hold space) is like long-term storage: you can capture something, store it, and reuse it later when sed is processing another line. You cannot process the hold space directly; if you want to do something with its contents, you need to copy or append them to the pattern space. For example, the print command p prints the pattern space only. Likewise, s operates on the pattern space.
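
The idea of "store now, reuse on a later line" can be sketched as follows. This is only an illustration, assuming a POSIX shell and a sed that accepts `;` between commands (GNU sed does); the function name `append_first_line` is hypothetical and not part of the original answer.

```shell
# Hypothetical helper: append the first input line to every later line.
append_first_line() {
  # 1{h;d;}  on line 1: copy it to the hold space, then delete it (no output)
  # G        on later lines: append a newline plus the hold space to the pattern space
  # s/\n/ /  replace that newline with a space (s works on the pattern space)
  sed '1{h;d;};G;s/\n/ /'
}

printf 'X\na\nb\n' | append_first_line
# prints:
# a X
# b X
```

The first line never reaches the output on its own; it survives only because h saved it in the hold space, and G brings it back each time a new line is in the pattern space.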

Here is an example:

sed -n '1!G;h;$p'

(the -n option suppresses automatic printing of lines)

There are three commands here: 1!G, h and $p. 1!G has an address, 1 (the first line), but the ! means that the command is executed on every line except the first. $p, on the other hand, is executed only on the last line. So what happens is this:

1. The first line is read and inserted automatically into the pattern space.
2. On the first line, the first command is not executed; h copies the first line into the hold space.
3. Now the second line replaces whatever was in the pattern space.
4. On the second line, we first execute G, which appends the contents of the hold buffer to the pattern buffer, separated by a newline. The pattern space now contains the second line, a newline, and the first line.
5. Then the h command copies the concatenated contents of the pattern buffer into the hold space, which now holds lines two and one in reverse order.
6. We proceed to line number three: go back to step (3) above.

Finally, after the last line has been read and the hold space (containing all the previous lines in reverse order) has been appended to the pattern space, the pattern space is printed with p. As you have guessed, the above does exactly what the tac command does: it prints the file in reverse.
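
As a side note, the same reversal is often written without -n, by deleting the pattern space on every line except the last and letting sed's automatic printing handle the final line. The sketch below is a variant of the command above rather than the answer's own; it assumes a POSIX shell, and the function name `reverse_lines` is hypothetical.

```shell
# Hypothetical helper: reverse the order of input lines.
reverse_lines() {
  # 1!G  on every line but the first: append the hold space to the pattern space
  # h    copy the accumulated (reversed) text back into the hold space
  # $!d  on every line but the last: delete the pattern space, so nothing is printed
  sed '1!G;h;$!d'
}

printf 'one\ntwo\nthree\n' | reverse_lines
# prints:
# three
# two
# one
```

On the last line $!d does not run, so the pattern space, which by then holds all lines in reverse order, is printed automatically.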

Answer 2

Answer by Jens Jensen

@Ed Morton: I disagree with you here. I found sed very useful and simple (once you grok the concept of the pattern and hold buffers) for coming up with an elegant way to do multiline grepping.

For example, let's take a text file that has hostnames and some information about each host, with lots of junk in between that I don't care about.

Host: foo1
some junk, doesnt matter
some junk, doesnt matter
Info: about foo1 that I really care about!!
some junk, doesnt matter
some junk, doesnt matter
Info: a second line about foo1 that I really care about!!
some junk, doesnt matter
some junk, doesnt matter
Host: foo2
some junk, doesnt matter
Info: about foo2 that I really care about!!
some junk, doesnt matter
some junk, doesnt matter

To me, an awk script to get just the lines with the hostname and the corresponding Info line would take a bit more than what I'm able to do with sed:

sed -n '/Host:/{h}; /Info/{x;p;x;p;}' myfile.txt

output looks like:

Host: foo1
Info: about foo1 that I really care about!!
Host: foo1
Info: a second line about foo1 that I really care about!!
Host: foo2
Info: about foo2 that I really care about!!

(Note that Host: foo1 appears twice in the output.)

Explanation:

-n disables output unless explicitly printed.
The first match finds the Host: line and puts it into the hold buffer (h).
The second match finds the next Info: line, but first exchanges (x) the current line in the pattern buffer with the hold buffer and prints (p) the Host: line, then re-exchanges (x) and prints (p) the Info: line.
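
If the repeated Host: line is unwanted, one possible variation is to empty the hold space after the host has been printed once. This is a sketch, not the answer's own command; it assumes GNU sed (or another sed that accepts `;` and spaces between commands) and Host:/Info: markers at the start of the line. The function name `host_info_once` and the shortened sample input are hypothetical.

```shell
# Hypothetical helper: print each Host: line once, before its first Info: line.
host_info_once() {
  # /^Host:/h   remember the current host in the hold space
  # /^Info:/{   on an Info: line:
  #   x         swap, so the pattern space holds the saved Host: line (or nothing)
  #   /./p      print it only if it is non-empty
  #   s/.*//    empty it, so the same host is not printed again
  #   x         swap back; the hold space is now empty
  #   p         print the Info: line
  # }
  sed -n '/^Host:/h; /^Info:/{x;/./p;s/.*//;x;p;}'
}

printf '%s\n' 'Host: foo1' 'junk' 'Info: first' 'junk' 'Info: second' \
              'Host: foo2' 'Info: third' | host_info_once
# prints:
# Host: foo1
# Info: first
# Info: second
# Host: foo2
# Info: third
```

The trade-off is a slightly longer script in exchange for output without duplicate Host: lines.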

Yes, this is a simplistic example, but I suspect this is a common kind of problem that a simple sed one-liner can deal with quickly. For much more complex tasks, such as ones in which you cannot rely on a given, predictable sequence, awk may be better suited.

Answer 3

Answer by Sanghyun Lee

Although @January's answer and the example are nice, the explanation was not enough for me. I had to search and learn a lot before I managed to understand exactly how sed -n '1!G;h;$p' works. So I'd like to elaborate on the command for someone like me.

First of all, let's see what the command does.

$ echo {a..d} | tr ' ' '\n' # Prints from 'a' to 'd' in each line
a
b
c
d
$ echo {a..d} | tr ' ' '\n' | sed -n '1!G;h;$p'
d
c
b
a

It reverses the input like the tac command does.

sed reads line by line, so let's see what happens in the pattern space and the hold space at each line. Because the h command copies the contents of the pattern space to the hold space, both spaces hold the same text after each line is processed.

Read line    Pattern Space / Hold Space    Command executed
-----------------------------------------------------------
a            a$                            h
b            b\na$                         1!G;h
c            c\nb\na$                      1!G;h
d            d\nc\nb\na$                   1!G;h;$p

At the last line, $p prints d\nc\nb\na$, which is displayed as

d
c
b
a

If you want to see the pattern space for each line, you can add an l command.

$ echo {a..d} | tr ' ' '\n' | sed -n '1!G;h;l;$p'
a$
b\na$
c\nb\na$
d\nc\nb\na$
d
c
b
a

I found it very helpful to watch the video tutorial Understanding how sed works, as the presenter shows step by step how each space is used. The hold space is covered in the 4th tutorial, but I recommend watching all the videos if you are not familiar with sed.

Also, the GNU sed documentation and Bruce Barnett's Sed tutorial are very good references.
