Linux: How to use sed to extract a substring

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/16675179/

Date: 2020-08-06 22:57:53  Source: igfitidea

Tags: linux, shell, ubuntu, xml-parsing, sed

Asked by MOHAMED

I have a file containing the following lines:

  <parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
  <parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
  <parameter name="RemoteHost" access="readWrite"></parameter>
  <parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="PortMappingProtocol" access="readWrite"></parameter>
  <parameter name="InternalClient" access="readWrite"></parameter>
  <parameter name="PortMappingDescription" access="readWrite"></parameter>

I want to run a command on this file that extracts only the parameter names, as shown in the following output:

$ sedcommand file.txt
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription

What could this command be?

Accepted answer by Chris

You want awk.

This would be a quick and dirty hack:

awk -F "\"" '{print $2}' /tmp/file.txt

PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
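
The one-liner above splits each line on the double-quote character, so $2 is whatever sits between the first and second quote. A minimal sketch of the same idea follows, with a pattern added so that lines without a name attribute are skipped. The sample data and the variable name `names` are assumptions made for illustration, and the sketch still relies on name being the first attribute on the line.

```shell
# Sample input (hypothetical): two lines from the question plus one line without attributes.
input='  <parameter name="RemoteHost" access="readWrite"></parameter>
  <!-- a comment line without attributes -->
  <parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>'

# -F'"' makes the double quote the field separator, so field 2 is the
# text between the first and second quote on the line.
# The /name="/ pattern limits output to lines that carry a name attribute.
names=$(printf '%s\n' "$input" | awk -F'"' '/name="/ {print $2}')

printf '%s\n' "$names"
```

Without the /name="/ pattern, awk would print an empty line for every input line that contains no quotes.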

Answer by unxnut

sed 's/[^"]*"\([^"]*\).*/\1/'

does the job.

Explanation of the part inside ' ':

  • s - tells sed to substitute.
  • / - start of the regex to search for.
  • [^"]* - any character that is not ", any number of times (matching <parameter name=, together with its leading whitespace).
  • " - just a literal ".
  • \([^"]*\) - anything inside \( \) is saved so it can be referenced later. The backslashes are there so the parentheses are treated as grouping rather than as characters to search for. [^"]* means the same as above (matching RemoteHost, for example).
  • .* - any character, any number of times (matching " access="readWrite"></parameter>).
  • / - end of the search regex and start of the replacement string.
  • \1 - reference to the string captured in the parentheses above.
  • / - end of the replacement string.

Basically, it is s/search for this/replace with this/, but we're telling sed to replace the whole line with just the piece of it that we captured earlier.
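
One consequence of this pattern is that a line containing no " at all does not match, so sed prints it unchanged. A hedged sketch of one way to avoid that, using sed's -n option together with the p flag of the s command; the wrapping <parameters> element and the variable names are assumptions added for illustration and are not part of the original file.

```shell
# Hypothetical input: two parameter lines wrapped in an enclosing element.
input='<parameters>
  <parameter name="RemoteHost" access="readWrite"></parameter>
  <parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
</parameters>'

# -n suppresses sed's default printing of every line;
# the trailing p prints a line only if the substitution succeeded on it.
names=$(printf '%s\n' "$input" | sed -n 's/[^"]*"\([^"]*\).*/\1/p')

printf '%s\n' "$names"
```

With the plain command from the answer, the <parameters> and </parameters> lines would appear in the output as well.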

Answer by Kent

grep was born to extract things:

grep -Po 'name="\K[^"]*'

Test with your data:

kent$  echo '<parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
  <parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
  <parameter name="RemoteHost" access="readWrite"></parameter>
  <parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="PortMappingProtocol" access="readWrite"></parameter>
  <parameter name="InternalClient" access="readWrite"></parameter>
  <parameter name="PortMappingDescription" access="readWrite"></parameter>
'|grep -Po 'name="\K[^"]*'
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
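
In GNU grep, -P selects Perl-compatible regular expressions, -o prints only the matched part of each line, and \K drops everything matched so far from the reported match, which is why name=" does not appear in the output. The -P option may not be available in every grep build. A rough sketch of an alternative for that case, assuming only that grep supports -o; the sample data and variable names are made up for illustration.

```shell
# Sample input (two lines taken from the question).
input='  <parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
  <parameter name="RemoteHost" access="readWrite"></parameter>'

# Step 1: grep -o prints only the matched text, e.g. name="RemoteHost"
# Step 2: cut splits that text on the double quote and keeps the second field
names=$(printf '%s\n' "$input" | grep -o 'name="[^"]*"' | cut -d'"' -f2)

printf '%s\n' "$names"
```

Unlike the positional approaches, this keys on the attribute name, so it does not depend on name being the first attribute on the line.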

Answer by Michał Šrajer

You should not parse XML using tools like sed or awk. It's error-prone.

If the input changes, for example so that the name attribute is preceded by a newline instead of a space, such a command will fail some day and produce unexpected results.
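
To make that failure mode concrete, here is a small sketch that feeds a reformatted element to the sed command from the earlier answer. The two-line input is a hypothetical variant of the question's data, and the variable names are chosen for illustration.

```shell
# Hypothetical reformatted input: the name attribute now starts on its own line.
input='<parameter
  name="RemoteHost" access="readWrite"></parameter>'

# The first line contains no double quote, so the pattern does not match
# and sed passes that line through unchanged.
result=$(printf '%s\n' "$input" | sed 's/[^"]*"\([^"]*\).*/\1/')

printf '%s\n' "$result"
```

The output now contains the stray line <parameter in addition to RemoteHost, and nothing signals that something went wrong.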

If you are really sure that your input will always be formatted this way, you can use cut. It's faster than sed and awk:

cut -d'"' -f2 < input.txt

It would be better to parse it first and extract only the name attribute of each parameter:

xpath -q -e //@name input.txt | cut -d'"' -f2

To learn more about xpath, see this tutorial: http://www.w3schools.com/xpath/

Answer by Rushi Agrawal

Explaining how you can use cut:

cat yourxmlfile | cut -d'"' -f2

It will 'cut' every line in the file using " as the delimiter and take the 2nd field, which is what you wanted.
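
To see how the fields are numbered, the following sketch splits a single line from the question and picks two of the fields. The variable names are assumptions used only for this illustration.

```shell
line='  <parameter name="RemoteHost" access="readWrite"></parameter>'

# Splitting the line on " yields these fields:
#   1: '  <parameter name='
#   2: 'RemoteHost'
#   3: ' access='
#   4: 'readWrite'
#   5: '></parameter>'
name=$(printf '%s\n' "$line" | cut -d'"' -f2)
access=$(printf '%s\n' "$line" | cut -d'"' -f4)

printf '%s %s\n' "$name" "$access"
```

Field 2 holds the parameter name only because name is the first quoted attribute on the line; a different attribute order would shift the field numbers.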