Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/11680815/

Date: 2020-08-06 13:56:10  Source: igfitidea

Removing Windows newlines on Linux (sed vs. awk)

Tags: linux, sed, awk

Asked by kermatt

I have some delimited files with improperly placed newline characters in the middle of fields (not at line ends), appearing as ^M in Vim. They originate from freebcp (on CentOS 6) exports of an MSSQL database. Dumping the data in hex shows \r\n patterns:

$ xxd test.txt | grep 0d0a
0000190: 3932 3139 322d 3239 3836 0d0a 0d0a 7c43

I can remove them with awk, but am unable to do the same with sed.

This works in awk, removing the line breaks completely:

awk 'gsub(/\r/,""){printf $0;next}{print}'

But this in sed does not, leaving line feeds in place:

sed -i 's/\r//g'

where this appears to have no effect:

sed -i 's/\r\n//g'

Using ^M in the sed expression (ctrl+v, ctrl+m) also does not seem to work.

For this sort of task, sed is easier to grok, but I am working on learning more about both. Am I using sed improperly, or is there a limitation?

Accepted answer by chepner

I believe some versions of sed will not recognize \r as a character. However, you can use a bash feature to work around that limitation:

echo $string | sed $'s/\r//'

Here, you let bash replace '\r' with the actual carriage return character inside the $'...' construct before passing that to sed as its command. (Assuming you use bash; other shells should have a similar construct.)
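
The quoting trick described above can be tried on a small made-up string. This is only a sketch, assuming bash (for the $'...' quoting) and any sed; the variable names and sample data are hypothetical and not taken from the question.

```shell
# Hypothetical sample: a field with a stray carriage return in the middle.
sample=$'abc\rdef'

# bash expands $'s/\r//g' first, so sed receives a literal CR byte in its
# script and never has to interpret the \r escape itself.
cleaned=$(printf '%s\n' "$sample" | sed $'s/\r//g')

# Dump the bytes before and after to check whether the CR (\r) is gone.
printf '%s\n' "$sample"  | od -c
printf '%s\n' "$cleaned" | od -c
```

Because the shell does the escape handling, this form should not depend on whether the local sed understands \r.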

Answer by kev

You can use the command-line tool dos2unix:

dos2unix input

Or use the tr command:

tr -d '\r' <input >output
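
A quick way to see what the tr approach does is to count CR bytes before and after. This is a minimal sketch with made-up sample data; the temporary file and variable names are hypothetical. Note that tr -d '\r' deletes only the carriage returns, so the line feeds stay in place.

```shell
# Hypothetical sample with CRLF line endings (not the asker's real data).
sample_file=$(mktemp)
cleaned_file=$(mktemp)
printf 'id|name\r\n1|alice\r\n' > "$sample_file"

# Delete every CR byte; LF line endings are left untouched.
tr -d '\r' < "$sample_file" > "$cleaned_file"

# Count the CR bytes before and after by deleting everything except CR.
cr_before=$(tr -cd '\r' < "$sample_file" | wc -c)
cr_after=$(tr -cd '\r' < "$cleaned_file" | wc -c)
echo "CR bytes before: $cr_before, after: $cr_after"

rm -f "$sample_file" "$cleaned_file"
```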


Actually, you can do the file-format switching in vim:

Method A:
:e ++ff=dos
:w ++ff=unix
:e!
Method B:
:e ++ff=dos
:set ff=unix
:w


EDIT

If you want to delete the \r\n sequences in the file, try these commands in vim:

:e ++ff=unix           " <-- make sure open with UNIX format
:%s/\r\n//g            " <-- remove all \r\n
:w                     " <-- save file

Your awk solution works fine. Here are two more sed solutions:

sed '1h;1!H;$!d;${g;s/\r\n//g}' input
sed ':A;/\r$/{N;bA};s/\r\n//g' input
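
The two sed solutions differ from a plain s/\r\n//g because sed reads one line at a time and strips the trailing newline before running the script, so \n only becomes visible once lines have been joined. The sketch below runs both on a small made-up file. It assumes GNU sed (for the \r escape and the one-line label syntax); the file and variable names are hypothetical.

```shell
# Hypothetical sample: CRLF pairs inside a field, LF at the real line ends.
sample_file=$(mktemp)
printf 'id|92192-2986\r\n\r\n|C\nnext|row\n' > "$sample_file"

# Variant 1: collect the whole file in the hold space, then substitute once
# on the last line, when every embedded newline is in the pattern space.
joined_all=$(sed '1h;1!H;$!d;${g;s/\r\n//g}' "$sample_file")

# Variant 2: while the current line ends in CR, append the next line (N),
# then remove the CRLF pairs that are now inside the pattern space.
joined_loop=$(sed ':A;/\r$/{N;bA};s/\r\n//g' "$sample_file")

printf '%s\n' "$joined_all"
printf '%s\n' "$joined_loop"
rm -f "$sample_file"
```

With GNU sed, both variables should hold the same two lines, with the CRLF pairs removed from the first record and the plain LF line ends kept. The first variant holds the entire file in memory, which may matter for large exports.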

Answer by Steven Penny

Another method

awk 1 RS='\r\n' ORS=
  • set Record Separator to \r\n
  • set Output Record Separator to empty string
  • 1 is always true, and in the absence of an action block, {print} is used
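
The record-separator approach can be checked on a short made-up string. This sketch assumes an awk that accepts a multi-character RS, such as gawk or mawk; POSIX only specifies the behavior of a single-character RS, so other implementations may behave differently. The sample data and variable name are hypothetical.

```shell
# Hypothetical sample: CRLF pairs inside a field, LF at the real line end.
# Each \r\n ends a record, and the empty ORS glues the records back together.
joined=$(printf 'id|92192-2986\r\n\r\n|C\n' | awk 1 RS='\r\n' ORS=)
printf '%s\n' "$joined"
```

The final LF is not a record separator here, so it stays part of the last record and the real line end is preserved.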

Answer by Sergiy Dolnyy

sed -e 's/\r//g' input_file

This works for me. The difference is using the -e option instead of -i.

I have also noticed that sed behaves differently on different platforms. Mine is:

$ sed --version
This is not GNU sed version 4.0
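
Since behavior varies between platforms, one rough way to find out how the local sed treats \r is to probe it directly. This is only a sketch; the variable names are made up for illustration.

```shell
# Feed sed a string containing a CR and see whether s/\r//g removes it.
probe=$(printf 'a\rb\n' | sed 's/\r//g')

if [ "$probe" = "ab" ]; then
    sed_handles_cr=yes   # this sed treats \r as a carriage return
else
    sed_handles_cr=no    # fall back to $'...' quoting, tr, or awk
fi

printf 'sed understands \\r: %s\n' "$sed_handles_cr"
```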
