Remove Windows line breaks on Linux (sed vs. awk)

Question

Asked by kermatt

I have some delimited files with improperly placed newline characters in the middle of fields (not at line ends), which appear as ^M in Vim. They originate from freebcp (on CentOS 6) exports of an MSSQL database. Dumping the data in hex shows \r\n patterns:

$ xxd test.txt | grep 0d0a
0000190: 3932 3139 322d 3239 3836 0d0a 0d0a 7c43

I can remove them with awk, but am unable to do the same with sed.

This works in awk, removing the line breaks completely:

awk 'gsub(/\r/,""){printf $0;next}{print}'

But this in sed does not, leaving line feeds in place:

sed -i 's/\r//g'

whereas this appears to have no effect:

sed -i 's/\r\n//g'

Using ^M in the sed expression (Ctrl+V, Ctrl+M) also does not seem to work.

For this sort of task, sed is easier to grok, but I am working on learning more about both. Am I using sed improperly, or is there a limitation?

Answer 1

Accepted answer by chepner

I believe some versions of sed will not recognize \r as a character. However, you can use a bash feature to work around that limitation:

echo "$string" | sed $'s/\r//'

Here, you let bash replace '\r' with the actual carriage return character inside the $'...' construct before passing that to sed as its command. (Assuming you use bash; other shells should have a similar construct.)
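For shells that lack $'...', the same idea can be sketched with printf: build the carriage return in a variable and let the shell substitute it into the sed script. The variable names cr, input, and cleaned below are placeholders for illustration, not part of the original answer.

```shell
# Build a literal carriage return without relying on bash's $'...' syntax.
# Command substitution strips only trailing newlines, so the CR survives.
cr=$(printf '\r')

# Hypothetical sample: a stray CR before the newline in the first line.
input=$(printf 'abc\r\ndef')

# The shell expands $cr inside double quotes, so sed receives the raw
# CR byte and does not need to understand the \r escape itself.
cleaned=$(printf '%s\n' "$input" | sed "s/$cr//")

# Inspect the bytes to confirm the CR is gone.
printf '%s\n' "$cleaned" | od -c
```

Piping through od -c (or xxd, as in the question) is a quick way to confirm what a given sed actually did with the bytes.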

Answer 2

Answer by kev

You can use the command-line tool dos2unix:

dos2unix input

Or use the tr command:

tr -d '\r' <input >output

Actually, you can do the file-format switching in vim:

Method A:

:e ++ff=dos
:w ++ff=unix
:e!

Method B:

:e ++ff=dos
:set ff=unix
:w

EDIT

If you want to delete the \r\n sequences in the file, try these commands in vim:

:e ++ff=unix           " <-- make sure to open with UNIX format
:%s/\r\n//g            " <-- remove all \r\n
:w                     " <-- save file

Your awk solution works fine. Here are two more sed solutions:

sed '1h;1!H;$!d;${g;s/\r\n//g}' input
sed ':A;/\r$/{N;bA};s/\r\n//g' input
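As for why a plain s/\r\n//g has no effect: sed reads one line at a time and strips the trailing newline before running the script, so the pattern space never contains \r followed by \n. A rough way to see this, assuming GNU sed (which understands \r) and a made-up sample resembling the hex dump in the question:

```shell
# Hypothetical sample: CRLF pairs in the middle of a record.
sample() { printf '92192-2986\r\n\r\n|C\n'; }

# Byte dump of the untouched input, for comparison.
original=$(sample | od -An -c)

# Line by line, each pattern space is "text\r" with the newline already
# removed, so \r\n never matches and the bytes come out unchanged.
naive=$(sample | sed 's/\r\n//g' | od -An -c)

# Joining lines first (N appends the next line, newline included)
# puts \r\n inside the pattern space, so the substitution can match.
joined=$(sample | sed ':A;/\r$/{N;bA};s/\r\n//g')

printf '%s\n' "$joined"
```

With this sample, naive should be identical to original, while joined holds the record with the CRLF pairs removed.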

Answer 3

Answer by Steven Penny

Another method:

awk 1 RS='\r\n' ORS=

set Record Separator to \r\n
set Output Record Separator to empty string
1 is always true, and in the absence of an action block, {print} is used
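A small check of that behavior, using a made-up sample rather than the asker's file. Note the assumption that awk is gawk or mawk, both of which treat a multi-character RS as a regular expression; POSIX leaves that case unspecified, so other implementations may differ.

```shell
# Hypothetical sample: two CRLF pairs inside a record, then a normal LF.
sample() { printf '92192-2986\r\n\r\n|C\n'; }

# Records are split at each \r\n and printed back with an empty ORS,
# which removes the CRLF pairs while leaving the lone LF untouched.
result=$(sample | awk 1 RS='\r\n' ORS=)

printf '%s\n' "$result"
```

Because only the two-byte \r\n sequence is the separator, ordinary Unix line endings in the file are preserved.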

Answer 4

Answer by Sergiy Dolnyy

sed -e 's/\r//g' input_file

This works for me. The difference is using -e instead of -i.

I have also noticed that sed behaves differently on different platforms. On mine, sed --version reports: This is not GNU sed version 4.0
