Resuming rsync partial (-P/--partial) on an interrupted transfer
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/16572066/
Asked by Glitches
I am trying to back up my file server to a remote file server using rsync. Rsync does not resume successfully when a transfer is interrupted. I used the partial option, but rsync doesn't find the file it already started because it renames it to a temporary file, and when resumed it creates a new file and starts from the beginning.
Here is my command:
rsync -avztP -e "ssh -p 2222" /volume1/ myaccont@backup-server-1:/home/myaccount/backup/ --exclude "@spool" --exclude "@tmp"
When this command is run, a backup file named OldDisk.dmg from my local machine gets created on the remote machine as something like .OldDisk.dmg.SjDndj23.
Now when the internet connection gets interrupted and I have to resume the transfer, I have to find where rsync left off by finding the temp file, like .OldDisk.dmg.SjDndj23, and rename it to OldDisk.dmg so that rsync sees there is already a file it can resume.
How do I fix this so I don't have to manually intervene each time?
Accepted answer by Richard Michael
TL;DR: Use --timeout=X (X in seconds) to change the default rsync server timeout, not --inplace.
The issue is that the rsync server processes (of which there are two; see rsync --server ... in ps output on the receiver) continue running, waiting for the rsync client to send data.
If the rsync server processes do not receive data for a sufficient time, they will indeed time out, self-terminate and clean up by moving the temporary file to its "proper" name (i.e., no temporary suffix). You'll then be able to resume.
If you don't want to wait for the long default timeout to cause the rsync server to self-terminate, then when your internet connection returns, log into the server and clean up the rsync server processes manually. However, you must politely terminate rsync; otherwise, it will not move the partial file into place but will delete it instead (and thus there is no file to resume). To politely ask rsync to terminate, do not send SIGKILL (e.g., -9), but SIGTERM (e.g., pkill -TERM -x rsync; this is only an example, and you should take care to match only the rsync processes concerned with your client).
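As a minimal sketch of the "match only the rsync processes concerned" advice above: the filter below reads ps-style "PID COMMAND ARGS" lines and prints only the PIDs whose command is rsync and whose first argument is --server. The function name select_rsync_server_pids is made up for illustration, and the assumption that --server is always the first argument is based on what ps typically shows on a receiver, so check your own ps output before relying on it.

```shell
# Hypothetical helper: pick rsync *server* PIDs out of "PID COMMAND ARGS..." lines.
# It only selects; it never sends a signal itself.
select_rsync_server_pids() {
    # $2 is the command (plain "rsync" or a path ending in /rsync),
    # $3 is its first argument, assumed here to be --server for server processes.
    awk '$2 ~ /(^|\/)rsync$/ && $3 == "--server" { print $1 }'
}

# Possible use on the receiving server (not run here). SIGTERM, never SIGKILL:
# for pid in $(ps -e -o pid= -o args= | select_rsync_server_pids); do
#     kill -TERM "$pid"
# done
```

This still selects every rsync server process on the machine, so on a shared server you would want to narrow it further, for example by user or by destination path.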
Fortunately there is an easier way: use the --timeout=X (X in seconds) option; it is passed to the rsync server processes as well.
For example, if you specify rsync ... --timeout=15 ..., both the client and server rsync processes will cleanly exit if they do not send/receive data in 15 seconds. On the server, this means moving the temporary file into position, ready for resuming.
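Building on the --timeout behaviour described above, here is a rough sketch of a retry wrapper that reruns a transfer until it succeeds. The name retry_transfer and its arguments (maximum attempts, pause in seconds) are assumptions for illustration, not part of rsync. The idea is to pause longer than the --timeout value between attempts, so the old server processes have had time to exit and move the partial file into place before the next attempt starts.

```shell
# Hypothetical wrapper: rerun a command until it succeeds,
# giving up after max_tries attempts.
retry_transfer() {
    max_tries=$1
    pause=$2
    shift 2
    try=1
    until "$@"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts" >&2
            return 1
        fi
        echo "attempt $try failed; retrying in ${pause}s" >&2
        try=$((try + 1))
        sleep "$pause"
    done
}

# Possible use with the command from the question (not run here):
# retry_transfer 20 30 rsync -avzP --timeout=15 -e "ssh -p 2222" \
#     --exclude "@spool" --exclude "@tmp" \
#     /volume1/ myaccont@backup-server-1:/home/myaccount/backup/
```

The numbers 20 and 30 are arbitrary; the only relationship that matters in this sketch is that the pause is longer than the timeout.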
I'm not sure what the default timeout is, i.e. how long the various rsync processes will try to send/receive data before they die (it might vary with operating system). In my testing, the server rsync processes remain running longer than the local client. On a "dead" network connection, the client terminates with a broken pipe (e.g., no network socket) after about 30 seconds; you could experiment or review the source code. Meaning, you could try to "ride out" the bad internet connection for 15-20 seconds.
If you do not clean up the server rsync processes (or wait for them to die), but instead immediately launch another rsync client process, two additional server processes will launch (for the other end of your new client process). Specifically, the new rsync client will not re-use/reconnect to the existing rsync server processes. Thus, you'll have two temporary files (and four rsync server processes), though only the newer, second temporary file has new data being written (received from your new rsync client process).
Interestingly, if you then clean up all rsync server processes (for example, stop your client, which will stop the new rsync servers, then SIGTERM the older rsync servers), it appears to merge (assemble) all the partial files into the new, properly named file. So, imagine a long-running partial copy which dies (and you think you've "lost" all the copied data), and a short-running re-launched rsync (oops!). You can stop the second client, SIGTERM the first servers, it will merge the data, and you can resume.
Finally, a few short remarks:
- Don't use --inplace to work around this. You will undoubtedly have other problems as a result; see man rsync for the details.
- It's trivial, but -t in your rsync options is redundant; it is implied by -a.
- An already compressed disk image sent over rsync without compression might result in a shorter transfer time (by avoiding double compression). However, I'm unsure of the compression techniques in both cases. I'd test it.
- As far as I understand --checksum / -c, it won't help you in this case. It affects how rsync decides if it should transfer a file. Though, after a first rsync completes, you could run a second rsync with -c to insist on checksums, to prevent the strange case that file size and modtime are the same on both sides, but bad data was written.
Answer by Glitches
I found that adding --inplace fixes it. I'm not sure how --partial is supposed to work without it, but it resumed my transfers. My files are still pretty big, though, and I'm wondering whether I will end up with corrupt files if a transfer starts and, hours later, another transfer starts, sees an incomplete file, doesn't know it's currently being uploaded, and starts adding bytes to it. Anyone know? Maybe some bash scripting to log the current process ID and not start another transfer?
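One way to approach the "don't start another transfer" idea raised above is a simple lock. The sketch below uses mkdir, which either creates the directory or fails atomically, so only one caller can hold the lock at a time. The names LOCK_DIR, acquire_lock and release_lock and the default path are assumptions for illustration. A process killed with SIGKILL would leave a stale lock behind, which this sketch does not handle; on Linux, flock(1) from util-linux is another option.

```shell
# Hypothetical lock location; override LOCK_DIR to suit your setup.
LOCK_DIR=${LOCK_DIR:-/tmp/rsync-backup.lock}

# Try to take the lock; succeeds for exactly one caller at a time.
acquire_lock() {
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "$$" > "$LOCK_DIR/pid"   # record the owner's PID for debugging
        return 0
    fi
    echo "another transfer seems to be running (lock: $LOCK_DIR)" >&2
    return 1
}

# Drop the lock so the next run can start.
release_lock() {
    rm -rf "$LOCK_DIR"
}

# Possible use in a backup script (not run here):
# acquire_lock || exit 0
# trap release_lock EXIT
# rsync ... --timeout=15 ...
```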
Answer by mogul
If you are afraid of corrupt files after a resume, you could add --checksum to force it to do checksumming on the whole file every time. Indeed it will cost you some disk I/O and CPU cycles, but only a slight network overhead.
Answer by gaoithe
Sorry, but the other answers here are too complicated :-7. A simpler answer that works for me (using rsync over -e ssh):
# optionally move rsync temp file, then resume using rsync
dst$ mv .<filename>.6FuChr <filename>
src$ rsync -avhzP --bwlimit=1000 -e ssh <fromfiles> <user@somewhere>:<destdir>/
This also works when resuming from an scp which was interrupted.
Rsync creates a temporary file ... The temporary file grows quickly to the size of the partially transferred file. The transfer resumes.
Scp writes to the actual end destination file. If the transfer is interrupted, this is a truncated file.
Explanation of args:
-avhz .. h=human-readable, v=verbose, a=archive, z=compression .. archive instructs it to maintain time_t values (modification times), so even if the clocks are out of sync rsync knows the true date of each file
-P is short for --partial --progress. --partial tells rsync to keep partially transferred files (and upon resume rsync will always use the partially transferred files, after safely checksumming them)
From man pages: http://ss64.com/bash/rsync_options.html
--partial
By default, rsync will delete any partially transferred file if the transfer
is interrupted. In some circumstances it is more desirable to keep partially
transferred files. Using the --partial option tells rsync to keep the partial
file which should make a subsequent transfer of the rest of the file much faster.
--progress
This option tells rsync to print information showing the progress of the transfer.
This gives a bored user something to watch.
This option is normally combined with -v. Using this option without the -v option
will produce weird results on your display.
-P
The -P option is equivalent to --partial --progress.
I found myself typing that combination quite often so I created an option to make
it easier.
NOTE: for a connection which is interrupted multiple times: if you need to resume after rsync (after the connection is interrupted), then it is best to rename the temporary file on the destination. scp creates a file on the destination with the same name as the final file. If scp is interrupted, this file is a truncated version of the file. An rsync (-avzhP) will resume from that file but start writing to a temporary file name like .<filename>.Yhg7al.
Procedure when starting with scp:
scp; *interrupt*; rsync; [REPEAT_as_needed: *interrupt*; mv .destfile.tmpzhX destfile; rsync;].
Procedure when starting with rsync:
rsync; [REPEAT_as_needed: *interrupt*; mv .destfile.tmpzhX destfile; rsync;].
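The mv step in the procedures above could be scripted roughly as follows, run on the destination host. The function name restore_partial is made up for illustration, and the sketch assumes rsync's temporary files follow the .<filename>.<suffix> pattern seen earlier. It refuses to guess when more than one temporary file exists, and like the manual mv it overwrites any existing file with the final name, so use it with care.

```shell
# Hypothetical helper: move the single rsync temporary file (.NAME.<suffix>)
# in a directory back to NAME, so the next rsync run can resume from it.
restore_partial() {
    dir=$1
    name=$2
    # Expand the ".NAME.<suffix>" pattern into the positional parameters.
    set -- "$dir"/."$name".*
    if [ ! -e "$1" ]; then
        echo "no temporary file found for $name" >&2
        return 1
    fi
    if [ "$#" -gt 1 ]; then
        echo "several temporary files found; pick one by hand:" >&2
        printf '  %s\n' "$@" >&2
        return 1
    fi
    mv -- "$1" "$dir/$name"
    echo "restored $1 -> $dir/$name"
}

# Possible use on the destination (not run here):
# restore_partial /home/myaccount/backup OldDisk.dmg
```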