Original question: http://stackoverflow.com/questions/13252682/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Copying a 1TB sparse file
Asked by ericzma
I have a 1TB sparse file on Linux that actually stores only 32MB of data.
Is it possible to "efficiently" make a package to store the sparse file? The package should unpack to a 1TB sparse file on another computer. Ideally, the "package" should be around 32MB.
Note: One possible solution is to use 'tar': https://wiki.archlinux.org/index.php/Sparse_file#Archiving_with_.60tar.27
However, for a 1TB sparse file, although the tarball may be small, archiving the sparse file takes too long.
Edit 1
I tested tar and gzip; the results are as follows (note that this sparse file contains 0 bytes of data).
$ du -hs sparse-1
0 sparse-1
$ ls -lha sparse-1
-rw-rw-r-- 1 user1 user1 1.0T 2012-11-03 11:17 sparse-1
$ time tar cSf sparse-1.tar sparse-1
real 96m19.847s
user 22m3.314s
sys 52m32.272s
$ time gzip sparse-1
real 200m18.714s
user 164m33.835s
sys 10m39.971s
$ ls -lha sparse-1*
-rw-rw-r-- 1 user1 user1 1018M 2012-11-03 11:17 sparse-1.gz
-rw-rw-r-- 1 user1 user1 10K 2012-11-06 23:13 sparse-1.tar
The 1TB file sparse-1, which contains 0 bytes of data, can be archived by 'tar' into a 10KB tarball or compressed by gzip into a ~1GB file. gzip takes roughly twice as long as tar.
From the comparison, 'tar' seems better than gzip.
However, 96 minutes is too long for a sparse file that contains 0 bytes of data.
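The gap between the ls and du figures above is what makes the file sparse: ls reports the apparent size, while du reports the blocks actually allocated. A minimal sketch for inspecting both numbers on a small test file, assuming GNU coreutils (stat -c and truncate); the function names are made up for illustration:

```shell
# Print the apparent size of a file in bytes (what `ls -l` shows).
apparent_bytes() {
    stat -c '%s' "$1"
}

# Print the bytes actually allocated on disk (what `du` reports).
# %b is the number of allocated blocks, %B the size of each block in bytes.
allocated_bytes() {
    blocks=$(stat -c '%b' "$1")
    block_size=$(stat -c '%B' "$1")
    echo $((blocks * block_size))
}

# Create a file of the given size that consists of a single hole.
# `truncate` only sets the length; it writes no data.
# $1 = file to create, $2 = size (e.g. 64M)
make_sparse() {
    truncate -s "$2" "$1"
}
```

For example, after make_sparse demo.img 64M, apparent_bytes demo.img prints 67108864, and on a filesystem that supports holes allocated_bytes demo.img should print a far smaller number.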
Edit 2
rsync seems to take longer than tar, but less time than gzip, to copy the file:
$ time rsync --sparse sparse-1 sparse-1-copy
real 124m46.321s
user 107m15.084s
sys 83m8.323s
$ du -hs sparse-1-copy
4.0K sparse-1-copy
Hence, tar + cp or scp should be faster than using rsync directly for this extremely sparse file.
Edit 3
Thanks to @mvp for pointing out the SEEK_HOLE functionality in newer kernels. (I was previously working on a 2.6.32 Linux kernel.)
Note: bsdtar version >=3.0.4 is required (check here: http://ask.fclose.com/4/how-to-efficiently-archive-a-very-large-sparse-file?show=299#c299).
On a newer kernel and Fedora release (17), tar and cp handle the sparse file very efficiently.
[zma@office tmp]$ ls -lh pmem-1
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:14 pmem-1
[zma@office tmp]$ time tar cSf pmem-1.tar pmem-1
real 0m0.003s
user 0m0.003s
sys 0m0.000s
[zma@office tmp]$ time cp pmem-1 pmem-1-copy
real 0m0.020s
user 0m0.000s
sys 0m0.003s
[zma@office tmp]$ ls -lh pmem*
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:14 pmem-1
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:15 pmem-1-copy
-rw-rw-r-- 1 zma zma 10K Nov 7 20:15 pmem-1.tar
[zma@office tmp]$ mkdir t
[zma@office tmp]$ cd t
[zma@office t]$ time tar xSf ../pmem-1.tar
real 0m0.003s
user 0m0.000s
sys 0m0.002s
[zma@office t]$ ls -lha
total 8.0K
drwxrwxr-x 2 zma zma 4.0K Nov 7 20:16 .
drwxrwxrwt. 35 root root 4.0K Nov 7 20:16 ..
-rw-rw-r-- 1 zma zma 1.0T Nov 7 20:14 pmem-1
I am using a 3.6.5 kernel:
[zma@office t]$ uname -a
Linux office.zhiqiangma.com 3.6.5-1.fc17.x86_64 #1 SMP Wed Oct 31 19:37:18 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Accepted answer by mvp
Short answer: Use bsdtar or GNU tar (version 1.29 or later) to create archives, and GNU tar (version 1.26 or later) to extract them on another box.
Long answer: There are some requirements for this to work.
First, Linux must be at least kernel 3.1 (Ubuntu 12.04 or later would do), so that it supports the SEEK_HOLE functionality.
Then, you need a tar utility that makes use of this functionality. GNU tar supports it since version 1.29 (released on 2016/05/16; it should be present by default since Ubuntu 18.04), and bsdtar since version 3.0.4 (available since Ubuntu 12.04); install it using sudo apt-get install bsdtar.
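The two requirements above can be checked from a shell. This is a rough sketch, assuming GNU coreutils (sort -V) and assuming that the first line of tar --version looks like "tar (GNU tar) 1.34"; the helper names are invented for illustration. A new enough kernel is necessary but may not be sufficient, since hole reporting also depends on the filesystem.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Kernel release without the distribution suffix, e.g. 3.6.5-1.fc17 -> 3.6.5
kernel_version() {
    uname -r | cut -d- -f1
}

# Version number of GNU tar, or empty output if `tar` is not GNU tar.
gnu_tar_version() {
    tar --version 2>/dev/null | sed -n '1s/^tar (GNU tar) //p'
}

if version_ge "$(kernel_version)" 3.1; then
    echo "kernel: 3.1 or newer, SEEK_HOLE should be available"
else
    echo "kernel: older than 3.1, SEEK_HOLE not available"
fi

tar_version=$(gnu_tar_version)
if [ -n "$tar_version" ] && version_ge "$tar_version" 1.29; then
    echo "GNU tar $tar_version: new enough to use SEEK_HOLE with -S"
else
    echo "GNU tar 1.29+ not found; consider bsdtar 3.0.4 or later"
fi
```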
While bsdtar (which uses libarchive) is awesome, unfortunately it is not very smart when it comes to untarring: it stupidly requires at least as much free space on the target drive as the untarred file size, without regard to holes. GNU tar untars such sparse archives efficiently and does not check this condition.
This is a log from Ubuntu 12.10 (Linux kernel 3.5):
$ dd if=/dev/zero of=1tb seek=1T bs=1 count=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.000143113 s, 7.0 kB/s
$ time bsdtar cvfz sparse.tar.gz 1tb
a 1tb
real 0m0.362s
user 0m0.336s
sys 0m0.020s
# Or, use GNU tar if its version is 1.29 or later:
$ time tar cSvfz sparse-gnutar.tar.gz 1tb
1tb
real 0m0.005s
user 0m0.006s
sys 0m0.000s
$ ls -l
-rw-rw-r-- 1 autouser autouser 1099511627777 Nov 7 01:43 1tb
-rw-rw-r-- 1 autouser autouser 257 Nov 7 01:43 sparse.tar.gz
-rw-rw-r-- 1 autouser autouser 134 Nov 7 01:43 sparse-gnutar.tar.gz
$
Like I said above, unfortunately, untarring with bsdtar will not work unless you have 1TB of free space. However, any version of GNU tar works just fine to untar such a sparse.tar:
$ rm 1tb
$ time tar -xvSf sparse.tar.gz
1tb
real 0m0.031s
user 0m0.016s
sys 0m0.016s
$ ls -l
total 8
-rw-rw-r-- 1 autouser autouser 1099511627777 Nov 7 01:43 1tb
-rw-rw-r-- 1 autouser autouser 257 Nov 7 01:43 sparse.tar.gz
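The same round trip can be tried at a much smaller scale before committing to a 1TB file. The sketch below assumes GNU tar, GNU dd and gzip are installed; the function name and the src/dst directory layout are hypothetical.

```shell
# Archive a sparse file with hole detection (-S) and gzip (-z), extract it
# into a different directory, and print both apparent sizes.
# $1 = scratch directory to work in (layout below is for illustration only)
roundtrip_sparse() {
    work=$1
    mkdir -p "$work/src" "$work/dst"

    # A 64 MiB hole followed by one byte of data, like the `dd` call in the
    # log above but much smaller.
    dd if=/dev/zero of="$work/src/sparse.img" bs=1 count=1 seek=64M 2>/dev/null

    tar -cSzf "$work/sparse.tar.gz" -C "$work/src" sparse.img
    tar -xzf "$work/sparse.tar.gz" -C "$work/dst"

    # Apparent size of the original, then of the extracted copy.
    stat -c '%s' "$work/src/sparse.img" "$work/dst/sparse.img"
}
```

Calling roundtrip_sparse "$(mktemp -d)" should print the same size twice (67108865 bytes); comparing du output for the two files shows whether the holes survived on the filesystem in use.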
Answer by LukeGT
You're definitely looking for a compression tool such as tar, lzma, bzip2, zip or rar. According to this site, lzma is quite fast while still having quite a good compression ratio:
http://blog.terzza.com/linux-compression-comparison-gzip-vs-bzip2-vs-lzma-vs-zip-vs-compress/
You can also adjust the speed/quality trade-off of the compression by setting the compression level to something low; experiment a bit to find the level that works best.
Answer by wallyk
From a related question, maybe rsync will work:
rsync --sparse sparse-1 sparse-1-copy
Answer by Askeli
I realize this question is very old, but here's an update that may be helpful to others who find their way here the same way I did.
Thankfully, mvp's excellent answer is now obsolete. According to the GNU tar release notes, SEEK_HOLE/SEEK_DATA was added in v. 1.29, released 2016-05-16. (And with GNU tar v. 1.30 being standard in Debian stable now, it's safe to assume that tar version ≥ 1.29 is available almost everywhere.)
So the way to handle sparse files now is to archive them with whichever tar (GNU or BSD) is installed on your system, and the same goes for extracting.
Additionally, for sparse files that actually contain some data, if it's worthwhile to use compression (i.e. the data is compressible enough to save substantial disk space, and the disk space savings are worth the likely-substantial time and CPU resources required to compress it):
tar -cSjf <archive>.tar.bz2 /path/to/sparse/file
will both take advantage of tar's SEEK_HOLE functionality to quickly & efficiently archive the sparse file, and use bzip2 to compress the actual data.
tar --use-compress-program=pbzip2 -cSf <archive>.tar.bz2 /path/to/sparse/file
will, as alluded to in marcin's comment, do the same while also using multiple cores for the compression task.
On my little home server with a quad-core Atom CPU, using pbzip2 instead of bzip2 reduced the time by around 25 to 30%.
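The pbzip2 invocation above can be wrapped so that it falls back to plain bzip2 on machines where pbzip2 is not installed. This is only a sketch, assuming GNU tar; both function names are invented for illustration.

```shell
# Print the bzip2-compatible compressor to use: pbzip2 when it is installed,
# otherwise plain bzip2.
pick_bzip2() {
    if command -v pbzip2 >/dev/null 2>&1; then
        echo pbzip2
    else
        echo bzip2
    fi
}

# Archive one sparse file into a .tar.bz2 with hole detection (-S).
# $1 = archive to create, $2 = directory containing the file, $3 = file name
archive_sparse_bz2() {
    tar --use-compress-program="$(pick_bzip2)" -cSf "$1" -C "$2" "$3"
}
```

For example, archive_sparse_bz2 backup.tar.bz2 /path/to/sparse file would archive /path/to/sparse/file. Whether the parallel compressor pays off depends on how much real data the file holds and on the number of cores.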
With or without compression, this will give you an archive that doesn't need any special sparse-file handling, takes up approximately the 'real' size of the original sparse file (or less if compressed), and can be moved around without worrying about inconsistency between different utilities' sparse file capabilities. For example: cp will automatically detect sparse files and do the right thing, rsync will handle sparse files properly if you use the -S flag, and scp has no option for sparse files (it will consume bandwidth copying zeros for all the holes, and the resulting copy will be a non-sparse file whose size is the 'apparent' size of the original); but all of them will of course handle a tar archive just fine, whether it contains sparse files or not, without any special flags.
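When no intermediate archive file is wanted, the tar stream can be piped straight into a second tar, which sidesteps the scp limitation described above. A minimal sketch, assuming GNU tar on both ends; the function name is hypothetical, and user@host and /dest/dir in the remote variant are placeholders.

```shell
# Copy a sparse file into another directory by streaming a tar archive
# through a pipe, so the holes are not sent through the pipe as zeros.
# $1 = source file, $2 = existing destination directory
copy_sparse_via_tar() {
    src_dir=$(dirname "$1")
    name=$(basename "$1")
    tar -cSf - -C "$src_dir" "$name" | tar -xf - -C "$2"
}

# Remote variant (user@host and /dest/dir are placeholders):
#   tar -cSf - -C "$src_dir" "$name" | ssh user@host 'tar -xf - -C /dest/dir'
```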
Additional Notes
- When extracting, tar will automatically detect an archive created with -S, so there's no need to specify it.
- An archive created with pbzip2 is stored in chunks. This results in the archive being marginally bigger than if bzip2 is used, but also means that the extraction can be multithreaded, unlike an archive created with bzip2.
- pbzip2 and bzip2 will reliably extract each other's archives without error or corruption.