Linux: copying a 1TB sparse file

Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/13252682/

Date: 2020-08-06 17:43:05  Source: igfitidea

Copying a 1TB sparse file

Tags: linux, file, sparse-file

Asked by ericzma

I have a 1TB sparse file on Linux that actually stores only 32MB of data.

Is it possible to "efficiently" make a package to store the sparse file? The package should be unpacked to be a 1TB sparse file on another computer. Ideally, the "package" should be around 32MB.
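
For readers who want to reproduce a similar setup, here is a minimal sketch of how a sparse file can be created and inspected. It assumes GNU coreutils (`truncate`, `stat`); the file name `demo.sparse` and the 1 GiB size are illustrative choices and do not come from the question.

```shell
#!/bin/sh
# Create a small sparse file and compare its apparent size with the
# space actually allocated on disk. Assumes GNU coreutils.
f=demo.sparse                      # hypothetical file name
truncate -s 1G "$f"                # sets the length without writing any data

apparent=$(stat -c %s "$f")        # file length in bytes
blocks=$(stat -c %b "$f")          # number of allocated blocks
blksize=$(stat -c %B "$f")         # size in bytes of each block reported by %b
allocated=$((blocks * blksize))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

rm -f "$f"                         # clean up the demo file
```

On most Linux filesystems the allocated figure should be far smaller than the apparent one, which is the gap the question is trying to preserve when packaging the file.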

Note: One possible solution is to use 'tar': https://wiki.archlinux.org/index.php/Sparse_file#Archiving_with_.60tar.27

However, for a 1TB sparse file, although the tarball may be small, archiving the sparse file takes too long.

Edit 1

I tested tar and gzip, and the results are as follows (note that this sparse file contains 0 bytes of data).

$ du -hs sparse-1
0   sparse-1

$ ls -lha sparse-1
-rw-rw-r-- 1 user1 user1 1.0T 2012-11-03 11:17 sparse-1

$ time tar cSf sparse-1.tar sparse-1

real    96m19.847s
user    22m3.314s
sys     52m32.272s

$ time gzip sparse-1

real    200m18.714s
user    164m33.835s
sys     10m39.971s

$ ls -lha sparse-1*
-rw-rw-r-- 1 user1 user1 1018M 2012-11-03 11:17 sparse-1.gz
-rw-rw-r-- 1 user1 user1   10K 2012-11-06 23:13 sparse-1.tar

The 1TB file sparse-1, which contains 0 bytes of data, can be archived by 'tar' into a 10KB tarball or compressed by gzip into a ~1GB file. gzip takes roughly twice as long as tar.

From the comparison, 'tar' seems better than gzip.

However, 96 minutes is too long for a sparse file that contains 0 bytes of data.

Edit 2

rsync seems to take longer than tar, but less time than gzip, to copy the file:

$ time rsync --sparse sparse-1 sparse-1-copy

real    124m46.321s
user    107m15.084s
sys     83m8.323s

$ du -hs sparse-1-copy 
4.0K    sparse-1-copy

Hence, tar followed by cp or scp should be faster than running rsync directly on this extremely sparse file.

Edit 3

Thanks to @mvp for pointing out the SEEK_HOLE functionality in newer kernels. (I was previously working on a 2.6.32 Linux kernel.)

Note: bsdtar version >=3.0.4 is required (check here: http://ask.fclose.com/4/how-to-efficiently-archive-a-very-large-sparse-file?show=299#c299).

On a newer kernel and Fedora release (17), tar and cp handle the sparse file very efficiently.

[zma@office tmp]$ ls -lh pmem-1 

-rw-rw-r-- 1 zma zma 1.0T Nov  7 20:14 pmem-1
[zma@office tmp]$ time tar cSf pmem-1.tar pmem-1

real    0m0.003s
user    0m0.003s
sys 0m0.000s
[zma@office tmp]$ time cp pmem-1 pmem-1-copy

real    0m0.020s
user    0m0.000s
sys 0m0.003s
[zma@office tmp]$ ls -lh pmem*
-rw-rw-r-- 1 zma zma 1.0T Nov  7 20:14 pmem-1
-rw-rw-r-- 1 zma zma 1.0T Nov  7 20:15 pmem-1-copy
-rw-rw-r-- 1 zma zma  10K Nov  7 20:15 pmem-1.tar
[zma@office tmp]$ mkdir t
[zma@office tmp]$ cd t
[zma@office t]$ time tar xSf ../pmem-1.tar 

real    0m0.003s
user    0m0.000s
sys 0m0.002s
[zma@office t]$ ls -lha
total 8.0K
drwxrwxr-x   2 zma  zma  4.0K Nov  7 20:16 .
drwxrwxrwt. 35 root root 4.0K Nov  7 20:16 ..
-rw-rw-r--   1 zma  zma  1.0T Nov  7 20:14 pmem-1

I am using a 3.6.5 kernel:

[zma@office t]$ uname -a
Linux office.zhiqiangma.com 3.6.5-1.fc17.x86_64 #1 SMP Wed Oct 31 19:37:18 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Accepted answer by mvp

Short answer: Use bsdtar or GNU tar (version 1.29 or later) to create archives, and GNU tar (version 1.26 or later) to extract them on another box.

Long answer: There are some requirements for this to work.

First, the Linux kernel must be at least version 3.1 (Ubuntu 12.04 or later would do), so that it supports the SEEK_HOLE functionality.
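
A minimal sketch of how the kernel requirement could be checked from a script. The helper name `version_ge` is made up for this example, and it relies on `sort -V` from GNU coreutils. Passing this check does not by itself guarantee efficient hole detection, since that can also depend on the filesystem in use.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the release suffix, e.g. "3.6.5-1.fc17.x86_64" -> "3.6.5".
kernel=$(uname -r | cut -d- -f1)

if version_ge "$kernel" 3.1; then
    echo "kernel $kernel: SEEK_HOLE should be available"
else
    echo "kernel $kernel: predates the 3.1 requirement"
fi
```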

Then, you need a tar utility that supports SEEK_HOLE. GNU tar has supported it since version 1.29 (released on 2016/05/16; it should be present by default since Ubuntu 18.04), and bsdtar since version 3.0.4 (available since Ubuntu 12.04). Install bsdtar using sudo apt-get install bsdtar.
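
To see which tar is installed, the first line of `tar --version` can be inspected. The sketch below is only illustrative: the helper names `tar_flavor` and `tar_version` are invented here, and the parsing assumes the usual banner formats ("tar (GNU tar) 1.34" for GNU tar, "bsdtar 3.6.2 - libarchive 3.6.2 ..." for bsdtar). The reported version can then be compared against the minimums given above (1.29 for GNU tar, 3.0.4 for bsdtar).

```shell
#!/bin/sh
# Classify a tar implementation from the first line of its --version output.
tar_flavor() {
    case "$1" in
        *"GNU tar"*) echo gnu ;;
        bsdtar*)     echo bsd ;;
        *)           echo unknown ;;
    esac
}

# Extract the first dotted version number from that line.
tar_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

line=$(tar --version 2>/dev/null | head -n 1)
echo "flavor:  $(tar_flavor "$line")"
echo "version: $(tar_version "$line")"
```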

While bsdtar (which uses libarchive) is awesome, it is unfortunately not very smart when it comes to untarring: it requires at least as much free space on the target drive as the untarred file size, without regard to holes. GNU tar will untar such sparse archives efficiently and does not check this condition.

This is a log from Ubuntu 12.10 (Linux kernel 3.5):

$ dd if=/dev/zero of=1tb seek=1T bs=1 count=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.000143113 s, 7.0 kB/s

$ time bsdtar cvfz sparse.tar.gz 1tb 
a 1tb

real    0m0.362s
user    0m0.336s
sys 0m0.020s

# Or, use GNU tar if the version is 1.29 or later:
$ time tar cSvfz sparse-gnutar.tar.gz 1tb
1tb

real    0m0.005s
user    0m0.006s
sys 0m0.000s

$ ls -l
-rw-rw-r-- 1 autouser autouser 1099511627777 Nov  7 01:43 1tb
-rw-rw-r-- 1 autouser autouser           257 Nov  7 01:43 sparse.tar.gz
-rw-rw-r-- 1 autouser autouser           134 Nov  7 01:43 sparse-gnutar.tar.gz
$

Like I said above, unfortunately, untarring with bsdtar will not work unless you have 1TB of free space. However, any version of GNU tar works just fine to untar such a sparse.tar.gz:

$ rm 1tb 
$ time tar -xvSf sparse.tar.gz 
1tb

real    0m0.031s
user    0m0.016s
sys 0m0.016s
$ ls -l
total 8
-rw-rw-r-- 1 autouser autouser 1099511627777 Nov  7 01:43 1tb
-rw-rw-r-- 1 autouser autouser           257 Nov  7 01:43 sparse.tar.gz
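
The round trip shown in the log can be tried at a smaller scale. This is a sketch under assumptions: it expects GNU tar and GNU coreutils, and the 64 MiB size and all file names are illustrative rather than taken from the answer.

```shell
#!/bin/sh
# Round-trip a small sparse file through tar and check that the copy
# keeps its apparent size. All names here are illustrative.
work=$(mktemp -d)

truncate -s 64M "$work/sparse.img"                    # 64 MiB hole, no data
tar -cSf "$work/sparse.tar" -C "$work" sparse.img     # -S stores holes compactly

mkdir "$work/out"
tar -xf "$work/sparse.tar" -C "$work/out"             # extract the archive

orig_size=$(stat -c %s "$work/sparse.img")
copy_size=$(stat -c %s "$work/out/sparse.img")
copy_alloc=$(( $(stat -c %b "$work/out/sparse.img") * $(stat -c %B "$work/out/sparse.img") ))
tar_size=$(stat -c %s "$work/sparse.tar")

echo "original apparent size: $orig_size"
echo "copy apparent size:     $copy_size"
echo "copy allocated bytes:   $copy_alloc"
echo "archive size:           $tar_size"
```

If the copy's allocated bytes stay well below its apparent size, the holes survived the trip; the exact allocated figure depends on the filesystem.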

Answer by LukeGT

You're definitely looking for a compression tool such as tar, lzma, bzip2, zip or rar. According to this site, lzma is quite fast while still having quite a good compression ratio:

http://blog.terzza.com/linux-compression-comparison-gzip-vs-bzip2-vs-lzma-vs-zip-vs-compress/

You can also adjust the trade-off between speed and compression ratio by setting the compression level to something low. Experiment a bit to find the level that works best:

http://linux.die.net/man/1/unlzma

Answer by wallyk

From a related question, maybe rsync will work:

rsync --sparse sparse-1 sparse-1-copy

Answer by Askeli

I realize this question is very old, but here's an update that may be helpful to others who find their way here the same way I did.

Thankfully, mvp's excellent answer is now obsolete. According to the GNU tar release notes, SEEK_HOLE/SEEK_DATA was added in v. 1.29, released 2016-05-16. (And with GNU tar v. 1.30 being standard in Debian stable now, it's safe to assume that tar version ≥ 1.29 is available almost everywhere.)

So the way to handle sparse files now is to archive them with whichever tar (GNU or BSD) is installed on your system, and the same goes for extracting.

Additionally, for sparse files that actually contain some data, compression may be worthwhile (i.e. the data is compressible enough to save substantial disk space, and the savings justify the likely substantial time and CPU resources required to compress it). In that case:

  • tar -cSjf <archive>.tar.bz2 /path/to/sparse/file will both take advantage of tar's SEEK_HOLE functionality to quickly and efficiently archive the sparse file, and use bzip2 to compress the actual data.
  • tar --use-compress-program=pbzip2 -cSf <archive>.tar.bz2 /path/to/sparse/file, as alluded to in marcin's comment, will do the same while also using multiple cores for the compression task.
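
The two commands above can be folded into one helper that prefers pbzip2 when it is installed. This is only a sketch: the function names `pick_bzip2` and `archive_sparse` are invented for illustration, and GNU tar is assumed.

```shell
#!/bin/sh
# Pick pbzip2 when it is installed, otherwise fall back to bzip2.
pick_bzip2() {
    if command -v pbzip2 >/dev/null 2>&1; then
        echo pbzip2
    else
        echo bzip2
    fi
}

# Archive one file sparsely and compress it with the chosen program.
# Usage: archive_sparse <archive>.tar.bz2 /path/to/sparse/file
archive_sparse() {
    tar --use-compress-program="$(pick_bzip2)" -cSf "$1" "$2"
}
```

Either branch produces a .tar.bz2 archive, so the choice of compressor only affects how long the compression step takes.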

On my little home server with a quad-core Atom CPU, using pbzip2 instead of bzip2 reduced the time by around 25 to 30%.

With or without compression, this will give you an archive that doesn't need any special sparse-file handling, takes up approximately the 'real' size of the original sparse file (or less if compressed), and can be moved around without worrying about inconsistency between different utilities' sparse file capabilities. For example: cp will automatically detect sparse files and do the right thing, rsync will handle sparse files properly if you use the -S flag, and scp has no option for sparse files (it will consume bandwidth copying zeros for all the holes, and the resulting copy will be a non-sparse file whose size is the 'apparent' size of the original). All of them will of course handle a tar archive just fine, whether it contains sparse files or not, without any special flags.
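
When no intermediate archive file is wanted, the same idea can be expressed as a tar pipe. The sketch below assumes GNU tar; `sparse_copy` is a hypothetical helper name, and the host and paths in the ssh comment are placeholders.

```shell
#!/bin/sh
# Copy one file into another directory through a tar pipe, so that holes
# are recorded by tar rather than left to the copying tool.
# Usage: sparse_copy /path/to/file /destination/dir
sparse_copy() {
    src_dir=$(dirname "$1")
    name=$(basename "$1")
    tar -cSf - -C "$src_dir" "$name" | tar -xSf - -C "$2"
}

# Across machines the same idea applies with ssh in the middle;
# "user@host" and "/dest" below are placeholders:
#   tar -cSf - -C /src/dir file | ssh user@host 'tar -xSf - -C /dest'
```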

Additional Notes

  1. When extracting, tar will automatically detect an archive created with -S, so there's no need to specify it.
  2. An archive created with pbzip2 is stored in chunks. This results in the archive being marginally bigger than if bzip2 is used, but also means that the extraction can be multithreaded, unlike an archive created with bzip2.
  3. pbzip2 and bzip2 will reliably extract each other's archives without error or corruption.