
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/11499409/

Time: 2020-08-06 13:48:48  Source: igfitidea

/dev/zero or /dev/random - what is more secure and why?

Tags: linux, random, zero

Asked by skater_nex

Can anyone tell me why /dev/random is preferred for security when wiping data from a hard drive?


Accepted answer by OmnipotentEntity

Simple answer: /dev/random is not preferred. Both are equally secure. Use /dev/zero for easier verification. It also uses less CPU and is possibly faster.
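To make the "easier verification" point concrete, here is a minimal Python sketch, not anything from the original answer. It zero-fills a target and then checks that every byte reads back as zero, which is a check you cannot do after a random fill without keeping the random data around. The function names and chunk size are my own assumptions. It is written against an ordinary file path, and pointing it at a real block device would destroy its contents.

```python
import os

CHUNK = 1024 * 1024  # write and read in 1 MiB pieces (arbitrary choice)


def zero_fill(path: str, size: int) -> None:
    """Overwrite the first `size` bytes of `path` with zeros, in place."""
    zeros = bytes(CHUNK)
    with open(path, "r+b") as target:
        remaining = size
        while remaining > 0:
            step = min(CHUNK, remaining)
            target.write(zeros[:step])
            remaining -= step
        target.flush()
        os.fsync(target.fileno())  # ask the OS to push the data to the device


def is_all_zero(path: str, size: int) -> bool:
    """Return True if the first `size` bytes of `path` are all zero."""
    with open(path, "rb") as target:
        remaining = size
        while remaining > 0:
            chunk = target.read(min(CHUNK, remaining))
            if not chunk:
                return False  # file ended before `size` bytes were read
            if chunk.count(0) != len(chunk):
                return False  # found a non-zero byte
            remaining -= len(chunk)
    return True
```

Verification here is a single pass with no reference data to compare against, which is the practical advantage of zeros over random bytes.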


More complete answer. For modern hard drives, platter density is such that it's impossible to obtain signals from incompletely overwritten sectors of the drive, the technique that people such as Gutmann wrote about many, many years ago. As far as modern hard drives are concerned (I'd count any hard drive whose capacity is measured in gigabytes or more), if it's overwritten, it's gone. End of story. So it doesn't matter what you change the data to, just that you change the data.


To add onto this, even if you wipe a hard drive completely, there may still be data left on the drive in sectors that were remapped by the hard drive's firmware. These are relatively rare, and only a very small amount of data would be contained within them. You'd also need very specialized equipment to retrieve that data (you'd have to edit the G-List within the System Area of the drive to get at it), and the reason those sectors were remapped in the first place is that they were failing.


So to sum up: DoD wipes are stupid, Gutmann wipes are stupider. Use /dev/zero, it's good in nearly 100% of all cases. And if it's an edge case, then you need very specialized know-how both to get at the data and to remove it.


"thanks! so, what about usb stick?"


A USB stick is a different animal altogether. You'd need to bypass the flash controller in order to clean it out; even a Gutmann wipe won't completely remove the data, because of wear leveling algorithms. But just like a hard drive, if you overwrite the data once, it's gone. The trick is forcing the device to actually overwrite the data.
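To illustrate why wear leveling defeats an ordinary overwrite, here is a deliberately simplified toy model in Python. It is an assumption-laden sketch, not how any particular controller works. Every logical write is steered to a fresh physical page, so the old page keeps its contents until the controller erases it. The class and method names are hypothetical.

```python
class ToyFlash:
    """Toy flash translation layer: every write lands on a fresh physical page."""

    def __init__(self, physical_pages: int) -> None:
        self.pages = [None] * physical_pages  # raw contents of the flash chips
        self.mapping = {}                     # logical block -> physical page index
        self.next_free = 0                    # naive allocator, no garbage collection

    def write(self, logical_block: int, data: bytes) -> None:
        if self.next_free >= len(self.pages):
            raise RuntimeError("toy model has no garbage collection; out of pages")
        self.pages[self.next_free] = data
        # Remap the logical block; the previous physical page is stale, not erased.
        self.mapping[logical_block] = self.next_free
        self.next_free += 1

    def read(self, logical_block: int) -> bytes:
        return self.pages[self.mapping[logical_block]]

    def stale_pages(self) -> list:
        """Physical pages that still hold data but are no longer mapped."""
        live = set(self.mapping.values())
        return [p for i, p in enumerate(self.pages) if p is not None and i not in live]


flash = ToyFlash(physical_pages=4)
flash.write(0, b"secret")       # original data
flash.write(0, b"\x00" * 6)     # "overwrite" the same logical block with zeros
print(flash.read(0))            # the host sees only zeros
print(flash.stale_pages())      # the old data is still on the chip
```

From the host's side the block reads back as zeros, which is why a wipe that only goes through the controller cannot prove the old page is gone.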


That being said, if you have a cheap USB stick whose controller does not do wear leveling, then a single-pass 0-fill should be sufficient to remove the data within. Otherwise, you're looking at custom hardware and soldering work.


SSDs should be treated like USB sticks with a controller that performs wear leveling. SSDs will always do wear leveling; I do not know of any exceptions to this rule. Many USB sticks do not.


How do you tell if a USB stick does wear leveling? You need to take it apart, inspect the controller chip, and look up its datasheet.


"Would you give a source for the statement that it is "impossible to obtain signals from incompletely overwritten sectors of the drive" ? I am not talking about tests from computer magazines concerning data recovery stores, I am talking of the worst case scenario: a well-equipped government laboratory. So I really would like to know how can you guarantee that statement, preferably a scientific paper."


I'll give some justification and information regarding the analog storage of digital data on magnetic media. The following is mostly what I was taught while on the job at a data recovery company, and it may be partially inaccurate in places. If so, let me know and I will correct it. But this is my best understanding of the material.


After a hard drive is manufactured, the first thing that happens is that it receives servo labels from a servo label writing machine. This is a separate machine whose sole job is to take a completely blank hard drive and bootstrap it. (This is why hard drives have holes in them covered with aluminum tape: that's where the servo labeling machine places its write heads.) If you've ever had a drive that just went "click click click" when you powered it on, it's because it could not read the servo labels. When a hard drive is powered on, the first thing it tries to do is fling its read heads somewhere onto the platter and acquire a track. Servo labels define tracks. If it can't see a servo label, it reaches the middle, makes a clack, pulls the arm back, and tries again.


The reason I mention this is that it is pretty much the only instance in which an external device reads and writes to the hard drive, and it describes approximately the limit of what hardware other than the drive's own read heads can do with the data on a platter. If it were possible to make servo labels smaller and more space efficient, hard drive manufacturers would. Servo labels are comparatively space inefficient for two reasons.


  1. It is absolutely critical that they do not fail. If a servo label fails, then every time the head goes over that particular servo label it will lose the track; in practice this means that the entire track is unusable.
  2. They give some idea of how much better a hard drive's own hardware is at dealing with information on platters than external machinery.

A ring of servo labels defines a track. There are some things you must know about tracks.


  1. They are not necessarily circular. They are imperfect and can contain warps. This is because the servo label machine is not accurate.
  2. They are not necessarily concentric. They can and do cross. This means that certain sectors or whole tracks can be unusable just because the servo label machine is inaccurate.

After the servo labels are written, then comes the low-level format. This is an actual low-level format in the 1980s sense, except more complicated. Because platters are circular but the rotational speed is constant, the amount of area passing under the read head per unit of time is a function of the distance from the middle of the platter. So, in an effort to squeeze every last drop of storage out of a platter, the density of the platter is variable and defined in zones. On a typical 3.5" hard drive there will be several dozen zones with different platter densities.
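The geometry behind zoning can be sketched with a little arithmetic. The Python below assumes a 7200 RPM drive and two hypothetical track radii (20 mm and 45 mm); these are illustrative numbers of mine, not figures from the answer. At constant rotational speed, the surface speed under the head scales linearly with radius, which is why outer zones can hold more data per revolution.

```python
import math


def linear_speed_m_per_s(rpm: float, radius_mm: float) -> float:
    """Speed of the platter surface under the head at a given radius."""
    revolutions_per_second = rpm / 60.0
    circumference_m = 2.0 * math.pi * (radius_mm / 1000.0)
    return revolutions_per_second * circumference_m


RPM = 7200.0            # assumed spindle speed
INNER_RADIUS_MM = 20.0  # hypothetical innermost data track
OUTER_RADIUS_MM = 45.0  # hypothetical outermost data track

inner = linear_speed_m_per_s(RPM, INNER_RADIUS_MM)
outer = linear_speed_m_per_s(RPM, OUTER_RADIUS_MM)
print(f"inner track: {inner:.1f} m/s, outer track: {outer:.1f} m/s")
print(f"outer/inner ratio: {outer / inner:.2f}")
```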


One of these zones is special, has an extra low density, and is called the System Area. The System Area is where all of the firmware and configuration settings are stored on the drive. It has an extra low density because that information is more important. The lower the density, the less chance there is that something will randomly screw up. It happens all of the time, of course, but less often than in the user area.


After the drive is low-level formatted, the firmware is written to the System Area. The firmware is different for every drive. In order to optimize the drive for the ridiculously fine requirements of the platters, each drive must be tuned. (This actually takes place before the low-level format, of course, because you have to know how good the equipment is in order to decide how dense to make the platters.) This data is known as adaptives and is saved in the System Area. Information in the adaptives area is stuff like "how much voltage should I use to correct myself when the servo labels tell me I'm drifting off track", plus other information required to make the hard drive actually work. If the adaptives are off slightly, it might be impossible to access the user area. The System Area is easier to access, so only very few adaptives need to be stored on the PCB CMOS.


Takeaways from this paragraph:


  1. Lower density means easier to read.
  2. The higher the density the more likely it is for things to randomly screw up.
  3. The user area has as high a density as the hard drive manufacturer can possibly make it.
  4. If this seems slapdash and slipshod, that's because it really is. Hard drive manufacturers compete and win on price per GB. Hard drive design isn't really about making very carefully manufactured pieces of equipment and putting them together very carefully, because that simply isn't enough anymore. Sure, they still do that, but they also have to make the pieces work together in software, because the hardware tolerances are too broad to be competitive anymore.

So. Because the user area has such a high density, it is actually very (very (very very)) likely to get screwed-up bits in the normal course of things. This can be caused by many, many factors, including very slight timing issues and platter degradation. A good percentage of the sectors of your hard drive actually contain screwed-up bits. (You can verify this yourself by issuing an ATA28 READLONG command to your drive several times on many sectors and comparing the output. This is only valid for the first 127 GB or so; there is no ATA48 equivalent, it was dropped! You'll find that it isn't rare for certain bits to misbehave and act stuck on or off, or even flip randomly.) It's a fact of life. Which is why we have ECC.
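The "read the same sector several times and compare" experiment can be sketched as follows. This Python snippet does not issue READLONG itself. It assumes you have already captured several raw reads of one sector as byte strings by whatever tool you use, and it only reports which bit positions disagree between reads. The function name and the sample data are made up for illustration.

```python
def unstable_bit_positions(reads: list) -> list:
    """Return bit offsets (MSB-first within each byte) that differ between reads."""
    if not reads:
        return []
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")
    unstable = []
    for byte_index in range(length):
        first = reads[0][byte_index]
        diff = 0
        for r in reads[1:]:
            diff |= first ^ r[byte_index]  # set bits mark disagreement with the first read
        for bit in range(8):
            if diff & (0x80 >> bit):
                unstable.append(byte_index * 8 + bit)
    return unstable


# Three made-up "raw reads" of the same two bytes; two bits flip between reads.
sample_reads = [b"\x00\xff", b"\x01\xff", b"\x00\x7f"]
print(unstable_bit_positions(sample_reads))
```

Note that a bit which is stuck at the wrong value on every read will not show up here; this only catches bits that change from one read to the next.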


ECC is an error-correcting checksum stored after the 512 (or 4096 in newer drives) bytes of data in a sector; it will correct that data if few enough bits are incorrect. The exact size depends on firmware and manufacturer, but all drives have it and all drives need it. (It's surprisingly larger than you'd expect: something like 48-60 bytes that can detect and correct up to 6-8 error bytes. Crazy math going on.) All of this is needed because the density of the platters is too high for even the highly specialized and tuned internal hard drive equipment to read without errors.
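To put those ECC figures in proportion, here is a quick back-of-the-envelope calculation in Python. It takes the answer's numbers at face value and assumes they refer to 512-byte sectors, which the answer does not state explicitly.

```python
def fraction_of_sector(part_bytes: int, data_bytes: int) -> float:
    """Size of `part_bytes` relative to the sector's user data."""
    return part_bytes / data_bytes


SECTOR_BYTES = 512  # assumed sector size for the quoted ECC figures

for ecc_bytes, correctable in ((48, 6), (60, 8)):
    overhead = fraction_of_sector(ecc_bytes, SECTOR_BYTES)
    repairable = fraction_of_sector(correctable, SECTOR_BYTES)
    print(f"{ecc_bytes} ECC bytes: {overhead:.1%} overhead, "
          f"repairs up to {repairable:.1%} of the sector")
```

Under that assumption, roughly a tenth of every sector's raw capacity goes to error correction.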


Finally, I want to talk about the preamp chip. It's located on the arm of the hard drive and acts as a megaphone. Because the signals are generated by very small magnetic fields acting on very small heads, they have a very small potential. So you cannot use the hard drive's own head for the Gutmann method, because you cannot get an accurate enough reading from it to make Gutmann's technique worthwhile.


But let's posit that the NSA has a piece of magic equipment, and they can get a very accurate read (accurate enough to calculate the potential and derive the previously written data) of any particular bit in 1 ms. What do they need first?


First, they need the System Area, because that's where the Translator is stored. (The translator is the thing that turns an LBA address into a PCHS address: Physical Cylinder Head Sector, as opposed to the logical CHS address, which is fake and only around for legacy reasons.) The size of the System Area varies, and you can get it without resorting to magic tools. Normally it's only around 50-100MB. The layout of the translator is firmware specific, so you have to reverse engineer it (but it's been done before, no big deal).
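For contrast with the firmware-specific translator, the legacy logical CHS mapping is just fixed arithmetic. The Python sketch below shows that conventional LBA-to-CHS conversion, assuming a 16-head, 63-sectors-per-track logical geometry (a common legacy choice, picked here purely for illustration). The real physical translator also has to account for zones and remapped sectors, which this does not attempt to model.

```python
HEADS_PER_CYLINDER = 16  # assumed legacy logical geometry
SECTORS_PER_TRACK = 63   # assumed legacy logical geometry


def lba_to_chs(lba: int) -> tuple:
    """Convert a logical block address to a logical (cylinder, head, sector) triple."""
    cylinder, rest = divmod(lba, HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head, sector_index = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector_index + 1  # CHS sectors are numbered from 1


def chs_to_lba(cylinder: int, head: int, sector: int) -> int:
    """Inverse of lba_to_chs for the same assumed geometry."""
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)


print(lba_to_chs(0))     # first sector of the disk
print(lba_to_chs(1008))  # first sector of the second logical cylinder
```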


So, first problem: signal to noise. As mentioned, platter density is tuned way higher than is strictly safe. Gutmann's method requires a very low variance in normal read/write activity to calculate previous states of the bits with any accuracy. If the variance in the signal is significant, it can screw over these attempts. And the variance is significant enough to completely screw you over (that's why ECC is so crazy in modern drives). An analogy would be trying to perfectly hear someone whispering to you while someone else is talking to you in the middle of a noisy room.
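The signal-to-noise argument can be illustrated with a toy simulation. The Python below is a made-up model, not a description of real drive physics. Each read is the current bit, plus a small residue of the previous bit, plus Gaussian noise. The attacker is assumed to know the current bit exactly and guesses the old bit from what is left over. The residue and noise values are arbitrary assumptions.

```python
import random


def recovery_accuracy(residue: float, noise_sigma: float, n_bits: int, seed: int = 0) -> float:
    """Fraction of overwritten bits guessed correctly in the toy model."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_bits):
        old_bit = rng.choice((-1.0, 1.0))  # previously written value
        new_bit = rng.choice((-1.0, 1.0))  # value written over it
        signal = new_bit + residue * old_bit + rng.gauss(0.0, noise_sigma)
        leftover = signal - new_bit        # attacker subtracts the known current bit
        guess = 1.0 if leftover >= 0 else -1.0
        if guess == old_bit:
            correct += 1
    return correct / n_bits


print(recovery_accuracy(residue=0.05, noise_sigma=0.0, n_bits=20000))  # no noise at all
print(recovery_accuracy(residue=0.05, noise_sigma=5.0, n_bits=20000))  # noise swamps the residue
```

In this model, once the noise is much larger than the residue the guesses are barely better than a coin flip, which is the whisper-in-a-noisy-room situation.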


Second problem: time. Even if the electron microscope is very fast and accurate (1ms per bit! That's lightning for an electron microscope. It's also slower than a 1200 baud modem), there is a LOT of data on a hard drive, and a full image will take a very long time. WA says 126 years for an entire 500GB hard drive, and that's NOT including ECC data (which you need). There's also lots of other metadata associated with hard drive sectors that I didn't mention, like ID fields and address markers, but these don't get overwritten, so perhaps you can come up with a faster way to image them normally. Doubtless there are ways to speed up this process (such as selectively imaging portions of the drive), but even that will take you months of 24/7 work just to get the $MFT file on a standard hard drive (typically around 50-300MB on a drive with Windows installed).
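The timing estimate is easy to reproduce. The Python sketch below assumes exactly the answer's hypothetical rate of 1 ms per bit, decimal units (1 GB = 10^9 bytes), and user data only, with no ECC, metadata, or positioning time. Under those assumptions the 500GB figure matches the quoted 126 years, while the raw bit time for a 50-300MB $MFT comes out at days to weeks. The answer's "months" presumably folds in overheads that this sketch does not model.

```python
SECONDS_PER_BIT = 1e-3                    # the answer's hypothetical read rate
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def imaging_seconds(num_bytes: float) -> float:
    """Time to read every bit of `num_bytes` once at SECONDS_PER_BIT."""
    return num_bytes * 8 * SECONDS_PER_BIT


full_drive_years = imaging_seconds(500e9) / SECONDS_PER_YEAR
mft_days_low = imaging_seconds(50e6) / SECONDS_PER_DAY
mft_days_high = imaging_seconds(300e6) / SECONDS_PER_DAY

print(f"500 GB drive: about {full_drive_years:.0f} years")
print(f"$MFT (50-300 MB): about {mft_days_low:.1f} to {mft_days_high:.1f} days of raw bit time")
```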


Third problem: admissibility. If the government is after you, they're after you for only a few reasons: they want to know something that you know, or they want to arrest you and put you in prison. There are easier ways to get the former (rubber hose cryptography), and the latter will require regular evidence procedures. Going back to the analogy, if someone testified that someone told them something in a whisper while someone else was talking to them in the middle of a crowded and noisy room, there is a lot of room for doubt. It would never be the sort of strong evidence that anyone would want to spend lots of time and money to obtain.


Answer by Gabe

You're asking the wrong question. Attempting to securely erase a drive by writing to user-visible blocks completely ignores the fact that there could be user data in sectors marked as bad (but which still contain readable sensitive data).


Of course it is possible to work around that by issuing ATA commands, but then a single ATA Secure Erase command will do everything you want in the first place. See https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase for details on how to use hdparm to issue the Secure Erase command with the --security-erase option.
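As a rough illustration of what that procedure involves, the Python sketch below only builds and prints the hdparm command lines; it does not run anything. The --security-erase option comes from the answer. The identify step (-I) and the password step (--user-master, --security-set-pass) are my understanding of the usual sequence and should be checked against the linked wiki page and the hdparm man page before use. The device path and password are placeholders.

```python
import shlex


def secure_erase_commands(device: str, password: str) -> list:
    """Build (but do not execute) the hdparm invocations for an ATA Secure Erase."""
    return [
        # 1. Inspect the drive: confirm security is supported and the drive is not frozen.
        ["hdparm", "-I", device],
        # 2. Set a temporary user password, which enables the security feature set.
        ["hdparm", "--user-master", "u", "--security-set-pass", password, device],
        # 3. Issue the erase; the drive's firmware performs the wipe itself.
        ["hdparm", "--user-master", "u", "--security-erase", password, device],
    ]


# "/dev/sdX" and "temp-pass" are placeholders, not real values.
for command in secure_erase_commands("/dev/sdX", "temp-pass"):
    print(shlex.join(command))
```

Printing the commands instead of running them is deliberate: a secure erase is irreversible, so the sketch leaves execution to the reader.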
