Best way to read a large file into a byte array in C#?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute the original authors (not the translator). Original question: http://stackoverflow.com/questions/2030847/

Date: 2020-08-06 22:50:52 | Source: igfitidea


c# .net bytearray binary-data

Asked by Tony_Henrich

I have a web server which will read large binary files (several megabytes) into byte arrays. The server could be reading several files at the same time (different page requests), so I am looking for the most optimized way for doing this without taxing the CPU too much. Is the code below good enough?


public byte[] FileToByteArray(string fileName)
{
    byte[] buff = null;
    FileStream fs = new FileStream(fileName, 
                                   FileMode.Open, 
                                   FileAccess.Read);
    BinaryReader br = new BinaryReader(fs);
    long numBytes = new FileInfo(fileName).Length;
    buff = br.ReadBytes((int) numBytes);
    return buff;
}

Accepted answer by Mehrdad Afshari

Simply replace the whole thing with:


return File.ReadAllBytes(fileName);

However, if you are concerned about memory consumption, you should not read the whole file into memory all at once; read it in chunks instead.

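As a rough illustration of the chunked approach (the 64 KB buffer size and the processChunk delegate are illustrative choices, not part of the original answer):

// Needs System and System.IO.
public void ProcessFileInChunks(string fileName, Action<byte[], int> processChunk)
{
    // Reuse one fixed-size buffer instead of materialising the whole file in memory.
    byte[] buffer = new byte[64 * 1024];
    using (FileStream fs = File.OpenRead(fileName))
    {
        int bytesRead;
        while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Only the first bytesRead bytes of the buffer are valid for this chunk.
            processChunk(buffer, bytesRead);
        }
    }
}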

Answered by Powerlord

I would think this:


byte[] file = System.IO.File.ReadAllBytes(fileName);

Answered by Powerlord

Your code can be factored to this (in lieu of File.ReadAllBytes):


public byte[] ReadAllBytes(string fileName)
{
    byte[] buffer = null;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        buffer = new byte[fs.Length];
        fs.Read(buffer, 0, (int)fs.Length);
    }
    return buffer;
} 

Note the Int32.MaxValue file size limitation imposed by the Read method (its count parameter is an int). In other words, you can only read a 2 GB chunk at once.

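For completeness, a hedged sketch that guards against both issues: the file must fit in a single int-indexed array, and Read may return fewer bytes than requested, so it is called in a loop (the method name is illustrative):

public static byte[] ReadAllBytesChecked(string fileName)
{
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        // A byte[] is indexed by int, so anything past int.MaxValue cannot fit in one array.
        if (fs.Length > int.MaxValue)
            throw new IOException("File too large to read into a single byte array.");

        byte[] buffer = new byte[fs.Length];
        int total = 0;
        while (total < buffer.Length)
        {
            // Read may return fewer bytes than requested, so keep reading until the buffer is full.
            int read = fs.Read(buffer, total, buffer.Length - total);
            if (read == 0)
                break;  // end of stream reached earlier than expected
            total += read;
        }
        return buffer;
    }
}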

Also note that some FileStream constructor overloads take a buffer size as their last arguments.

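For reference, one FileStream constructor overload that takes an explicit buffer size looks roughly like this (the 64 KB value and the FileOptions.SequentialScan hint are illustrative):

// The fifth argument is the internal buffer size in bytes.
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read,
                                      FileShare.Read, 64 * 1024, FileOptions.SequentialScan))
{
    // ... read from fs as usual ...
}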

I would also suggest reading about FileStream and BufferedStream.


As always, a simple sample program that profiles which approach is fastest will be the most useful.


Also, your underlying hardware will have a large effect on performance. Are you using server-grade hard disk drives with large caches and a RAID card with an onboard memory cache? Or are you using a standard drive connected to the IDE port?


Answered by Todd Moses

Use the BufferedStream class in C# to improve performance. A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance.


See the following for a code example and additional explanation: http://msdn.microsoft.com/en-us/library/system.io.bufferedstream.aspx

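A minimal sketch of wrapping a FileStream in a BufferedStream (the 64 KB buffer size is an arbitrary example):

using (FileStream fs = File.OpenRead(fileName))
using (BufferedStream bs = new BufferedStream(fs, 64 * 1024))
{
    // Small reads against bs are served from the in-memory buffer,
    // so they do not each turn into a call to the operating system.
    int b;
    while ((b = bs.ReadByte()) != -1)
    {
        // process byte b ...
    }
}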

Answered by Marc Gravell

I might argue that the answer here generally is "don't". Unless you absolutely need all the data at once, consider using a Stream-based API (or some variant of reader / iterator). That is especially important when you have multiple parallel operations (as suggested by the question), to minimise system load and maximise throughput.


For example, if you are streaming data to a caller:


Stream dest = ...
using(Stream source = File.OpenRead(path)) {
    byte[] buffer = new byte[2048];
    int bytesRead;
    while((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0) {
        dest.Write(buffer, 0, bytesRead);
    }
}
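
(On .NET 4.0 and later, the same buffered copy can also be written as source.CopyTo(dest), which performs an equivalent read/write loop internally.)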

Answered by Joel

Depending on the frequency of operations, the size of the files, and the number of files you're looking at, there are other performance issues to take into consideration. One thing to remember is that each of your byte arrays will be released at the mercy of the garbage collector. If you're not caching any of that data, you could end up creating a lot of garbage and be losing most of your performance to % Time in GC. If the chunks are larger than 85K, you'll be allocating to the Large Object Heap (LOH), which will require a collection of all generations to free up (this is very expensive, and on a server will stop all execution while it's going on). Additionally, if you have a ton of objects on the LOH, you can end up with LOH fragmentation (the LOH is never compacted), which leads to poor performance and out-of-memory exceptions. You can recycle the process once you hit a certain point, but I don't know if that's a best practice.

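One way to avoid repeatedly allocating large (potentially LOH-sized) arrays, assuming System.Buffers is available (built into .NET Core, or via the NuGet package on .NET Framework), is to rent and return pooled buffers. A sketch, where the processChunk delegate is a hypothetical placeholder:

public static void ProcessFilePooled(string fileName, Action<byte[], int> processChunk)
{
    // Rent a reusable buffer instead of allocating a fresh array per request.
    byte[] buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(128 * 1024);
    try
    {
        using (FileStream fs = File.OpenRead(fileName))
        {
            int bytesRead;
            while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                processChunk(buffer, bytesRead);
            }
        }
    }
    finally
    {
        // Returning the buffer lets later requests reuse it, reducing GC and LOH pressure.
        System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
    }
}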

The point is, you should consider the full life cycle of your app before just reading all the bytes into memory the fastest way possible, or you might be trading overall performance for short-term gains.


Answered by Dave

I would recommend trying the Response.TransmitFile() method, then Response.Flush() and Response.End(), for serving your large files.

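A rough sketch of that approach in a classic ASP.NET (System.Web) page or handler, where Response is the current HttpResponse and filePath is a hypothetical local path:

// TransmitFile streams the file to the client without buffering it all in managed memory first.
Response.ContentType = "application/octet-stream";
Response.AddHeader("Content-Disposition", "attachment; filename=\"" + System.IO.Path.GetFileName(filePath) + "\"");
Response.TransmitFile(filePath);
Response.Flush();
Response.End();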

Answered by elaverick

If you're dealing with files above 2 GB, you'll find that the above methods fail.


It's much easier just to hand the stream off to MD5 and allow that to chunk your file for you:


// Requires System.Security.Cryptography; MD5 reads the stream in chunks internally.
private byte[] computeFileHash(string filename)
{
    using (MD5 md5 = MD5.Create())
    using (FileStream fs = new FileStream(filename, FileMode.Open))
    {
        byte[] hash = md5.ComputeHash(fs);
        return hash;
    }
}

Answered by vapcguy

I'd say BinaryReader is fine, but can be refactored to this, instead of all those lines of code for getting the length of the buffer:


public byte[] FileToByteArray(string fileName)
{
    byte[] fileData = null;

    using (FileStream fs = File.OpenRead(fileName)) 
    { 
        using (BinaryReader binaryReader = new BinaryReader(fs))
        {
            fileData = binaryReader.ReadBytes((int)fs.Length); 
        }
    }
    return fileData;
}

Should be better than using .ReadAllBytes(), since I saw in the comments on the top response that includes .ReadAllBytes() that one of the commenters had problems with files > 600 MB, since a BinaryReader is meant for this sort of thing. Also, putting it in a using statement ensures the FileStream and BinaryReader are closed and disposed.


Answered by Disha Sharma

use this:


bytesRead = responseStream.ReadAsync(buffer, 0, buffer.Length).Result;
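
A slightly fuller, hedged sketch of that idea: inside an async method the read can be awaited instead of blocked on with .Result (the helper name and the 8 KB buffer are illustrative; requires System.IO and System.Threading.Tasks):

private static async Task<byte[]> ReadToEndAsync(Stream responseStream)
{
    using (var ms = new MemoryStream())
    {
        byte[] buffer = new byte[8192];
        int bytesRead;
        // Await each read; blocking on .Result can deadlock in ASP.NET synchronization contexts.
        while ((bytesRead = await responseStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, bytesRead);
        }
        return ms.ToArray();
    }
}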