C#: Deciding when to use XmlDocument vs. XmlReader

Question

Asked by PhilChuang

I'm optimizing a custom object -> XML serialization utility. It's all done and working, so that's not the issue.

It worked by loading a file into an XmlDocument object, then recursively going through all the child nodes.

I figured that using XmlReader instead of having XmlDocument load and parse the entire thing might be faster, so I implemented that version as well.
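
For readers comparing the two approaches, here is a minimal sketch of each. It is not the asker's actual utility: the class and method names (XmlWalkSketch, CountElementsWithDocument, CountElementsWithReader) are made up for illustration, and both methods simply count elements so the two traversal styles can be seen side by side.

```csharp
using System.IO;
using System.Xml;

// Hypothetical illustration only; not the asker's serialization utility.
public static class XmlWalkSketch
{
    // DOM approach: parse the whole document into memory, then recurse over child nodes.
    public static int CountElementsWithDocument(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return CountElements(doc.DocumentElement);
    }

    private static int CountElements(XmlNode node)
    {
        if (node == null || node.NodeType != XmlNodeType.Element) return 0;
        int count = 1; // count this element, then its descendants
        foreach (XmlNode child in node.ChildNodes)
            count += CountElements(child);
        return count;
    }

    // Streaming approach: one forward-only pass; no tree is built.
    public static int CountElementsWithReader(string xml)
    {
        int count = 0;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element) count++;
            }
        }
        return count;
    }
}
```

Both methods should return the same count for the same input; what differs is how much of the document is held in memory while counting.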

The algorithms are exactly the same; I use a wrapper class to abstract away the differences between dealing with an XmlNode and an XmlReader. For instance, the GetChildren methods yield return either a child XmlNode or a subtree XmlReader.

So I wrote a test driver to test both versions, using a non-trivial data set (a 900 KB XML file with around 1,350 elements).

However, using JetBrains dotTrace, I see that the XmlReader version is actually slower than the XmlDocument version! It seems that there is some significant processing involved in the XmlReader Read calls when I'm iterating over child nodes.

So I say all that to ask this:

What are the advantages and disadvantages of XmlDocument and XmlReader, and in what circumstances should you use each?

My guess is that there is a file size threshold at which XmlReader becomes more economical in performance, as well as less memory-intensive. However, that threshold seems to be above 1 MB.

I'm calling ReadSubtree every time to process child nodes:

public override IEnumerable<IXmlSourceProvider> GetChildren ()
{
    XmlReader xr = myXmlSource.ReadSubtree ();
    // skip past the current element
    xr.Read ();

    while (xr.Read ())
    {
        if (xr.NodeType != XmlNodeType.Element) continue;
        yield return new XmlReaderXmlSourceProvider (xr);
    }
}
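
As a point of comparison only, direct children can also be enumerated from a single reader by tracking Depth rather than creating a subtree reader per level. This is a sketch under assumptions, not the asker's code, and it is not claimed to be faster: DepthWalkSketch and its methods are hypothetical names, and DirectChildNames assumes the reader is already positioned on the parent element.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper; assumes the reader is positioned on the parent element.
public static class DepthWalkSketch
{
    public static List<string> DirectChildNames(XmlReader reader)
    {
        var names = new List<string>();
        if (reader.IsEmptyElement) return names; // <parent/> has no children

        int parentDepth = reader.Depth;
        // Keep reading until we reach the parent's end element (same depth as the parent).
        while (reader.Read() && reader.Depth > parentDepth)
        {
            // Only elements exactly one level below the parent are direct children.
            if (reader.NodeType == XmlNodeType.Element && reader.Depth == parentDepth + 1)
                names.Add(reader.Name);
        }
        return names;
    }

    // Convenience wrapper for trying the helper on an XML string.
    public static List<string> DirectChildNamesOfRoot(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // position on the root element
            return DirectChildNames(reader);
        }
    }
}
```

Note that this version consumes the whole parent element, so it suits a single forward pass and does not hand out per-child readers the way the GetChildren method above does.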

That test covers a lot of objects at a single level (i.e. wide and shallow), but I wonder how well XmlReader fares when the XML is deep and wide? The XML I'm dealing with is much like a data object model: one parent object to many child objects, and so on (1..M..M..M).

I also don't know beforehand the structure of the XML I'm parsing, so I can't optimize for it.

Answer 1

Accepted answer by Zach Bonham

I've generally looked at it not from a speed perspective, but rather from a memory utilization perspective. All of the implementations have been fast enough for the usage scenarios I've used them in (typical enterprise integration).

However, where I've fallen down, and sometimes spectacularly, is not taking into account the general size of the XML I'm working with. If you think about it up front you can save yourself some grief.

XML tends to bloat when loaded into memory, at least with a DOM reader like XmlDocument or XPathDocument. Something like 10:1? The exact amount is hard to quantify, but as a rough example, 1 MB on disk could be 10 MB in memory, or more.

A process using any reader that loads the whole document into memory in its entirety (XmlDocument/XPathDocument) can suffer from large object heap fragmentation, which can ultimately lead to OutOfMemoryExceptions (even with available memory) resulting in an unavailable service/process.

Since objects that are greater than 85K in size end up on the large object heap, and you've got a 10:1 size explosion with a DOM reader, you can see it doesn't take much before your XML documents are being allocated from the large object heap.

XmlDocument is very easy to use. Its only real drawback is that it loads the whole XML document into memory to process. It's seductively simple to use.

XmlReader is a stream-based reader, so it will generally keep your process memory utilization flatter, but it is more difficult to use.

XPathDocument tends to be a faster, read-only version of XmlDocument, but still suffers from memory 'bloat'.
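
To make the read-only option concrete, here is a small sketch of querying an XPathDocument through an XPathNavigator. The class name XPathDocumentSketch, the method CountMatches, and the sample XPath in the usage note are assumptions for illustration; the document is still loaded fully into memory, as described above.

```csharp
using System.IO;
using System.Xml.XPath;

// Hypothetical illustration of read-only XPath access.
public static class XPathDocumentSketch
{
    // Loads the XML into a read-only XPathDocument and counts nodes matching the expression.
    public static int CountMatches(string xml, string xpath)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator navigator = doc.CreateNavigator();
        XPathNodeIterator matches = navigator.Select(xpath);
        return matches.Count;
    }
}
```

For example, CountMatches(xml, "/root/item") counts the item elements directly under root.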

Answer 2

Answered by Joe

There is a size threshold at which XmlDocument becomes slower, and eventually unusable. But the actual value of the threshold will depend on your application and XML content, so there are no hard and fast rules.

If your XML file can contain large lists (say tens of thousands of elements), you should definitely be using XmlReader.

Answer 3

Answered by DSO

XmlDocument is an in-memory representation of the entire XML document. Therefore if your document is large, then it will consume much more memory than if you had read it using XmlReader.

This assumes that when you use XmlReader you read and process the elements one by one and then discard them. If you use XmlReader and construct another intermediary structure in memory, then you have the same problem, and you're defeating the purpose of it.
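
A minimal sketch of that "process one element, then discard it" pattern follows. The element name item, the class StreamingSumSketch, and the assumption that every item holds an integer are all made up for illustration; only a running total is kept in memory, never a list of items.

```csharp
using System.IO;
using System.Xml;

// Hypothetical illustration: assumes <item> elements that each contain an integer.
public static class StreamingSumSketch
{
    public static long SumItems(TextReader input)
    {
        long sum = 0;
        using (var reader = XmlReader.Create(input))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    // Reads the value and leaves the reader on the node after </item>,
                    // so we must not call Read() again here or we could skip the next item.
                    sum += reader.ReadElementContentAsLong();
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return sum;
    }
}
```

The same loop shape works for a file by passing something like File.OpenText(path) as the input.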

Google for "SAX versus DOM" to read more about the difference between the two models of processing XML.

Answer 4

Answered by David V. Corbin

The encoding difference is because two different measurements are being mixed. UTF-32 requires 4 bytes per character, and is inherently slower than single byte data.
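
The size difference between encodings is easy to check directly. This is only a sketch of the byte counts involved, not the benchmark being discussed; EncodingSizeSketch and its method names are hypothetical.

```csharp
using System.Text;

// Hypothetical illustration of encoded size per encoding.
public static class EncodingSizeSketch
{
    // Number of bytes needed to encode the text as UTF-8 (1 byte per ASCII character).
    public static int Utf8Bytes(string text) => Encoding.UTF8.GetByteCount(text);

    // Number of bytes needed to encode the text as UTF-32 (4 bytes per character).
    public static int Utf32Bytes(string text) => Encoding.UTF32.GetByteCount(text);
}
```

For ASCII-only markup, the UTF-32 form is four times the size of the UTF-8 form, so a parser has four times as many bytes to read and decode.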

If you look at the large (100K) element test, you see that the time increases by about 70 ms for each case regardless of the loading method used.

This is a (nearly) constant difference caused specifically by the per-character overhead.

Answer 5

Answered by Display Name

Another consideration is that XmlReader might be more robust for handling less-than-perfectly-formed XML. I recently created a client which consumed an XML stream, but the stream didn't have the special characters escaped correctly in URIs contained in some of the elements. XmlDocument and XPathDocument refused to load the XML at all, whereas using XmlReader I was able to extract the information I needed from the stream.
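
The behaviour described here can be sketched as follows. This is not the answerer's client: PartialReadSketch and its method names are hypothetical, and the unescaped "&" in the usage note is an invented stand-in for the badly escaped URIs mentioned above. The point is that XmlReader has already delivered the nodes before the bad spot by the time it throws, while XmlDocument fails as a whole.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical illustration of salvaging data that precedes a well-formedness error.
public static class PartialReadSketch
{
    // Collects element names until the reader finishes or hits malformed XML.
    public static List<string> ReadElementNamesUntilError(string xml, out bool hitError)
    {
        var names = new List<string>();
        hitError = false;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            try
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element) names.Add(reader.Name);
                }
            }
            catch (XmlException)
            {
                // Everything read before the malformed spot is still in 'names'.
                hitError = true;
            }
        }
        return names;
    }

    // Returns false if XmlDocument rejects the input.
    public static bool DocumentLoads(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Given input such as a feed whose title element is followed by a link whose href contains a raw "&", the reader version still yields the title before failing, whereas DocumentLoads returns false. Data after the error is not recoverable this way.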
