Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/2111586/

Date: 2020-08-06 23:35:31  Source: igfitidea

Parsing xml string to an xml document fails if the string begins with <?xml... ?> section

c# .net xml

Asked by agnieszka

I have an XML file beginning like this:

<?xml version="1.0" encoding="utf-8"?>
<Report xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner" xmlns="http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition">
  <DataSources>

When I run the following code:

byte[] fileContent = //gets bytes
string stringContent = Encoding.UTF8.GetString(fileContent);
XDocument xml = XDocument.Parse(stringContent);

I get the following XmlException:

Data at the root level is invalid. Line 1, position 1.

Cutting out the version and encoding node (the XML declaration) fixes the problem. Why? How do I process this XML correctly?

Accepted answer by stevehipwell

If you only have bytes you could either load the bytes into a stream:

XmlDocument oXML;

using (MemoryStream oStream = new MemoryStream(oBytes))
{
  oXML = new XmlDocument();
  oXML.Load(oStream);
}

Or you could convert the bytes into a string (presuming that you know the encoding) before loading the XML:

string sXml;
XmlDocument oXml;

sXml = Encoding.UTF8.GetString(oBytes);
oXml = new XmlDocument();
oXml.LoadXml(sXml);

I've shown my example as .NET 2.0 compatible. If you're using .NET 3.5, you can use XDocument instead of XmlDocument.

Load the bytes into a stream:

XDocument oXML;

using (MemoryStream oStream = new MemoryStream(oBytes))
using (XmlTextReader oReader = new XmlTextReader(oStream))
{
  oXML = XDocument.Load(oReader);
}

Convert the bytes into a string:

string sXml;
XDocument oXml;

sXml = Encoding.UTF8.GetString(oBytes);
oXml = XDocument.Parse(sXml);

Answer by Brian Agnew

Do you have a byte order mark (BOM) at the beginning of your XML, and does it match your encoding? If you chop out your header, you'll also chop out the BOM, and if the BOM was incorrect, subsequent parsing may then work.

You may need to inspect your document at the byte level to see the BOM.
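
One way to do that byte-level inspection is sketched below. The class and method names (`BomInspector`, `StartsWithUtf8Bom`, `HexPrefix`) are made up for illustration and are not part of the original answer; the sketch only checks for the UTF-8 BOM, not the BOMs of other encodings.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for looking at the first bytes of a file's content.
public static class BomInspector
{
    // Returns true when the buffer starts with the UTF-8 preamble (EF BB BF).
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        return bytes.Length >= preamble.Length
            && bytes.Take(preamble.Length).SequenceEqual(preamble);
    }

    // Formats up to 'count' leading bytes as hex so they can be read by eye.
    public static string HexPrefix(byte[] bytes, int count = 8)
    {
        return BitConverter.ToString(bytes, 0, Math.Min(count, bytes.Length));
    }
}
```

Calling `BomInspector.HexPrefix(fileContent)` on the bytes from the question should show whether the content begins with `EF-BB-BF` before the `<` character (`3C`).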

Answer by Darin Dimitrov

Why bother reading the file as a byte sequence and then converting it to a string when it is an XML file? Just let the framework do the loading for you and cope with the encodings:

var xml = XDocument.Load("test.xml");

Answer by Dave Cluderay

My first thought was that the encoding is Unicode when parsing XML from a .NET string type. It seems, though, that XDocument's parsing is quite forgiving with respect to this.

The problem is actually related to the UTF-8 preamble/byte order mark (BOM), which is a three-byte signature optionally present at the start of a UTF-8 stream. These three bytes are a hint as to the encoding being used in the stream.

You can determine the preamble of an encoding by calling the GetPreamble method on an instance of the System.Text.Encoding class. For example:

// returns { 0xEF, 0xBB, 0xBF }
byte[] preamble = Encoding.UTF8.GetPreamble();
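
To see how the preamble reaches the parser in the question's code, note that `Encoding.UTF8.GetString` does not strip it: the three bytes decode to the single character U+FEFF at the start of the string. The sketch below illustrates that; `BomDemo` and its method names are assumptions made for this example, not code from the original answer.

```csharp
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class BomDemo
{
    // Decodes bytes the same way the question does. The BOM is kept
    // and becomes the first character of the resulting string.
    public static string DecodeLikeTheQuestion(byte[] fileContent)
    {
        return Encoding.UTF8.GetString(fileContent);
    }

    // True when the text starts with U+FEFF, the decoded form of the BOM.
    public static bool StartsWithBomChar(string text)
    {
        return text.Length > 0 && text[0] == '\uFEFF';
    }
}
```

If `StartsWithBomChar` returns true for the decoded content, the string handed to `XDocument.Parse` begins with a character other than `<`, which would be consistent with the error being reported at line 1, position 1.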

The preamble should be handled correctly by XmlTextReader, so simply load your XDocument from an XmlTextReader:

XDocument xml;
using (var xmlStream = new MemoryStream(fileContent))
using (var xmlReader = new XmlTextReader(xmlStream))
{
    xml = XDocument.Load(xmlReader);
}
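
A possible alternative, not taken from the original answer, is to wrap the stream in a `StreamReader` with BOM detection enabled, so the preamble is consumed before the text reaches `XDocument.Load`. This is a minimal sketch; `XmlBytesLoader` and `LoadFromBytes` are hypothetical names, and UTF-8 is assumed as the fallback encoding when no BOM is present.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper; not part of the original answer.
public static class XmlBytesLoader
{
    // Loads an XDocument from raw bytes. The StreamReader detects and
    // skips a byte order mark, falling back to UTF-8 when none is found.
    public static XDocument LoadFromBytes(byte[] fileContent)
    {
        using (var stream = new MemoryStream(fileContent))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return XDocument.Load(reader);
        }
    }
}
```

Usage would be `XDocument xml = XmlBytesLoader.LoadFromBytes(fileContent);`, with `fileContent` being the byte array from the question.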

Answer by eugene.sushilnikov

Try this:

int startIndex = xmlString.IndexOf('<');
if (startIndex > 0)
{
    xmlString = xmlString.Remove(0, startIndex);
}
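
The snippet above discards everything before the first `<`, whatever it is. If the only unwanted character is the decoded BOM (U+FEFF), a narrower variant could remove just that character before parsing. The following is a sketch under that assumption; `BomSafeParser` and `ParseIgnoringBom` are hypothetical names, not part of the original answer.

```csharp
using System.Xml.Linq;

// Hypothetical helper; a narrower variant of the approach above.
public static class BomSafeParser
{
    // Drops a single leading U+FEFF character, if present, then parses.
    // Any other leading garbage is left in place and will still fail.
    public static XDocument ParseIgnoringBom(string xmlString)
    {
        if (xmlString.Length > 0 && xmlString[0] == '\uFEFF')
        {
            xmlString = xmlString.Substring(1);
        }
        return XDocument.Parse(xmlString);
    }
}
```

Leaving other leading characters untouched means malformed input still raises an exception instead of being silently trimmed.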