C#: Decoding base64-encoded data from an XML document

Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original URL: http://stackoverflow.com/questions/1771242/

Time: 2020-08-06 20:40:31  Source: igfitidea

Decoding base64-encoded data from xml document

Tags: c#, xml, base64, decode

Asked by Kjensen

I receive some xml-files with embedded base64-encoded images, that I need to decode and save as files.

An unmodified (other than zipped) example of such a file can be downloaded below:

20091123-125320.zip(60KB)

However, I get errors like "Invalid length for a Base-64 char array" and "Invalid character in a Base-64 string". I marked the line in the code where I get the error.

A file could look like this:

<?xml version="1.0" encoding="windows-1252"?>
<mediafiles>
    <media media-type="image">
      <media-reference mime-type="image/jpeg"/>
      <media-object encoding="base64"><![CDATA[/9j/4AAQ[...snip...]P4Vm9zOR//Z=]]></media-object>
      <media.caption>What up</media.caption>
    </media>
</mediafiles>

And the code that processes it looks like this:

var xd = new XmlDocument();
xd.Load(filename);
var nodes = xd.GetElementsByTagName("media");

foreach (XmlNode node in nodes)
{
    var mediaObjectNode = node.SelectSingleNode("media-object");
    //The line below is where the errors occur
    byte[] imageBytes = Convert.FromBase64String(mediaObjectNode.InnerText);
    //Do stuff with the bytearray to save the image
}

The xml-data is from an enterprise newspaper system, so I am pretty sure the files are ok - and there must be something in the way I process them, that is just wrong. Maybe a problem with the encoding?

I have tried writing out the contents of mediaObjectNode.InnerText, and it is the base64 encoded data - so navigating the xml-doc is not the issue.

I have been googling, binging, stackoverflowing and crying - and found no solution... Help!

Edit:

Added an actual example file (and a bounty). Please note that the downloadable file uses a slightly different schema, since I simplified it in the above example, removing irrelevant stuff...

Accepted answer by Oliver

For a first shot I didn't use any programming language, just Notepad++.

I opened the xml file in it and copied and pasted the raw base64 content into a new file (without the square brackets).

Afterwards I selected everything (Ctrl-A) and used the option Plugins - MIME Tools - Base64 Decode. This threw an error about the wrong text length (it must be a multiple of 4). So I just added two equal signs ('=') as placeholders at the end to get the correct length.

Another retry and it decoded successfully into 'something'. Just save the file as .jpg and it opens like a charm in any picture viewer.

So I would say there IS something wrong with the data you get. It just doesn't have the right number of equal signs at the end to pad it to a length that can be broken into groups of 4 characters.

The 'easy' way would be to keep adding equal signs until the decoding doesn't throw an error. The better way would be to count the number of characters (minus CR/LFs!) and add the needed ones in one step.

Further investigations

After some coding and reading up on the Convert function, the problem turns out to be an incorrectly appended equal sign from the producer. Notepad++ has no problem with tons of equal signs, but the Convert function from MS only works with zero, one or two of them. So if you pad the already existing one with additional equal signs, you get an error too! To get this damn thing to work, you have to cut off all existing equal signs, calculate how many are needed and add them again.
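
The strictness of the Convert function described here can be checked with a small sketch. The names PaddingCheck and Decodes are made up for illustration; the only framework call used is Convert.FromBase64String, which throws a FormatException on malformed input.

```csharp
using System;

public static class PaddingCheck
{
    // Returns true when Convert.FromBase64String accepts the input,
    // false when it rejects it with a FormatException.
    public static bool Decodes(string candidate)
    {
        try
        {
            Convert.FromBase64String(candidate);
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```

With this helper, PaddingCheck.Decodes("QQ==") is true, while "QQ" (no padding), "QQ=" (too little) and "QQ===" (too much) are all rejected, which matches the observation that existing padding has to be removed and recomputed rather than simply extended.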

Just for the bounty, here is my code (not absolutely perfect, but enough for a good starting point): ;-)

    // Requires: using System; using System.Linq;
    //           using System.Xml.Linq; using System.Xml.XPath;
    static void Main(string[] args)
    {
        var elements = XElement
            .Load("test.xml")
            .XPathSelectElements("//media/media-object[@encoding='base64']");
        foreach (XElement element in elements)
        {
            var image = AnotherDecode64(element.Value);
        }
    }

    static byte[] AnotherDecode64(string base64Encoded)
    {
        // Drop whitespace (including CR/LF) first, then whatever padding the producer added.
        string temp = new string(base64Encoded.Where(c => !Char.IsWhiteSpace(c)).ToArray()).TrimEnd('=');
        int padding = 0;
        switch (temp.Length % 4)
        {
            case 1:
                //This would always produce an exception!!
                //Regardless what (or what not) you attach to your string!
                //Better would be some kind of throw new Exception()
                return new byte[0];
            case 0:
                padding = 0;
                break;
            case 2:
                padding = 2;
                break;
            case 3:
                padding = 1;
                break;
        }
        // Re-append exactly the padding needed to reach a multiple of 4.
        temp += new String('=', padding);

        return Convert.FromBase64String(temp);
    }

Answer by futureelite7

Is the character encoding correct? The error sounds like there's a problem that causes invalid characters to appear in the array. Try copying out the text and decoding manually to see if the data is indeed valid.

(For the record, windows-1252 is not exactly the same as iso-8859-1, so that may be the cause of a problem, barring other sources of corruption.)
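
A manual validity check along these lines could be sketched as follows. The class and method names are hypothetical, and the sketch assumes the standard Base64 alphabet (A-Z, a-z, 0-9, '+', '/'); note that char.IsWhiteSpace accepts more characters than Convert.FromBase64String actually ignores, so this is a rough diagnostic rather than an exact validator.

```csharp
using System;

public static class Base64Inspector
{
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Index of the first character that is neither whitespace, '=' padding,
    // nor part of the standard Base64 alphabet; -1 if there is none.
    public static int FirstInvalidIndex(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c) || c == '=') continue;
            if (Alphabet.IndexOf(c) < 0) return i;
        }
        return -1;
    }

    // Number of non-whitespace characters modulo 4; anything other than 0
    // means the length is not a multiple of 4.
    public static int LengthRemainder(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (!char.IsWhiteSpace(c)) count++;
        }
        return count % 4;
    }
}
```

If FirstInvalidIndex returns -1 but LengthRemainder is not 0, the characters themselves are fine and the problem is more likely the length or padding than the character encoding of the file.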

Answer by Anton Gogolev

Well, it's all very simple. CDATA is a node itself, so mediaObjectNode.InnerText actually produces <![CDATA[/9j/4AAQ[...snip...]P4Vm9zOR//Z=]]>, which is obviously not valid Base64-encoded data.

To make things work, use mediaObjectNode.ChildNodes[0].Value and pass that value to Convert.FromBase64String.
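
What each property returns for a CDATA child can be checked with a small sketch. The one-element document and the CDataCheck class are made up for illustration; only XmlDocument.LoadXml, InnerText, ChildNodes, Value and InnerXml from System.Xml are used.

```csharp
using System;
using System.Xml;

public static class CDataCheck
{
    // Loads a tiny hypothetical document and returns three views of the
    // same element: InnerText, the first child's Value, and InnerXml.
    public static string[] Read()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<media-object encoding=\"base64\"><![CDATA[QUJD]]></media-object>");
        XmlNode node = doc.DocumentElement;
        return new[] { node.InnerText, node.ChildNodes[0].Value, node.InnerXml };
    }
}
```

On the .NET versions I am aware of, InnerText and ChildNodes[0].Value both yield QUJD here, and only InnerXml includes the <![CDATA[ ... ]]> markers. So if the markers do show up in the string being decoded, it is worth checking whether InnerXml is being read instead of InnerText.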

Answer by Darin Dimitrov

Try using Linq to XML:

using System;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        var elements = XElement
            .Load("test.xml")
            .XPathSelectElements("//media/media-object[@encoding='base64']");
        foreach (var element in elements)
        {
            byte[] image = Convert.FromBase64String(element.Value);
        }
    }
}


UPDATE:

After downloading the XML file and analyzing the value of the media-object node, it is clear that it is not a valid base64 string:

string value = "PUT HERE THE BASE64 STRING FROM THE XML WITHOUT THE NEW LINES";
byte[] image = Convert.FromBase64String(value);

throws a System.FormatException saying that the length is invalid for a Base-64 string. Even when I remove the \n from the string, it doesn't work:

var elements = XElement
    .Load("20091123-125320.xml")
    .XPathSelectElements("//media/media-object[@encoding='base64']");
foreach (var element in elements)
{
    string value = element.Value.Replace("\n", "");
    byte[] image = Convert.FromBase64String(value);
}

also throws System.FormatException.

Answer by Andrew

As Oliver has already said, the base64 string is not valid: the string length must be a multiple of 4 after removing white space characters. If you look at the end of the base64 string (see below), you will see that the last line is shorter than the rest.

RRRRRRRRRRRRRRRRRRRRRRRRRRRRX//Z=

If you remove this line, your program will work, but the resulting image will have a missing section in the bottom right-hand corner. You need to pad this line so the overall string length is correct. By my calculation, adding 3 characters should make it work.

RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRX//Z=

Answer by swapnil malap

Remove the last 2 characters repeatedly for as long as the string does not decode properly:

// Requires: using System; using System.IO; using System.Drawing;
public Image Base64ToImage(string base64String)
{
    // Convert Base64 String to byte[]
    byte[] imageBytes = null;
    while (imageBytes == null)
    {
        try
        {
            imageBytes = Convert.FromBase64String(base64String);
        }
        catch (FormatException)
        {
            // Decoding failed: drop the last 2 characters and retry.
            // Give up once the string is too short to shorten further.
            if (base64String.Length < 2) throw;
            base64String = base64String.Substring(0, base64String.Length - 2);
        }
    }

    // Convert byte[] to Image (the stream must stay open while the image is in use)
    MemoryStream ms = new MemoryStream(imageBytes);
    Image image = Image.FromStream(ms, true);
    pictureBox1.Image = image;
    return image;
}

Answer by Stipo

I've also had a problem decoding a Base64-encoded string from an XML document (specifically an Office OpenXML package document).

It turned out that the string had an additional encoding applied, HTML encoding, so HTML-decoding first and then Base64-decoding did the trick:

private static byte[] DecodeHtmlBase64String(string value)
{
    return System.Convert.FromBase64String(System.Net.WebUtility.HtmlDecode(value));
}
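
As a worked illustration of the same idea, the sketch below decodes a made-up input in which the '+' characters have been HTML-encoded as &#43;. The HtmlBase64Example class and the sample string are assumptions for illustration; WebUtility.HtmlDecode and Convert.FromBase64String are the only framework calls.

```csharp
using System;
using System.Net;

public static class HtmlBase64Example
{
    // Same idea as DecodeHtmlBase64String above: undo the HTML encoding
    // first, then decode the resulting Base64 text.
    public static byte[] Decode(string value)
    {
        return Convert.FromBase64String(WebUtility.HtmlDecode(value));
    }
}
```

Calling HtmlBase64Example.Decode("&#43;&#43;8=") first turns the input into ++8= and then yields the two bytes 0xFB and 0xEF, whereas passing the raw string straight to Convert.FromBase64String fails because '&', '#' and ';' are not Base64 characters.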

Just in case someone else stumbles on the same issue.
