Python: Inflate and Deflate implementations

Source: Stack Overflow, http://stackoverflow.com/questions/1089662/. Licensed under CC BY-SA 4.0; if you reuse this content you must follow the same license and attribute it to the original authors.

Tags: c#, python, compression, zlib

Asked by Demi

I am interfacing with a server that requires data sent to it to be compressed with the Deflate algorithm (Huffman encoding + LZ77), and that also sends data which I need to Inflate.

I know that Python includes zlib, and that the C libraries in zlib support calls to Inflate and Deflate, but these apparently are not provided by the Python zlib module. It does provide Compress and Decompress, but when I make a call such as the following:

result_data = zlib.decompress( base64_decoded_compressed_string )

I receive the following error:

Error -3 while decompressing data: incorrect header check

Gzip does no better; when making a call such as:

result_data = gzip.GzipFile( fileobj = StringIO.StringIO( base64_decoded_compressed_string ) ).read()

I receive the error:

IOError: Not a gzipped file

which makes sense, as the data is a Deflated file, not a true Gzipped file.

Now I know that there is a Deflate implementation available (Pyflate), but I do not know of an Inflate implementation.

It seems that there are a few options:

  1. Find an existing implementation (ideal) of Inflate and Deflate in Python
  2. Write my own Python extension to the zlib C library that includes Inflate and Deflate
  3. Call something else that can be executed from the command line (such as a Ruby script, since the Inflate/Deflate calls in zlib are fully wrapped in Ruby)
  4. ?

I am seeking a solution, but lacking a solution I will be thankful for insights, constructive opinions, and ideas.

Additional information: The result of deflating (and encoding) a string should, for the purposes I need, give the same result as the following snippet of C# code, where the input parameter is an array of UTF bytes corresponding to the data to compress:

public static string DeflateAndEncodeBase64(byte[] data)
{
    if (null == data || data.Length < 1) return null;
    string compressedBase64 = "";

    //write into a new memory stream wrapped by a deflate stream
    using (MemoryStream ms = new MemoryStream())
    {
        using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
        {
            //write byte buffer into memorystream
            deflateStream.Write(data, 0, data.Length);
            deflateStream.Close();

            //rewind memory stream and write to base 64 string
            byte[] compressedBytes = new byte[ms.Length];
            ms.Seek(0, SeekOrigin.Begin);
            ms.Read(compressedBytes, 0, (int)ms.Length);
            compressedBase64 = Convert.ToBase64String(compressedBytes);
        }
    }
    return compressedBase64;
}

Running this .NET code for the string "deflate and encode me" gives the result

7b0HYBxJliUmL23Ke39K9UrX4HShCIBgEyTYkEAQ7MGIzeaS7B1pRyMpqyqBymVWZV1mFkDM7Z28995777333nvvvfe6O51OJ/ff/z9cZmQBbPbOStrJniGAqsgfP358Hz8iZvl5mbV5mi1nab6cVrM8XeT/Dw==

When "deflate and encode me" is run through Python's zlib.compress() and then base64 encoded, the result is "eJxLSU3LSSxJVUjMS1FIzUvOT0lVyE0FAFXHB6k=".

It is clear that zlib.compress() is not an implementation of the same algorithm as the standard Deflate algorithm.

More Information:

The first 2 bytes of the .NET deflate data ("7b0HY..."), after b64 decoding are 0xEDBD, which does not correspond to Gzip data (0x1f8b), BZip2 (0x425A) data, or Zlib (0x789C) data.

The first 2 bytes of the Python compressed data ("eJxLS..."), after b64 decoding are 0x789C. This is a Zlib header.
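
The header comparison above can be reproduced with a short Python 3 sketch. The helper name `looks_like_zlib` is hypothetical; it applies the RFC 1950 rule that the low four bits of the first byte must be 8 (the deflate method) and that the two header bytes, read as a big-endian 16-bit number, must be a multiple of 31. Only the first four base64 characters of each quoted result are decoded, which is enough to recover the leading bytes.

```python
import base64

def looks_like_zlib(data: bytes) -> bool:
    """Return True if the first two bytes form a valid zlib (RFC 1950) header."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # Low nibble of CMF is the compression method; 8 means deflate.
    # The 16-bit value CMF*256 + FLG must be divisible by 31.
    return (cmf & 0x0F) == 8 and (cmf * 256 + flg) % 31 == 0

# First four base64 characters of each result quoted in the question.
dotnet_head = base64.b64decode("7b0H")
python_head = base64.b64decode("eJxL")

print(dotnet_head[:2].hex(), looks_like_zlib(dotnet_head))  # edbd False
print(python_head[:2].hex(), looks_like_zlib(python_head))  # 789c True
```

A check like this only identifies the wrapper; it says nothing about whether the bytes that follow are a valid deflate stream.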

SOLVED

To handle the raw deflate and inflate, without header and checksum, the following things needed to happen:

On deflate/compress: strip the first two bytes (header) and the last four bytes (checksum).

On inflate/decompress: there is a second argument for window size. If this value is negative, it suppresses headers. Here are my current methods, including the base64 encoding/decoding, and they work properly:

import zlib
import base64

def decode_base64_and_inflate( b64string ):
    decoded_data = base64.b64decode( b64string )
    return zlib.decompress( decoded_data , -15)

def deflate_and_base64_encode( string_val ):
    zlibbed_str = zlib.compress( string_val )
    compressed_string = zlibbed_str[2:-4]
    return base64.b64encode( compressed_string )
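
The methods above date from Python 2, where strings were byte strings. A minimal Python 3 sketch of the same idea, assuming the caller passes `bytes` (for example UTF-8 encoded text), might look like this; the type hints and the round-trip check are additions for illustration.

```python
import base64
import zlib

def deflate_and_base64_encode(data: bytes) -> bytes:
    """Compress to a raw deflate stream and base64-encode it."""
    zlibbed = zlib.compress(data)
    # Drop the 2-byte zlib header and the 4-byte Adler-32 trailer.
    return base64.b64encode(zlibbed[2:-4])

def decode_base64_and_inflate(b64data: bytes) -> bytes:
    """Base64-decode, then inflate a raw deflate stream (negative wbits)."""
    return zlib.decompress(base64.b64decode(b64data), -15)

original = "deflate and encode me".encode("utf-8")
encoded = deflate_and_base64_encode(original)
assert decode_base64_and_inflate(encoded) == original
```

The encoded bytes need not match the .NET output quoted earlier byte for byte, since two deflate encoders can emit different valid streams for the same input; what matters is that each side can inflate what the other produced.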

Accepted answer by John Machin

This is an add-on to MizardX's answer, giving some explanation and background.

See http://www.chiramattel.com/george/blog/2007/09/09/deflatestream-block-length-does-not-match.html

According to RFC 1950, a zlib stream constructed in the default manner is composed of:

  • a 2-byte header (e.g. 0x78 0x9C)
  • a deflate stream -- see RFC 1951
  • an Adler-32 checksum of the uncompressed data (4 bytes)
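
The three parts listed above can be pulled apart in Python 3. This is a sketch that assumes a stream without a preset dictionary (the default for `zlib.compress`); `split_zlib_stream` is a hypothetical helper name.

```python
import struct
import zlib

def split_zlib_stream(stream: bytes):
    """Split a zlib stream (no preset dictionary) into header, deflate body, checksum."""
    header = stream[:2]
    body = stream[2:-4]
    (adler,) = struct.unpack(">I", stream[-4:])  # Adler-32 is stored big-endian
    return header, body, adler

payload = b"deflate and encode me"
header, body, adler = split_zlib_stream(zlib.compress(payload))

# The body alone is a raw deflate stream; negative wbits tells zlib to expect no wrapper.
restored = zlib.decompress(body, -zlib.MAX_WBITS)
assert restored == payload
# The trailer is the Adler-32 of the uncompressed data.
assert adler == zlib.adler32(payload)
```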

The C# DeflateStream works on (you guessed it) a deflate stream. MizardX's code is telling the zlib module that the data is a raw deflate stream.

Observations: (1) One hopes that the C# "deflation" method producing a longer string happens only with short input. (2) Using the raw deflate stream without the Adler-32 checksum? A bit risky, unless it is replaced with something better.

Updates

Error message: Block length does not match with its complement

If you are trying to inflate some compressed data with the C# DeflateStream and you get that message, then it is quite possible that you are giving it a zlib stream, not a deflate stream.
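
The same mismatch can be shown from the Python side. In this sketch, `inflate_any` is a hypothetical helper that first tries the zlib wrapper and falls back to a raw deflate stream. The final lines hand a zlib stream to a raw inflater, which rejects it: the usual first header byte 0x78 reads as the start of a stored block, so the following bytes are taken as a length and its complement, and they do not match.

```python
import zlib

def inflate_any(data: bytes) -> bytes:
    """Inflate data that may be a zlib stream or a raw deflate stream."""
    try:
        return zlib.decompress(data)  # expects the RFC 1950 wrapper
    except zlib.error:
        return zlib.decompress(data, -zlib.MAX_WBITS)  # raw RFC 1951 stream

payload = b"deflate and encode me"
zlib_stream = zlib.compress(payload)
raw_stream = zlib_stream[2:-4]

assert inflate_any(zlib_stream) == payload
assert inflate_any(raw_stream) == payload

# Feeding the wrapped stream to a raw inflater fails, much like DeflateStream does.
try:
    zlib.decompress(zlib_stream, -zlib.MAX_WBITS)
except zlib.error as exc:
    print("raw inflate rejected the zlib stream:", exc)
```

Guessing the format by trial is a convenience, not a guarantee; when the protocol is known, passing the right wbits value directly is the safer choice.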

See the Stack Overflow question "How do you use a DeflateStream on part of a file?"

Also copy/paste the error message into a Google search and you will get numerous hits (including the one up the front of this answer) saying much the same thing.

The Java Deflater... used by "the website" ... C# DeflateStream "is pretty straightforward and has been tested against the Java implementation". Which of the following possible Java Deflater constructors is the website using?

public Deflater(int level, boolean nowrap)

Creates a new compressor using the specified compression level. If 'nowrap' is true then the ZLIB header and checksum fields will not be used in order to support the compression format used in both GZIP and PKZIP.

public Deflater(int level)

Creates a new compressor using the specified compression level. Compressed data will be generated in ZLIB format.

public Deflater()

Creates a new compressor with the default compression level. Compressed data will be generated in ZLIB format.

A one-line deflater, after throwing away the 2-byte zlib header and the 4-byte checksum:

uncompressed_string.encode('zlib')[2:-4] # does not work in Python 3.x

or

zlib.compress(uncompressed_string)[2:-4]

Answer by Markus Jarderot

You can still use the zlib module to inflate/deflate data. The gzip module uses it internally, but adds a file header to make it into a gzip file. Looking at the gzip.py file, something like this could work:

import zlib

def deflate(data, compresslevel=9):
    compress = zlib.compressobj(
            compresslevel,        # level: 0-9
            zlib.DEFLATED,        # method: must be DEFLATED
            -zlib.MAX_WBITS,      # window size in bits:
                                  #   -15..-8: negate, suppress header
                                  #   8..15: normal
                                  #   24..31: subtract 16, gzip header
            zlib.DEF_MEM_LEVEL,   # mem level: 1..8/9
            0                     # strategy:
                                  #   0 = Z_DEFAULT_STRATEGY
                                  #   1 = Z_FILTERED
                                  #   2 = Z_HUFFMAN_ONLY
                                  #   3 = Z_RLE
                                  #   4 = Z_FIXED
    )
    deflated = compress.compress(data)
    deflated += compress.flush()
    return deflated

def inflate(data):
    decompress = zlib.decompressobj(
            -zlib.MAX_WBITS  # see above
    )
    inflated = decompress.decompress(data)
    inflated += decompress.flush()
    return inflated

I don't know if this corresponds exactly to whatever your server requires, but those two functions are able to round-trip any data I tried.
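
To see what the window-size argument changes, the following Python 3 sketch compresses one payload with three wbits values and compares the results. `compress_with_wbits` is a hypothetical helper, and the payload is arbitrary test data.

```python
import zlib

def compress_with_wbits(data: bytes, wbits: int, level: int = 9) -> bytes:
    """Compress data with an explicit wbits value (hypothetical helper)."""
    c = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()

payload = b"deflate and encode me" * 10

raw = compress_with_wbits(payload, -15)      # no header, no checksum
wrapped = compress_with_wbits(payload, 15)   # 2-byte header + Adler-32 trailer
gzipped = compress_with_wbits(payload, 31)   # gzip header + CRC-32/size trailer

assert wrapped[2:-4] == raw          # same deflate body inside the zlib wrapper
assert gzipped[:2] == b"\x1f\x8b"    # gzip magic number
assert zlib.decompress(raw, -15) == payload
assert zlib.decompress(gzipped, 31) == payload
```

The first assertion is why slicing `[2:-4]` off the output of `zlib.compress()` and asking for a raw stream with negative wbits give the same bytes when the other settings match.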

The parameters map directly to what is passed to the zlib library functions.

Python                          C
zlib.compressobj(...)           deflateInit(...)
compressobj.compress(...)       deflate(...)
zlib.decompressobj(...)         inflateInit(...)
decompressobj.decompress(...)   inflate(...)

The constructors create the structure, populate it with default values, and pass it along to the init functions. The compress/decompress methods update the structure and pass it to deflate/inflate.
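
Because the object keeps that structure between calls, data can be fed in pieces. Here is a minimal Python 3 sketch of incremental inflation, assuming a raw deflate stream arriving in chunks; `inflate_chunks` and the 64-byte chunk size are illustrative choices.

```python
import zlib

def inflate_chunks(chunks):
    """Inflate a raw deflate stream delivered as an iterable of byte chunks."""
    d = zlib.decompressobj(-zlib.MAX_WBITS)
    for chunk in chunks:
        out = d.decompress(chunk)   # each call advances the same inflate state
        if out:
            yield out
    tail = d.flush()
    if tail:
        yield tail

payload = b"deflate and encode me " * 1000
c = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = c.compress(payload) + c.flush()

# Feed the compressed bytes 64 at a time, as if read from a socket.
pieces = [raw[i:i + 64] for i in range(0, len(raw), 64)]
assert b"".join(inflate_chunks(pieces)) == payload
```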
