C# Really simple short string compression
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/1192732/
Really simple short string compression
Asked by cbp
Is there a really simple compression technique for strings up to about 255 characters in length (yes, I'm compressing URLs)?
I am not concerned with the strength of compression - I am looking for something that performs very well and is quick to implement. I would like something simpler than SharpZipLib: something that can be implemented with a couple of short methods.
Accepted answer by badbod99
I think the key question here is "Why do you want to compress URLs?"
Trying to shorten long urls for the address bar?
You're better off storing the original URL somewhere (database, text file ...) alongside a hashcode of the non-domain part (MD5 is fine). You can then have a simple page (or some HTTPModule if you're feeling flashy) to read the MD5 and look up the real URL. This is how TinyURL and others work.
For example:
http://mydomain.com/folder1/folder2/page1.aspx
Could be shortened to:
http://mydomain.com/2d4f1c8a
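Following that approach, here is a minimal sketch in Python, with an in-memory dict standing in for the database and all names illustrative:

```python
import hashlib

# In-memory stand-in for the database table that maps codes to full URLs.
url_table = {}

def shorten(url, domain="http://mydomain.com"):
    """Store the full URL under a short MD5-based code of its non-domain part."""
    path = url[len(domain):]                            # the non-domain part
    code = hashlib.md5(path.encode()).hexdigest()[:8]   # short hash, e.g. "2d4f1c8a"-style
    url_table[code] = url                               # lookup table for the redirect page
    return domain + "/" + code

def expand(short_url):
    """What the simple redirect page does: read the code, look up the real URL."""
    code = short_url.rsplit("/", 1)[-1]
    return url_table[code]

short = shorten("http://mydomain.com/folder1/folder2/page1.aspx")
assert expand(short) == "http://mydomain.com/folder1/folder2/page1.aspx"
```

The hash is never reversed; it is only a fixed-length key into the lookup table, which is why this works where compression cannot.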
Using a compression library for this will not work. The string will be compressed into a shorter binary representation, but converting this back to a string which needs to be valid as part of a URL (e.g. Base64) will negate any benefit you gained from the compression.
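The point is easy to check. A sketch using Python's zlib as a stand-in for any compression library:

```python
import base64
import zlib

url_path = "folder1/folder2/page1.aspx"

compressed = zlib.compress(url_path.encode())            # binary, barely smaller (if at all)
as_text = base64.urlsafe_b64encode(compressed).decode()  # must be URL-safe text again

# Base64 inflates the binary by 4/3, and short strings barely compress,
# so the "compressed" text ends up longer than the original path.
assert len(as_text) > len(url_path)
assert zlib.decompress(compressed).decode() == url_path
```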
Storing lots of URLs in memory or on disk?
Use the built-in compression library within System.IO.Compression, or the ZLib library, which is simple and incredibly good. Since you will be storing binary data, the compressed output will be fine as-is. You'll need to uncompress it to use it as a URL.
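For the bulk-storage case, a quick sketch of the trade-off (Python's zlib here; System.IO.Compression streams behave the same way):

```python
import zlib

# A batch of similar-looking URLs, as you'd have in a crawl log or cache.
urls = ["http://mydomain.com/folder%d/page%d.aspx" % (i, i) for i in range(1000)]
blob = "\n".join(urls).encode()

packed = zlib.compress(blob, 9)   # store this binary output as-is

# Redundancy across many URLs makes the batch compress very well...
assert len(packed) < len(blob) // 2

# ...but you must decompress before any URL is usable again.
restored = zlib.decompress(packed).decode().split("\n")
assert restored == urls
```

Compression pays off here precisely because the input is long and repetitive, the opposite of the single-short-URL case above.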
Answered by peSHIr
What's your goal?
- A shorter URL? Try URL shorteners like http://tinyurl.com/ or http://is.gd/
- Storage space? Check out System.IO.Compression. (Or SharpZipLib)
Answered by Grzenio
I would start with trying one of the existing (free or open source) zip libraries, e.g. http://www.icsharpcode.net/OpenSource/SharpZipLib/
Zip should work well for text strings, and I am not sure if it is worth implementing a compression algorithm yourself...
Answered by Dan Diplo
I'd suggest looking in the System.IO.Compression namespace. There's an article on CodeProject that may help.
Answered by Justin
Answered by Wolfwyrd
The open source library SharpZipLib is easy to use and will provide you with compression tools.
Answered by Cheeso
As suggested in the accepted answer, using data compression does not work to shorten URL paths that are already fairly short.
DotNetZip has a DeflateStream class that exposes a static (Shared in VB) CompressString method. It's a one-line way to compress a string using DEFLATE (RFC 1951). The DEFLATE implementation is fully compatible with System.IO.Compression.DeflateStream, but DotNetZip compresses better. Here's how you might use it:
using Ionic.Zlib;   // DotNetZip's DeflateStream

string[] orig = {
    "folder1/folder2/page1.aspx",
    "folderBB/folderAA/page2.aspx",
};

public void Run()
{
    foreach (string s in orig)
    {
        System.Console.WriteLine("original    : {0}", s);
        byte[] compressed = DeflateStream.CompressString(s);
        System.Console.WriteLine("compressed  : {0}", ByteArrayToHexString(compressed));
        string uncompressed = DeflateStream.UncompressString(compressed);
        System.Console.WriteLine("uncompressed: {0}\n", uncompressed);
    }
}

// Hex-encode the compressed bytes for display.
static string ByteArrayToHexString(byte[] data)
{
    return System.BitConverter.ToString(data).Replace("-", "").ToLower();
}
Using that code, here are my test results:
original : folder1/folder2/page1.aspx
compressed : 4bcbcf49492d32d44f03d346fa0589e9a9867a89c5051500
uncompressed: folder1/folder2/page1.aspx
original : folderBB/folderAA/page2.aspx
compressed : 4bcbcf49492d7272d24f03331c1df50b12d3538df4128b0b2a00
uncompressed: folderBB/folderAA/page2.aspx
So you can see the "compressed" byte array, when represented in hex, is longer than the original, about 2x as long. The reason is that a hex byte is actually 2 ASCII chars.
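The 2x factor is purely the cost of the hex representation, independent of the compression itself. Using the first compressed value above (in Python):

```python
# The first "compressed" value from the test output above, as raw bytes.
compressed = bytes.fromhex("4bcbcf49492d32d44f03d346fa0589e9a9867a89c5051500")

# Every byte becomes two ASCII characters in hex, so the text form
# is always exactly twice the length of the binary.
assert len(compressed.hex()) == 2 * len(compressed)
```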
You could compensate somewhat for that by using base-62, instead of base-16 (hex), to represent the number. In that case a-z and A-Z are also digits, giving you 0-9 (10) + a-z (+26) + A-Z (+26) = 62 total digits. That would shorten the output significantly. I haven't tried that yet.
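A minimal base-62 codec looks like this (a Python sketch; note that converting through one big integer means the decoder needs the original byte length to restore any leading zero bytes):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base62(data: bytes) -> str:
    """Treat the bytes as one big integer and write it in base 62."""
    n = int.from_bytes(data, "big")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def from_base62(text: str, length: int) -> bytes:
    """Inverse of to_base62; 'length' restores leading zero bytes."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return n.to_bytes(length, "big")

raw = bytes.fromhex("4bcbcf49492d32d44f03d346fa0589e9a9867a89c5051500")
encoded = to_base62(raw)
assert from_base62(encoded, len(raw)) == raw
assert len(encoded) < len(raw.hex())   # shorter than the hex form
```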
EDIT
OK, I tested the base-62 encoder. It shortens the hex string by about half. I figured it would cut it to 25% (62/16 =~ 4), but I think I am losing something with the discretization. In my tests, the resulting base-62 encoded string is about the same length as the original URL. So, no, using compression and then base-62 encoding is still not a good approach. You really want a hash value.
Answered by endolith
You can use the deflate algorithm directly, without any headers, checksums, or footers, as described in this question: Python: Inflate and Deflate implementations
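In Python this is zlib with a negative wbits value, which produces and consumes a raw DEFLATE stream with no header or checksum (the long URL below is illustrative):

```python
import base64
import zlib

def deflate(data: bytes) -> bytes:
    # wbits = -15 gives a raw DEFLATE stream: no zlib/gzip header or checksum.
    co = zlib.compressobj(9, zlib.DEFLATED, -15)
    return co.compress(data) + co.flush()

def inflate(data: bytes) -> bytes:
    # The matching negative wbits tells decompress to expect raw DEFLATE.
    return zlib.decompress(data, -15)

# A long, repetitive query string of the kind that actually benefits.
long_url_part = "/applet?" + "&".join("param%d=value%d" % (i, i) for i in range(200))
packed = base64.urlsafe_b64encode(deflate(long_url_part.encode())).decode()

assert inflate(base64.urlsafe_b64decode(packed)).decode() == long_url_part
assert len(packed) < len(long_url_part)   # long inputs still win despite base64
```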
This cuts down a 4100 character URL to 1270 base64 characters, in my test, allowing it to fit inside IE's 2000 limit.
And here's an example of a 4000-character URL, which can't be solved with a hashtable since the applet can exist on any server.
Answered by Todd
I have just created a compression scheme that targets URLs and achieves around 50% compression (compared to base64 representation of the original URL text).
See http://blog.alivate.com.au/packed-url/
It would be great if someone from a big tech company built this out properly and published it for all to use. Google championed Protocol Buffers. This tool could save a lot of disk space for someone like Google, while still being scannable. Or perhaps the great captain himself? https://twitter.com/capnproto
Technically, I would call this a binary (bitwise) serialisation scheme for the data that underlies a URL. Treat the URL as a text representation of conceptual data, then serialise that conceptual data model with a specialised serialiser. The outcome is, of course, a more compact version of the original. This is very different from how a general-purpose compression algorithm works.
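As a toy illustration of the difference (not the actual scheme from the blog post): instead of feeding the URL text to a general compressor, parse it into fields and serialise those fields compactly.

```python
# Illustrative dictionaries; a real scheme would derive these from URL structure.
SCHEMES = ["http://", "https://"]           # 1 byte: scheme index
SUFFIXES = ["", ".aspx", ".html", ".php"]   # 1 byte: common extension index

def pack(url: str) -> bytes:
    """Serialise scheme and extension as single bytes, keep the rest verbatim."""
    scheme = next(i for i, s in enumerate(SCHEMES) if url.startswith(s))
    rest = url[len(SCHEMES[scheme]):]
    suffix = 0
    for i, s in enumerate(SUFFIXES[1:], 1):
        if rest.endswith(s):
            suffix, rest = i, rest[:-len(s)]
            break
    return bytes([scheme, suffix]) + rest.encode()

def unpack(data: bytes) -> str:
    scheme, suffix = data[0], data[1]
    return SCHEMES[scheme] + data[2:].decode() + SUFFIXES[suffix]

url = "http://mydomain.com/folder1/folder2/page1.aspx"
assert unpack(pack(url)) == url
```

A real scheme would go much further, e.g. bit-packing the remaining characters into fewer than 8 bits each, since the URL alphabet is restricted.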