C# iTextSharp international text

Notice: this page is derived from a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1727765/

Time: 2020-08-06 20:18:12  Source: igfitidea

iTextSharp international text

Tags: c#, asp.net, itextsharp, export-to-pdf

Asked by Arny

I have a table in an ASP.NET page and am trying to export it as a PDF file. A couple of international characters are not shown in the generated PDF file. Any suggestions?

Thanks in advance

Answered by Bobby

You can try setting the encoding for the font you are using. In Java it would be something like this:

BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.EMBEDDED);

where BaseFont.CP1252 is the encoding. Look up the exact encoding you need for the characters to be displayed.
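
Since the question is about iTextSharp, here is a rough C# counterpart of the Java line, as a sketch rather than a definitive recipe. The class and method names (EncodedFontSketch, CreateHelvetica) are made up for illustration. The sketch passes NOT_EMBEDDED because, as far as I know, the built-in Helvetica is available only as metrics and not as an embeddable font file.

```csharp
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class EncodedFontSketch
{
    // Creates the built-in Helvetica with the requested single-byte encoding,
    // for example BaseFont.CP1252 (Western European) or
    // BaseFont.CP1250 (Central European).
    public static BaseFont CreateHelvetica(string encoding)
    {
        return BaseFont.CreateFont(BaseFont.HELVETICA, encoding, BaseFont.NOT_EMBEDDED);
    }
}
```

Calling EncodedFontSketch.CreateHelvetica(BaseFont.CP1250) instead of BaseFont.CP1252 is the kind of switch the answer suggests when the missing characters are, say, Czech or Polish. A single-byte encoding only helps when all the characters you need fit into that one code page.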

Answered by Stewbob

The key to proper display of alternate character sets (Russian, Chinese, Japanese, etc.) is to use IDENTITY_H encoding when creating the BaseFont.

Dim bfR As iTextSharp.text.pdf.BaseFont
  bfR = iTextSharp.text.pdf.BaseFont.CreateFont("MyFavoriteFont.ttf", iTextSharp.text.pdf.BaseFont.IDENTITY_H, iTextSharp.text.pdf.BaseFont.EMBEDDED)

IDENTITY_H provides Unicode support for your chosen font, so you should be able to display pretty much any character. I've used it for Russian, Greek, and all the different European language letters.

EDIT - 2013-May-28

This also works for v5.0.2 of iTextSharp.

EDIT - 2015-June-23

Given below is a complete code sample (in C#):

private void CreatePdf()
{
  string testText = "???ě???";  // sample text; replace the ? placeholders with the characters you need to test
  string tmpFile = @"C:\test.pdf";
  string myFont = @"C:\<<valid path to the font you want>>\verdana.ttf";
  iTextSharp.text.Rectangle pgeSize = new iTextSharp.text.Rectangle(595, 792);
  iTextSharp.text.Document doc = new iTextSharp.text.Document(pgeSize, 10, 10, 10, 10);
  iTextSharp.text.pdf.PdfWriter wrtr;
  wrtr = iTextSharp.text.pdf.PdfWriter.GetInstance(doc,
      new System.IO.FileStream(tmpFile, System.IO.FileMode.Create));
  doc.Open();
  doc.NewPage();
  iTextSharp.text.pdf.BaseFont bfR;
  bfR = iTextSharp.text.pdf.BaseFont.CreateFont(myFont,
    iTextSharp.text.pdf.BaseFont.IDENTITY_H,
    iTextSharp.text.pdf.BaseFont.EMBEDDED);

  iTextSharp.text.BaseColor clrBlack = 
      new iTextSharp.text.BaseColor(0, 0, 0);
  iTextSharp.text.Font fntHead =
      new iTextSharp.text.Font(bfR, 12, iTextSharp.text.Font.NORMAL, clrBlack);

  iTextSharp.text.Paragraph pgr = 
      new iTextSharp.text.Paragraph(testText, fntHead);
  doc.Add(pgr);
  doc.Close();
}

This is a screenshot of the PDF file that is created:

(screenshot: sample pdf)

An important point to remember is that if the font you have chosen does not support the characters you are trying to send to the PDF file, nothing you do in iTextSharp is going to change that. Verdana nicely displays the characters from all the European languages I know of. Other fonts may not be able to display as many characters.

Answered by Mark Storer

There are two potential reasons characters aren't rendered:

  1. The encoding. As Stewbob pointed out, Identity-H is a great way to avoid the issue entirely, though it does require you to embed a subset of the font. This has two consequences.
    1. It increases the file size a bit compared to unembedded fonts.
    2. The font has to be licensed for embedded subsets. Most are, some are not.
  2. The font has to contain that character. If you ask for some Arabic ligatures out of a Cyrillic (Russian) font, chances aren't good that they'll be there. There are very few fonts that cover a variety of languages, and they tend to be HUGE. The biggest/most comprehensive font I've run into was "Arial Unicode MS". Over 23 megabytes.

That's another good reason to require embedding SUBSETS. Tacking on a few megabytes because you wanted to add a couple of Chinese glyphs is a bit steep.

If you're feeling paranoid, you can check your strings against a given BaseFont instance (which I believe takes the encoding into account as well) with myBaseFont.charExists(someChar) (CharExists in iTextSharp). If you have a font you're confident in, I wouldn't bother.
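
A minimal sketch of that paranoid check in C#, assuming iTextSharp 5.x, where the method is spelled CharExists. The helper name FindMissingChars and its class are invented for illustration. It walks the string one UTF-16 char at a time, so characters outside the Basic Multilingual Plane are not handled correctly here.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class GlyphCheckSketch
{
    // Returns the distinct characters of 'text' that 'font' reports as missing.
    public static List<char> FindMissingChars(BaseFont font, string text)
    {
        var missing = new List<char>();
        foreach (char c in text)
        {
            // Line breaks, tabs and spaces are layout, not glyphs to verify.
            if (char.IsControl(c) || char.IsWhiteSpace(c))
                continue;

            if (!font.CharExists(c) && !missing.Contains(c))
                missing.Add(c);
        }
        return missing;
    }
}
```

If the returned list is not empty, the font (or the encoding it was created with) is the problem, and no amount of tweaking elsewhere in iTextSharp will make those characters appear.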

PS: There's another good reason that Identity-H requires an embedded subset. Identity-H reads the bytes from the content stream as glyph indexes. The order of glyphs can vary wildly from one font to the next, or even between versions of the same font. Relying on a viewer's system to have the EXACT same font is a bad idea, so it's illegal... particularly when Acrobat/Reader starts substituting fonts because it couldn't find the exact font you asked for and you didn't embed it.

Answered by Matt Stuvysant

It is caused by the default iTextSharp font, Helvetica, which does not support anything other than basic characters (or at least does not support all other characters).

There are actually 2 options:

  1. One is to rewrite the table content by hand in code. This approach might look faster, but any change to the original table then has to be repeated in the code as well (breaking the DRY principle). In this case, you can easily set up the font as you wish.
  2. The other is to generate the PDF from HTML extracted from HtmlEngine. This might sound more complicated (and it is), but a working solution is much more flexible and universal. I struggled with special characters myself a while ago and decided to post a fairly complete solution under a similar question here on Stack Overflow: https://stackoverflow.com/a/24587745/1138663
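
For the first option, building the table by hand, something along these lines should work. It is only a sketch, assuming iTextSharp 5.x and a TrueType font that actually contains your characters. The names TableExportSketch and WriteTable, and the idea of passing the cells as a two-dimensional array, are illustrative assumptions and not taken from the linked answer.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class TableExportSketch
{
    // rows:     cell texts, indexed [row, column]; needs at least one column
    // fontPath: path to a .ttf file that covers the characters you need
    // output:   destination stream; doc.Close() also closes it by default
    public static void WriteTable(string[,] rows, string fontPath, Stream output)
    {
        // IDENTITY_H plus an embedded font, as recommended in the answers above.
        BaseFont baseFont = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font cellFont = new Font(baseFont, 10);

        Document doc = new Document(PageSize.A4, 36, 36, 36, 36);
        PdfWriter.GetInstance(doc, output);
        doc.Open();

        // One PDF column per array column; every cell uses the Unicode font.
        PdfPTable table = new PdfPTable(rows.GetLength(1));
        for (int r = 0; r < rows.GetLength(0); r++)
            for (int c = 0; c < rows.GetLength(1); c++)
                table.AddCell(new Phrase(rows[r, c], cellFont));

        doc.Add(table);
        doc.Close();
    }
}
```

In an ASP.NET page you could pass a MemoryStream as output and then write its bytes to the response. The important part is that every Phrase gets the font explicitly, because any text added without one falls back to the default Helvetica.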