Convert HTML to PDF in Java Using Flying Saucer and OpenPDF

Date: 2020-01-09 10:35:24  Source: igfitidea

In this tutorial we'll see how to convert HTML to PDF in Java using Flying Saucer, OpenPDF and jsoup.

To convert HTML to PDF using PDFBox instead, see the post "Convert HTML to PDF in Java Using Openhtmltopdf, PDFBox".

Convert HTML to PDF Using Flying Saucer: How It Works

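Flying Saucer renders well-formed XML: it takes an XML file as input, applies formatting and styling using CSS, and produces a rendered representation of that XML as output. The steps for converting HTML to PDF are therefore as follows: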

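The first step is to make sure the HTML is well-formed, by using jsoup to convert the HTML to XHTML.
Flying Saucer generates a rendered representation of the XHTML and CSS.
OpenPDF is used to generate the PDF document from that rendered representation.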
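
To see why the first step matters: an XML parser rejects markup that browsers accept, such as an unclosed <img> tag. The sketch below uses only the JDK's built-in XML parser (it does not call Flying Saucer or jsoup) to check whether a string is well-formed XML. The class name WellFormedCheck and the method isWellFormed are illustrative names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

  // Returns true if the markup parses as well-formed XML, false otherwise.
  public static boolean isWellFormed(String markup) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // Replace the default error handler, which prints parse errors to stderr.
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new InputSource(new StringReader(markup)));
      return true;
    } catch (SAXException e) {
      // The parser reports malformed markup as a SAXException.
      return false;
    } catch (Exception e) {
      // Parser configuration or I/O problems are not expected for an in-memory string.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Typical browser-friendly HTML: the <img> tag is never closed.
    String html = "<html><body><img src=\"a.png\"><p>text</p></body></html>";
    // The same content as XHTML: the <img> tag is self-closed.
    String xhtml = "<html><body><img src=\"a.png\" /><p>text</p></body></html>";
    System.out.println(isWellFormed(html));  // false
    System.out.println(isWellFormed(xhtml)); // true
  }
}
```

Markup that fails a check like this is the kind of input that needs the jsoup conversion before it is handed to Flying Saucer.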

OpenPDF is a fork of iText version 4. It is open-source software with LGPL and MPL licenses. Read more about OpenPDF in this post: https://theitroad.com/java-programs/generating-pdf-java-using-openpdf-tutorial/

Maven Dependencies

The Apache Maven dependencies for jsoup, Flying Saucer and Apache commons-io are as follows:

<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.13.1</version>
</dependency>

<dependency>
  <groupId>org.xhtmlrenderer</groupId>
  <artifactId>flying-saucer-pdf-openpdf</artifactId>
  <version>9.1.20</version>
</dependency>
<!-- Dependency for Apache commons-io -->
<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>2.6</version>
</dependency>

The Flying Saucer dependency listed above brings in the jars required for OpenPDF as well as the Flying Saucer core (flying-saucer-core-9.1.20.jar).

Convert HTML to PDF Using Flying Saucer and OpenPDF: Java Program

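When converting HTML to PDF, the three problems I ran into were:

How to display in the PDF an image that the HTML includes using the <img src="" ..> tag.
How to add a specific web font.
How to make sure that external CSS used in the HTML is also used to style the generated PDF.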

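The folder structure used by the example program is shown below. The OpenPDF folder holds the HTML file, a TrueType font file and a PNG image file, and the OpenPDF/css folder holds the CSS file.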

-OpenPDF
 MyPage.html
 Gabriola.ttf
 Image OpenPDF.png
--css
  mystyles.css

MyPage.html

<html lang="en">
  <head>
    <title>MyPage</title>  
    <style type="text/css">
      body{background-color: powderblue;}
    </style>
    <link href="css/mystyles.css" rel="stylesheet" >
  </head>
  <body>
    <h1>Convert HTML to PDF</h1>
    <p>Here is an embedded image</p>
    <img src="F:\theitroad\Java\Java Programs\PDF using Java\OpenPDF\Image OpenPDF.png" width="250" height="150">
    <p style="color:red">Styled text using Inline CSS</p>
    <i>This is italicised text</i>
    <p class="fontclass">This text uses the styling from font face font</p>
    <p class="myclass">This text uses the styling from external CSS class</p>
  </body>
</html>

mystyles.css
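In the CSS, the @font-face rule is used to specify a font and the URL where it can be found. The @page rule is used to specify the CSS properties to apply when the document is printed.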

@font-face {
  font-family: myFont;
  src: url("../Gabriola.ttf");
}
.fontclass{
  font-family: myFont;
}
@page {
  size: 8.5in 11in;
  margin: 1in;
}
.myclass{
  font-family: Helvetica, sans-serif;
  font-size: 25px;
  font-weight: normal;
  color: blue;
}

This is how the HTML is rendered in the Chrome browser.

Our job now is to write a Java program that converts this HTML to PDF, using the same image source, using the same external CSS, and adding the font used in the CSS @font-face rule.

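To get the image to work when converting to PDF, what worked for me was to implement my own ReplacedElementFactory, which reads the image into bytes and then uses them to create an ITextImageElement. There is an online discussion of this approach.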

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
import org.w3c.dom.Element;
import org.xhtmlrenderer.extend.FSImage;
import org.xhtmlrenderer.extend.ReplacedElement;
import org.xhtmlrenderer.extend.ReplacedElementFactory;
import org.xhtmlrenderer.extend.UserAgentCallback;
import org.xhtmlrenderer.layout.LayoutContext;
import org.xhtmlrenderer.pdf.ITextFSImage;
import org.xhtmlrenderer.pdf.ITextImageElement;
import org.xhtmlrenderer.render.BlockBox;
import org.xhtmlrenderer.simple.extend.FormSubmissionListener;
import com.lowagie.text.BadElementException;
import com.lowagie.text.Image;

public class ImageReplacedElementFactory implements ReplacedElementFactory {

  @Override
  public ReplacedElement createReplacedElement(LayoutContext c, BlockBox box, UserAgentCallback uac, int cssWidth,
      int cssHeight) {
    Element e = box.getElement();
    if (e == null) {
      return null;
    }
    String nodeName = e.getNodeName();
    if (nodeName.equals("img")) {
      String attribute = e.getAttribute("src");
      FSImage fsImage;
      try {
        fsImage = imageForPDF(attribute, uac);
      } catch (BadElementException e1) {
        fsImage = null;
      } catch (IOException e1) {
        fsImage = null;
      }
      if (fsImage != null) {
        if (cssWidth != -1 || cssHeight != -1) {
          //System.out.println("scaling");
          fsImage.scale(cssWidth, cssHeight);
        }else {
          fsImage.scale(250, 150);
        }
        return new ITextImageElement(fsImage);
      }
    }
    return null;
  }
  
  // Reads the image file named by the src attribute and wraps it for Flying Saucer.
  protected FSImage imageForPDF(String attribute, UserAgentCallback uac) throws IOException, BadElementException {
    // try-with-resources closes the stream even if reading fails
    try (InputStream input = new FileInputStream(attribute)) {
      final byte[] bytes = IOUtils.toByteArray(input);
      final Image image = Image.getInstance(bytes);
      return new ITextFSImage(image);
    }
  }
	 
  @Override
  public void reset() {
    // TODO Auto-generated method stub
  }

  @Override
  public void remove(Element e) {
    // TODO Auto-generated method stub		
  }

  @Override
  public void setFormSubmissionListener(FormSubmissionListener listener) {
    // TODO Auto-generated method stub		
  }
}
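
The imageForPDF method above passes the src attribute directly to FileInputStream, so it only works when src is a filesystem path that can be opened as-is, like the absolute path in MyPage.html. As a sketch of one way to relax that, the hypothetical helper below resolves a relative src against a base folder using only the JDK. ImageSourceResolver, resolve and the baseDir parameter are illustrative names that are not part of Flying Saucer; the returned Path could then be read with Files.readAllBytes.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImageSourceResolver {

  // Resolves an <img> src value against baseDir.
  // Path.resolve returns an absolute src unchanged and joins a relative one to baseDir;
  // normalize() then removes "." and ".." segments.
  public static Path resolve(Path baseDir, String src) {
    return baseDir.resolve(src).normalize();
  }

  public static void main(String[] args) {
    // Assumed base folder: an "OpenPDF" directory under the working directory.
    Path baseDir = Paths.get("OpenPDF").toAbsolutePath().normalize();
    // Both calls print the same path: <baseDir>/Image OpenPDF.png
    System.out.println(resolve(baseDir, "Image OpenPDF.png"));
    System.out.println(resolve(baseDir, "css/../Image OpenPDF.png"));
  }
}
```

With a helper like this the HTML could reference the image by a relative path, which keeps the page portable across machines.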

The following Java program generates the PDF using the HTML as the source.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileSystems;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.xhtmlrenderer.layout.SharedContext;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class HTMLToPDF {

  public static void main(String[] args) {
    try {
      // Source HTML file
      File inputHTML = new File("F:\\theitroad\\Java\\Java Programs\\PDF using Java\\OpenPDF\\MyPage.html");
      // Generated PDF file name
      File outputPdf = new File("F:\\theitroad\\Java\\Java Programs\\PDF using Java\\OpenPDF\\Output.pdf");
      //Convert HTML to XHTML
      String xhtml = htmlToXhtml(inputHTML);
      System.out.println("Converting to PDF...");
      xhtmlToPdf(xhtml, outputPdf);
      
    } catch (IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
  
  private static String htmlToXhtml(File inputHTML) throws IOException {
    Document document = Jsoup.parse(inputHTML, "UTF-8");
    System.out.println("parsing ...");
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    System.out.println("parsing done ...");
    return document.html();
  }
  
  private static void xhtmlToPdf(String xhtml, File outputPdf) throws IOException {
    ITextRenderer renderer = new ITextRenderer();	
    SharedContext sharedContext = renderer.getSharedContext();
    sharedContext.setPrint(true);
    sharedContext.setInteractive(false);
    sharedContext.setReplacedElementFactory(new ImageReplacedElementFactory());
    sharedContext.getTextRenderer().setSmoothingThreshold(0);
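    // Register the font referenced by the @font-face rule; true embeds it in the PDF
    renderer.getFontResolver().addFont("F:\\theitroad\\Java\\Java Programs\\PDF using Java\\OpenPDF\\Gabriola.ttf", true);
    // Base URL against which relative paths in the HTML (such as css/mystyles.css) are resolved
    String baseUrl = FileSystems.getDefault()
                                .getPath("F:\\", "theitroad\\Java\\", "Java Programs\\PDF using Java\\OpenPDF")
                                .toUri()
                                .toURL()
                                .toString();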
    renderer.setDocumentFromString(xhtml, baseUrl);
    renderer.layout();
    // try-with-resources closes the output stream even if PDF creation fails
    try (OutputStream outputStream = new FileOutputStream(outputPdf)) {
      renderer.createPDF(outputStream);
    }
    System.out.println("PDF creation completed");
  }
}

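Some important points to note in the program:

1. sharedContext.setReplacedElementFactory(new ImageReplacedElementFactory()); sets the custom implementation of ReplacedElementFactory.
2. In the call renderer.setDocumentFromString(xhtml, baseUrl); the base URL is passed as the second argument. The URL is created with this statement:

String baseUrl = FileSystems.getDefault().getPath("F:\\", "theitroad\\Java\\", "Java Programs\\PDF using Java\\OpenPDF").toUri().toURL().toString();

3. Notice that the path to the CSS in the HTML is a relative path. Setting the base URL as described in the second point allows this relative path to be resolved, which is what lets the external CSS be used when generating the PDF.
4. Additional fonts are registered with this statement:

renderer.getFontResolver().addFont("F:\\theitroad\\Java\\Java Programs\\PDF using Java\\OpenPDF\\Gabriola.ttf", true);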
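
Points 2 and 3 depend on how a relative reference is resolved against the base URL. The sketch below uses java.net.URI from the JDK to illustrate standard URI resolution; it does not call Flying Saucer, and BaseUrlDemo, resolvePath and the /docs/OpenPDF path are illustrative. It shows that the trailing slash on the base matters: without it, the last segment is treated as a file name and dropped. Path.toUri() appends the slash only when it determines that the path is a directory, so the folder should exist when the base URL is built.

```java
import java.net.URI;

public class BaseUrlDemo {

  // Resolves a relative reference against a base URI and returns the path of the result.
  public static String resolvePath(String base, String relative) {
    return URI.create(base).resolve(relative).getPath();
  }

  public static void main(String[] args) {
    // Base ends with "/": the stylesheet resolves inside the OpenPDF folder.
    System.out.println(resolvePath("file:///docs/OpenPDF/", "css/mystyles.css"));
    // prints /docs/OpenPDF/css/mystyles.css

    // No trailing "/": "OpenPDF" is treated as a file name and replaced.
    System.out.println(resolvePath("file:///docs/OpenPDF", "css/mystyles.css"));
    // prints /docs/css/mystyles.css
  }
}
```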

Generated PDF
Reference: https://flyingsaucerproject.github.io/flyingsaucer/r8/guide/users-guide-R8.html
