Running PDF to HTML5 Converter

This tutorial is aimed at developers with a good knowledge of Java. If you do not know Java, we recommend using the web service or the online converter.

How do I download the Jar File?

This tutorial requires a single jar file. It works with either the 30-day trial or the full commercial version, both of which you can download here.

How do I run the service directly from the jar?

The single jar file can be placed anywhere – you will just need to ensure Java is installed (we recommend Java 7). You will also need to be able to write to the output location.

You can run the PDF to HTML5 conversion program directly from the jar – for example if you wanted to use it from some other language or script.

Quickstart (if your PDF files do not contain JPEG2000 or TIFF images)

To run it, use:

java -jar $path_to_jar/JPDF2HTML.jar $dir_with_pdf_files $output

$path_to_jar is the location of the jar, $dir_with_pdf_files is the location of the PDF files to convert, and $output is where to place the converted files. If a directory does not exist it will be created.
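
Because the converter runs from the command line, it can also be launched from another program. The following is a minimal sketch in Java, assuming the java executable is on the PATH. The class name QuickstartLauncher and its methods are illustrative only and are not part of the JPDF2HTML library; the same approach applies in any language that can start a child process.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JPDF2HTML: launches the quickstart command as a child process.
class QuickstartLauncher {

    // Builds the quickstart command line: java -jar <jar> <pdfDir> <outputDir>
    static List<String> buildCommand(String jarPath, String pdfDir, String outputDir) {
        return Arrays.asList("java", "-jar", jarPath, pdfDir, outputDir);
    }

    // Starts the converter, forwards its console output, and returns its exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // show the converter's output on this process's console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: QuickstartLauncher <JPDF2HTML.jar> <dir with PDF files> <output dir>");
            return;
        }
        List<String> command = buildCommand(args[0], args[1], args[2]);
        System.out.println("Running: " + command);
        System.exit(run(command));
    }
}
```

Passing each argument as a separate list element avoids quoting problems when a path contains spaces.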

All PDF files

To convert PDF files that contain JPEG2000 or TIFF images, you will also need the additional JAI jars, which you can download from here.

java -Dorg.jpedal.jai=true -cp $path_to_jar/JPDF2HTML.jar:$path_to_jai/jai_codec.jar:$path_to_jai/jai_core.jar:$path_to_jai/imageio.jar org.jpedal.examples.html.ExtractPagesAsHTML $dir_with_pdf_files $output

Important note for Windows developers: separate the jars with a semicolon (;) rather than a colon (:).
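
If the command is assembled by a program rather than typed by hand, the separator can be chosen automatically. The sketch below relies on the standard File.pathSeparator constant, which is ";" on Windows and ":" on Unix-like systems. The class ClasspathBuilder is a hypothetical helper written for this illustration, and the jar names are the ones used in the command above with no directory prefixes.

```java
import java.io.File;

// Hypothetical helper, not part of JPDF2HTML: builds a classpath string for the -cp option.
class ClasspathBuilder {

    // Joins the jar paths with the given separator.
    static String join(String separator, String... jars) {
        StringBuilder classpath = new StringBuilder();
        for (int i = 0; i < jars.length; i++) {
            if (i > 0) {
                classpath.append(separator);
            }
            classpath.append(jars[i]);
        }
        return classpath.toString();
    }

    public static void main(String[] args) {
        // File.pathSeparator is the classpath separator of the platform running this code.
        String classpath = join(File.pathSeparator,
                "JPDF2HTML.jar", "jai_codec.jar", "jai_core.jar", "imageio.jar");
        System.out.println("-cp " + classpath);
    }
}
```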

My Example

Here is my test example:

java -Xmx512M -Dorg.jpedal.jai=true -jar JPDF2HTML.jar /Users/markee/pdfs/ /Users/markee/output/

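The -Xmx512M option in this example sets the maximum Java heap size to 512 MB. You may want to increase this value if the conversion runs out of memory.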

What sort of output will I get?

A large number of configuration options are available. The most commonly used one is the Text Mode, which controls how text, shapes and images are rendered. The default for HTML is to use real text (with converted fonts) and to draw shapes and images onto a background image. It is also possible to draw shapes and images as SVG.

The text mode can be controlled within the ExtractPagesAsHTML class, or by setting the JVM option -Dorg.jpedal.pdf2html.textMode= to one of the following values:

  • image_realtext (default)
  • image_shapetext_selectable (text converted to image with invisible real text on top for selection)
  • image_shapetext_nonselectable (text converted to image)
  • svg_realtext
  • svg_shapetext_selectable
  • svg_shapetext_nonselectable
  • svg_realtext_nofallback
  • svg_shapetext_selectable_nofallback
  • svg_shapetext_nonselectable_nofallback
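
As an illustration of how the JVM option is formed, the sketch below checks a mode name against the values listed above and produces the matching -D argument. The class TextModeOption is a hypothetical helper, not part of the library. The example command it prints reuses the quickstart form, with placeholder directory names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JPDF2HTML: formats the text mode JVM option.
class TextModeOption {

    // The text mode values listed in this tutorial.
    static final List<String> MODES = Arrays.asList(
            "image_realtext",
            "image_shapetext_selectable",
            "image_shapetext_nonselectable",
            "svg_realtext",
            "svg_shapetext_selectable",
            "svg_shapetext_nonselectable",
            "svg_realtext_nofallback",
            "svg_shapetext_selectable_nofallback",
            "svg_shapetext_nonselectable_nofallback");

    // Returns the -D argument for the given mode, rejecting names that are not in the list.
    static String jvmFlag(String mode) {
        if (!MODES.contains(mode)) {
            throw new IllegalArgumentException("Unknown text mode: " + mode);
        }
        return "-Dorg.jpedal.pdf2html.textMode=" + mode;
    }

    public static void main(String[] args) {
        // pdfs/ and output/ are placeholder directories for this illustration.
        System.out.println("java " + jvmFlag("svg_realtext") + " -jar JPDF2HTML.jar pdfs/ output/");
    }
}
```

JVM options such as this one must appear before -jar on the command line; anything after the jar name is passed to the program as an argument.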

Do I need any additional jars?

Possibly, if the PDF is encrypted or contains TIFF data. Full details are here.

Javadocs

The documentation for the examples and key Java classes is available as JavaDocs for the HTML/SVG converter, or for download in standard Javadoc format.

All the useful methods are documented. Some methods exist only for internal use and are included for completeness. If in doubt, please contact us for help.

How do I generate SVG instead?

There is a separate tutorial here.

How do I include the HTML5 conversion software in my Java code?

A documented Java PDF to HTML5 example is included in the jar. Click to view the Java example source code.

How do I get assistance?

Our developers are on the forums to answer your questions.

Will you do the coding for us?

We are happy to provide coding services, on a commercial basis only.

How do I find the version number of the software?

You can see the version number of the PDF to HTML5 library by running ExtractPagesAsHTML with no parameters, or you can read it directly from the static String variable HTMLDisplay.HTMLversion.