Running the PDF to HTML5 Converter

How do I run the software directly from the jar?

The single jar file can be placed anywhere. You only need to ensure that Java is installed (we recommend Java 7) and that you have write access to the output location.

You can run the PDF to HTML5 conversion program directly from the jar, for example if you want to call it from another language or a script.

 

Quickstart (if your PDF files do not contain JPEG2000 or TIFF images)

To run it, use:

java -jar $path_to_jar/JPDF2HTML.jar $dir_with_pdf_files $output

$path_to_jar is the location of the jar, $dir_with_pdf_files is the location of the PDF files to convert, and $output is where to place the converted files. If a directory does not exist it will be created.
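Because the jar can be driven from another program, the quickstart command can also be assembled and launched programmatically. The following is a minimal sketch using the standard Java ProcessBuilder class. The class name JarLauncher and the example paths are hypothetical, and the sketch assumes that the java executable is on the PATH.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class JarLauncher {

    // Builds the same argument list as: java -jar <jar> <pdfDir> <outputDir>
    static List<String> buildCommand(String jarPath, String pdfDir, String outputDir) {
        List<String> command = new ArrayList<String>();
        command.add("java");
        command.add("-jar");
        command.add(jarPath);
        command.add(pdfDir);
        command.add(outputDir);
        return command;
    }

    // Starts the converter as a child process and returns its exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.inheritIO(); // show the converter's console output in this process
        return builder.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths; replace them with your own locations.
        List<String> command =
                buildCommand("/opt/jpedal/JPDF2HTML.jar", "/data/pdfs", "/data/html");
        System.out.println(command);
        // int exitCode = run(command); // uncomment once the paths point at real files
    }
}
```

Passing each argument as a separate list element avoids shell quoting problems when a path contains spaces.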

 

All PDF files

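To handle all PDF files, including those containing JPEG2000 or TIFF images, you will need the additional JAI jars, which you can download from here. In the command below, $path_to_jai is the location of the JAI jars.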

 

java -Dorg.jpedal.jai=true -cp $path_to_jar/JPDF2HTML.jar:$path_to_jai/jai_codec.jar:$path_to_jai/jai_core.jar:$path_to_jai/imageio.jar org.jpedal.examples.html.ExtractPagesAsHTML $dir_with_pdf_files $output

  

Important note for Windows developers: separate the jars with a semicolon (;) rather than a colon (:).
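If the command is built from Java code, the separator does not have to be hard-coded. The sketch below uses the standard File.pathSeparator constant, which is ";" on Windows and ":" on Unix-like systems. The class name ClasspathBuilder is hypothetical, and the bare jar names are placeholders for your own paths.

```java
import java.io.File;

final class ClasspathBuilder {

    // Joins jar paths with the given separator.
    static String join(String separator, String... jars) {
        StringBuilder classpath = new StringBuilder();
        for (int i = 0; i < jars.length; i++) {
            if (i > 0) {
                classpath.append(separator);
            }
            classpath.append(jars[i]);
        }
        return classpath.toString();
    }

    // Uses the separator of the platform the program is currently running on.
    static String forCurrentPlatform(String... jars) {
        return join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        // Bare jar names are used for illustration; prefix them with your own paths.
        System.out.println(forCurrentPlatform(
                "JPDF2HTML.jar", "jai_codec.jar", "jai_core.jar", "imageio.jar"));
    }
}
```

The resulting string can be passed as the value of the -cp option.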

 

My Example

Here is my test example:

java -Xmx512M -Dorg.jpedal.jai=true -jar JPDF2HTML.jar /Users/markee/pdfs/ /Users/markee/output/

 

You may also want to increase the memory available to Java, as the -Xmx512M option does in the example above.
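To confirm that a memory setting has taken effect, the maximum heap size can be read back from the running JVM. This is a minimal sketch using the standard Runtime class; the class name HeapCheck is hypothetical and it is not part of the converter.

```java
final class HeapCheck {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it with the same -Xmx value as the converter, for example java -Xmx512M HeapCheck, should report a figure close to that value. The exact number depends on the JVM.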

 

What sort of output will I get? 

The code can generate several types of output:

  • A full conversion of the PDF to HTML5, with text using embedded fonts and shapes rendered on Canvas (TEXT_AS_TEXT). This is the default setting.
  • All content rendered to Canvas (TEXT_AS_SHAPE).
  • All content rendered to an image, with visible text to allow text selection (TEXT_VISIBLE_ON_IMAGE).
  • All content rendered to an image, with invisible text to allow text selection (TEXT_INVISIBLE_ON_IMAGE).
You can set the mode in the ExtractPagesAsHTML example, or select it with one of these JVM flags:
  • -Dorg.jpedal.pdf2html.textMode="all" (all modes)
  • -Dorg.jpedal.pdf2html.textMode="shape"
  • -Dorg.jpedal.pdf2html.textMode="visible"
  • -Dorg.jpedal.pdf2html.textMode="invisible"
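When the converter is launched from Java code, the flag can be generated rather than typed by hand. The sketch below only formats the four values listed above and rejects anything else. The class name TextModeFlag is hypothetical and is not part of the library.

```java
final class TextModeFlag {

    // System property name taken from the flags listed above.
    static final String PROPERTY = "org.jpedal.pdf2html.textMode";

    // Returns the JVM argument for one of the documented values:
    // "all", "shape", "visible" or "invisible".
    static String jvmArgument(String mode) {
        if (!mode.equals("all") && !mode.equals("shape")
                && !mode.equals("visible") && !mode.equals("invisible")) {
            throw new IllegalArgumentException("Unknown text mode: " + mode);
        }
        return "-D" + PROPERTY + "=" + mode;
    }

    public static void main(String[] args) {
        System.out.println(jvmArgument("shape"));
    }
}
```

The quotes around the value are only needed on a shell command line. When the argument is passed as a separate element to a process launcher, they can be omitted, as here.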

Do I need any additional jars? 

Possibly, if the PDF is encrypted or contains TIFF data. Full details are here.

 

How do I generate SVG instead? 

There is a separate tutorial here.

 

 

How do I include the HTML5 conversion software in my Java code?

A documented Java PDF to HTML5 example is included in the jar. Click to view the source code.
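As a rough illustration of calling the example from your own code, the sketch below invokes the main method of ExtractPagesAsHTML with the same two arguments as the command line. It is not the example's own source: the class name InProcessConverter and the paths are hypothetical, and reflection is used only so that the sketch compiles without the jar.

```java
import java.lang.reflect.Method;

final class InProcessConverter {

    // Class name taken from the command line shown earlier on this page.
    static final String EXAMPLE_CLASS = "org.jpedal.examples.html.ExtractPagesAsHTML";

    // Same two arguments as the command-line form: input location, then output location.
    static String[] buildArgs(String pdfDir, String outputDir) {
        return new String[] {pdfDir, outputDir};
    }

    // Looks the example class up by name and calls its main method.
    // Returns false if the jar is not on the classpath.
    static boolean convert(String pdfDir, String outputDir) throws Exception {
        Class<?> example;
        try {
            example = Class.forName(EXAMPLE_CLASS);
        } catch (ClassNotFoundException e) {
            return false;
        }
        Method main = example.getMethod("main", String[].class);
        main.invoke(null, (Object) buildArgs(pdfDir, outputDir));
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths; replace them with your own locations.
        boolean ran = convert("/data/pdfs", "/data/html");
        System.out.println(ran ? "Converter was invoked" : "JPDF2HTML.jar is not on the classpath");
    }
}
```

With the jar on the compile classpath, a direct call to the example's main method would be simpler than reflection.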

 

 

How do I get assistance?

Our developers are on the forums to answer your questions.

 

Will you do the coding for us?

We are happy to provide coding on a commercial basis only. 

 

How do I find the version number of the software?

You can see the version number of the PDF to HTML5 library by running ExtractPagesAsHTML with no parameters, or you can access it directly from the static String variable HTMLDisplay.HTMLversion.