Java PDF Conversion - Frequently Asked Questions

This page lists the frequently asked questions we have encountered about using JPDF2HTML to convert PDF files.

If you have a question which is not answered here, please ask in the support forum and we will try to answer it.

The FAQs are organized into the following sections:

General questions
HTML5 output questions
Image questions
Font questions

General Questions

Q.How do I run the software directly from the jar?

You can run the PDF to HTML5 converter directly from the jar, for example if you want to call it from another language or from a script.

To run it, use:

java -jar $path_to_jar/JPDF2HTML.jar $dir_with_pdf_files $output

$path_to_jar is the location of the jar, $dir_with_pdf_files is the directory containing the PDF files to convert, and $output is where the converted files are placed. If a directory does not exist, it will be created. You may also want to enable JAI (-Dorg.jpedal.jai=true) and increase the memory available to the JVM (-Xmx).

Here is a test example:

java -Xmx512M -Dorg.jpedal.jai=true -jar JPDF2HTML.jar /Users/markee/html/ /Users/markee/abc/
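
If you call the converter from another program, the command above can be assembled as an argument list rather than as a single string, which avoids quoting problems with paths that contain spaces. The following is a minimal sketch in Java and is not part of JPDF2HTML: the class name RunConverter and the method buildCommand are hypothetical, and the paths are those from the test example. It relies only on the standard ProcessBuilder API.

```java
import java.util.ArrayList;
import java.util.List;

public class RunConverter {

    // Assembles the command shown above as a list of arguments.
    // Hypothetical helper for illustration; not part of the JPDF2HTML jar.
    public static List<String> buildCommand(String jarPath, String pdfDir, String outputDir) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-Xmx512M");              // increase the memory available to the JVM
        command.add("-Dorg.jpedal.jai=true"); // enable JAI
        command.add("-jar");
        command.add(jarPath);
        command.add(pdfDir);
        command.add(outputDir);
        return command;
    }

    public static void main(String[] args) throws Exception {
        List<String> command =
                buildCommand("JPDF2HTML.jar", "/Users/markee/html/", "/Users/markee/abc/");
        System.out.println(String.join(" ", command));

        // Only launch the converter when asked to, so the sketch runs without the jar.
        if (args.length > 0 && args[0].equals("--run")) {
            Process process = new ProcessBuilder(command).inheritIO().start();
            System.out.println("Exit code: " + process.waitFor());
        }
    }
}
```

Run it with the argument --run to start the conversion; without it, the sketch only prints the command it would execute.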


Q.How do I include the software in my Java code?

A documented Java PDF to HTML5 example is included in the jar, and its source code is available to view.


Q.How do I find the version number of the software?

You can see the version number of the PDF to HTML5 library by running the ExtractPagesAsHTML example with no parameters, or you can read it directly from the static String variable HTMLDisplay.HTMLversion.


HTML5 output related questions

Q.How do I adjust the scaling of the output files?

By default, the HTML page has the same dimensions as the PDF page. You can scale it (making it bigger or smaller) with the JVM command-line option -Dorg.jpedal.pdf2html.scaling="1.5" (the value must be a positive floating-point number). Alternatively, you can edit the ExtractPagesAsHTML example.
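
To make the "positive float" requirement concrete, here is a small sketch of how a value passed with -D can be read and checked in Java. It is an illustration only and does not show how JPDF2HTML itself parses the option. The class ScalingOption and the method parseScaling are hypothetical, and the fallback of 1.0 when the property is unset is an assumption based on the default described above.

```java
public class ScalingOption {

    // Property name taken from the option above.
    static final String SCALING_PROPERTY = "org.jpedal.pdf2html.scaling";

    // Parses a scaling value and rejects anything that is not a positive, finite number.
    // A missing value falls back to 1.0 (assumed here to mean "same size as the PDF").
    public static float parseScaling(String raw) {
        if (raw == null) {
            return 1.0f;
        }
        // Float.parseFloat throws NumberFormatException for non-numeric text.
        float value = Float.parseFloat(raw.trim());
        if (!(value > 0) || Float.isInfinite(value)) {
            throw new IllegalArgumentException("Scaling must be a positive number, got: " + raw);
        }
        return value;
    }

    public static void main(String[] args) {
        // Run with: java -Dorg.jpedal.pdf2html.scaling="1.5" ScalingOption
        float scaling = parseScaling(System.getProperty(SCALING_PROPERTY));
        System.out.println("Scaling factor: " + scaling);
    }
}
```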

 

Q.Can I alter the name of the first HTML file to something other than 1.html?

Yes. The JVM command-line option -Dorg.jpedal.pdf2html.firstPageName="index" names the first page index.html. Alternatively, you can edit the ExtractPagesAsHTML example.

 

Q.How do I change/remove the navigation bar from my output?

A navigation bar is added to the HTML output so that the user can change pages. By default it is implemented in CSS. An image-based version is also available (you can replace the images in the icons directory with your own versions). The mode is set with the JVM option -Dorg.jpedal.pdf2html.addNavBar, which accepts the values css, images and none.

If you do not want the navigation bar to appear, you can remove it with the JVM command-line option -Dorg.jpedal.pdf2html.addNavBar="none". Alternatively, you can edit the ExtractPagesAsHTML example.

A blog article explains the navigation options in more detail.
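
If you build the command line in code, the three navigation bar values can be modelled as an enum so that a mistyped value is caught at compile time. This is a sketch under that assumption and is not part of the library: the class NavBarOption, the enum Mode and the method toJvmFlag are hypothetical names, while the property name and its values come from the text above.

```java
import java.util.Locale;

public class NavBarOption {

    // The three values listed above for -Dorg.jpedal.pdf2html.addNavBar.
    public enum Mode { CSS, IMAGES, NONE }

    // Turns a mode into the JVM flag described above, e.g. NONE -> ...addNavBar=none
    public static String toJvmFlag(Mode mode) {
        return "-Dorg.jpedal.pdf2html.addNavBar=" + mode.name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(toJvmFlag(mode));
        }
    }
}
```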

 

Q.How is the HTML Output organized?

A single PDF file can contain multiple pages and embeds all of its data internally. HTML5 works differently: for a PDF file called FILENAME.pdf, a set of files is created (one for each page) called FILENAMEpage1.html, FILENAMEpage2.html, etc. If a page contains images or uses CSS, the additional files will be called FILENAMEpage1, FILENAMEpage2, etc. All the links are relative, so you can copy the files to another location.
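
If a script needs to know in advance which HTML files to expect, the naming scheme described above can be sketched as follows. The class OutputNames and the method pageFiles are hypothetical helpers, not part of JPDF2HTML, and the sketch assumes the names follow exactly the FILENAMEpageN.html pattern given in the text.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputNames {

    // Lists the per-page HTML file names expected for a PDF, following the
    // FILENAMEpage1.html, FILENAMEpage2.html, ... pattern described above.
    public static List<String> pageFiles(String pdfName, int pageCount) {
        String base = pdfName;
        // Strip a trailing ".pdf" extension, ignoring case.
        if (pdfName.regionMatches(true, pdfName.length() - 4, ".pdf", 0, 4)) {
            base = pdfName.substring(0, pdfName.length() - 4);
        }
        List<String> names = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            names.add(base + "page" + page + ".html");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : pageFiles("FILENAME.pdf", 3)) {
            System.out.println(name);
        }
    }
}
```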

 

Q.How do I convert multi-page PDFs to ONE single HTML5 page?

A single-page output mode has been added to our PDF to HTML5 library so that all the pages of a PDF can be output onto one HTML5 page. This is disabled by default. If you want single-page output, you can edit the ExtractPagesAsHTML example.

 

Q.How do I adjust the Image size/quality of output?

By default, we down-sample images to reduce their size. To keep the original image (often larger and of higher quality), use the JVM flag -Dorg.jpedal.pdf2html.keepOriginalImage="true". This results in larger images and needs more memory. For finer control, you can implement your own version of the OutputImageController interface.


Image related questions

Q.How do I handle PDF files with JPEG2000 images?

To handle JPEG2000 images, you will need to install the free JAI libraries (jai_core.jar, jai_codec.jar, jai_imageio.jar). You would then run the software as follows (you may need to add the full path to each jar file):

 

java -Xmx512M -Dorg.jpedal.jai=true \
  -cp jpdf2html.jar:jai_core.jar:jai_codec.jar:jai_imageio.jar \
  org.jpedal.examples.html.ExtractPagesAsHTML \
  /Users/markee/Desktop/html/ /Users/markee/Desktop/abc/

 

If you do not have these libraries and run the software on a PDF containing JPEG2000 images, you may see exceptions in the console or log output saying that JPEG2000 is not set up, and the images will not appear.
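
The command above uses ":" to separate the classpath entries, which is the Unix convention; on Windows the separator is ";". The sketch below, which is not part of JPDF2HTML, builds the classpath with the platform's separator and checks whether JAI core can be loaded before you start a conversion. The class JaiSetup and its methods are hypothetical names. The check looks for javax.media.jai.JAI, the main class of JAI core, so it only confirms that jai_core.jar is present, not that JPEG2000 decoding is fully set up.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class JaiSetup {

    // Joins jar names with the given separator (":" on Unix-like systems, ";" on Windows).
    public static String joinClasspath(List<String> jars, String separator) {
        return String.join(separator, jars);
    }

    // Returns true if the named class can be found on the current classpath.
    // The class is located but not initialized.
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, JaiSetup.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
                "jpdf2html.jar", "jai_core.jar", "jai_codec.jar", "jai_imageio.jar");
        System.out.println("-cp " + joinClasspath(jars, File.pathSeparator));

        // Only tells you whether JAI core is visible to this JVM.
        System.out.println("JAI core on classpath: " + isClassAvailable("javax.media.jai.JAI"));
    }
}
```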

 


Font related questions

Q.How do I extract and reuse embedded TrueType fonts?

The PRO version extracts embedded TrueType fonts and reuses them. This is enabled by default in the example code/default jar. You can also enable it with the JVM option -DFontMode="EMBED_ALL".

 

Q.How do I extract and reuse embedded Type1 fonts?

The PRO version extracts embedded Type1 fonts and converts them to OpenType fonts. This is the first release of the feature, so it is currently disabled by default in the example code/default jar. You can enable it with the JVM option -Dorg.jpedal.pdf2html.convertOTFFonts="true".
