How do I download the Jar File?
How do I run the service directly from the jar?
You can run the PDF to HTML5 conversion program directly from the jar, for example if you want to call it from another language or from a script. Java must be installed.
Quickstart (if your PDF files do not contain JPEG2000 or TIFF images)
To run it, use:
java -jar $path_to_jar/JPDF2HTML.jar $dir_with_pdf_files $output
$path_to_jar is the location of the jar
$dir_with_pdf_files is the location of the PDF files to convert
$output is where to place the converted files. If a directory does not exist it will be created.
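If you are calling the converter from a script, it can help to check the prerequisites before starting the JVM. The following is only a sketch: the function name convert_pdfs is made up for illustration, and it assumes the jar is named JPDF2HTML.jar and sits in the directory you pass as the first argument.

```shell
#!/bin/sh
# Hypothetical wrapper around the quickstart command (not part of the product).
# Arguments: <path_to_jar> <dir_with_pdf_files> <output>
convert_pdfs() {
    path_to_jar=$1
    dir_with_pdf_files=$2
    output=$3

    # Java must be installed and on the PATH.
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH" >&2
        return 1
    fi

    # The input directory must already exist; the output directory
    # is created by the converter if it is missing.
    if [ ! -d "$dir_with_pdf_files" ]; then
        echo "no such directory: $dir_with_pdf_files" >&2
        return 1
    fi

    java -jar "$path_to_jar/JPDF2HTML.jar" "$dir_with_pdf_files" "$output"
}
```

A call such as convert_pdfs /opt/jars /Users/markee/pdfs/ /Users/markee/output/ then behaves like the command above, but fails early with a message if Java or the input directory is missing.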
All PDF files
You will need the additional JAI jars – you can download them from here.
java -Dorg.jpedal.jai=true -cp $path_to_jar/JPDF2HTML.jar:$path_to_jai/jai_codec.jar:$path_to_jai/jai_core.jar:$path_to_jai/imageio.jar org/jpedal/examples/html/ExtractPagesAsHTML $dir_with_pdf_files $output
Important note for Windows developers: separate the jars with a semi-colon ; rather than a colon :
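If the same script has to run on both Windows and Unix-like systems, one option is to build the classpath with the separator chosen at run time. This is a sketch under assumptions: join_classpath and classpath_separator are hypothetical helper names, and the Windows detection only covers shells that report CYGWIN, MINGW or MSYS from uname.

```shell
#!/bin/sh
# Join jar paths with the given separator.
# Usage: join_classpath <separator> <jar> [<jar> ...]
join_classpath() {
    sep=$1
    shift
    cp=""
    for jar in "$@"; do
        if [ -z "$cp" ]; then
            cp=$jar
        else
            cp="$cp$sep$jar"
        fi
    done
    printf '%s\n' "$cp"
}

# Print ';' when running under a Windows shell, ':' otherwise.
# Assumption: Windows shells are recognised by their uname output.
classpath_separator() {
    case "$(uname -s)" in
        CYGWIN*|MINGW*|MSYS*) printf ';\n' ;;
        *) printf ':\n' ;;
    esac
}

# Example use (commented out so that sourcing this file runs nothing):
# cp=$(join_classpath "$(classpath_separator)" \
#     "$path_to_jar/JPDF2HTML.jar" "$path_to_jai/jai_codec.jar" \
#     "$path_to_jai/jai_core.jar" "$path_to_jai/imageio.jar")
# java -Dorg.jpedal.jai=true -cp "$cp" org/jpedal/examples/html/ExtractPagesAsHTML \
#     "$dir_with_pdf_files" "$output"
```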
Here is my test example:
java -Xmx512M -Dorg.jpedal.jai=true -jar JPDF2HTML.jar /Users/markee/pdfs/ /Users/markee/output/
You may also want to increase the memory available to the JVM; the example above does this with -Xmx512M.
How are the output settings controlled?
There are a large number of configuration options. When running from the command line, they are controlled using JVM flags. A list of all JVM flags and their values can be found in the Javadoc for the HTMLConversionOptions class.
An example conversion setting the Text Mode to svg_realtext is as follows:
java -Dorg.jpedal.pdf2html.textMode=svg_realtext -jar JPDF2HTML.jar /PDFDir/ /OutputDir/
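When a script needs to pass several settings, the -D flags can be assembled from a list of name=value pairs. The helper below is a sketch only: jvm_flags is a hypothetical name, and it assumes that neither the names nor the values contain spaces. The two settings used in the example are the ones that appear on this page.

```shell
#!/bin/sh
# Turn name=value pairs into a space-separated list of -D JVM flags.
# Assumption: names and values contain no whitespace.
jvm_flags() {
    flags=""
    for setting in "$@"; do
        flags="$flags -D$setting"
    done
    # Strip the leading space left by the loop.
    printf '%s\n' "${flags# }"
}

# Example use (commented out so that sourcing this file runs nothing):
# java $(jvm_flags org.jpedal.jai=true org.jpedal.pdf2html.textMode=svg_realtext) \
#     -jar JPDF2HTML.jar /PDFDir/ /OutputDir/
```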
Do I need any additional jars?
Possibly. Additional jars are needed if the PDF is encrypted or contains TIFF data. Full details are here.
How do I find the version number of the software?
You can see the version number of the PDF to HTML5 library by running ExtractPagesAsHTML with no parameters, or you can read it directly from the static string variable HTMLDisplay.HTMLversion.