Extract Text from PDF in the Cloud

Extract Text from PDF using PHP, Ruby, C#, Node.js, Python or JavaScript

Extract Text from PDF with a simple cloud API

You might know JPedal as a PDF SDK for Java developers, but it can also extract text from PDF files when called from other languages such as PHP, Ruby, C#, Node.js, Python or JavaScript. We offer a monthly subscription to our cloud server; alternatively, you can license JPedal to run on your own servers.

Host your own PDF Text Extraction API by deploying JPedal as a web application via Docker or a Java application server (such as Tomcat or Jetty). This provides a simple REST API that can be accessed from any language. JPedal also integrates with LibreOffice to provide a complete solution for rendering and extracting content from PDF, Word, PowerPoint and Excel.

Quick Start Trial Guide

Step 1: Sign up for your free trial token

Step 2: Run our simple example code

PDF Text Extraction Cloud API Features

Extract Plain Text

Extract the textual content from PDF files as plain text with content encoding handled for you.

Extract Structured Text

If the PDF contains structured content, JPedal extracts it and converts it into XML.

Extract Wordlist

Extract the individual words on the page with coordinates of their bounding box.

Simple

The REST API is easy to access from any language using our open-source clients and simple example code.

Flexible

Extracting text from PDF is easy to integrate into even the most complex of systems.

Easy

Subscribe to access our cloud server, or alternatively host your own server using our Docker image.

Sign up for your free trial token

The free trial lasts 14 days and requires no credit card.

Run the example code

Simply choose your required language and run the example code.

Extract Text from PDF using PHP

Get started with the following steps:

  1. Sign up for your free trial token
  2. Ensure PHP 5.6 (or higher) and Composer are installed
  3. Import the client by running: composer require idrsolutions/idrsolutions-php-client
  4. Run the example code below
<?php

require_once __DIR__ . "/PATH/TO/vendor/autoload.php";

use IDRsolutions\IDRCloudClient;

$endpoint = "https://trial.idrsolutions.com/trial/" . IDRCloudClient::INPUT_JPEDAL;
$parameters = array(
    'token' => 'YOUR_TRIAL_TOKEN', // Token provided to you via e-mail
    'input' => IDRCloudClient::INPUT_UPLOAD,
    'file' => 'path/to/file.pdf',
    'settings' => '{"mode": "extractText", "type": "plainText"}'
);

$results = IDRCloudClient::convert(array(
    'endpoint' => $endpoint,
    'parameters' => $parameters
));

IDRCloudClient::downloadOutput($results, 'path/to/outputDir');

echo $results['downloadUrl'];

Extract Text from PDF using Ruby

Get started with the following steps:

  1. Sign up for your free trial token
  2. Ensure Ruby 2.0 (or higher) is installed
  3. Import the client by running: gem install idr_cloud_client
  4. Run the example code below
require 'idr_cloud_client'

client = IDRCloudClient.new('https://trial.idrsolutions.com/trial/' + IDRCloudClient::JPEDAL)

conversion_results = client.convert(
    input: IDRCloudClient::UPLOAD,
    file: 'path/to/file.pdf',
    token: 'YOUR_TRIAL_TOKEN', # Token provided to you via e-mail
    settings: '{"mode": "extractText", "type": "plainText"}'
)

client.download_result(conversion_results, 'path/to/outputDir')

puts 'Converted: ' + conversion_results['downloadUrl']

Extract Text from PDF using C#

Get started with the following steps:

  1. Sign up for your free trial token
  2. Ensure .NET 2.0 (or higher) and NuGet are installed
  3. Import the client by running: nuget install idrsolutions-csharp-client
  4. Run the example code below
using System;                     // Console, Exception
using System.Collections.Generic; // Dictionary
using idrsolutions_csharp_client;

var client = new IDRCloudClient("https://trial.idrsolutions.com/trial/" + IDRCloudClient.JPEDAL);

try
{
    Dictionary<string, string> parameters = new Dictionary<string, string>
    {
        ["input"] = IDRCloudClient.UPLOAD,
        ["token"] = "YOUR_TRIAL_TOKEN", // Token provided to you via e-mail
        ["settings"] = "{\"mode\": \"extractText\", \"type\": \"plainText\"}",
        ["file"] = "path/to/file.pdf"
    };

    Dictionary<string, string> conversionResults = client.Convert(parameters);

    client.DownloadResult(conversionResults, "path/to/outputDir");

    Console.WriteLine("Converted: " + conversionResults["downloadUrl"]);
}
catch (Exception e)
{
    Console.WriteLine("File conversion failed: " + e.Message);
}

Extract Text from PDF using Node.js

Get started with the following steps:

  1. Sign up for your free trial token
  2. Ensure Node.js and NPM are installed
  3. Import the client by running: npm install --save @idrsolutions/idrcloudclient
  4. Run the example code below
var idrcloudclient = require('@idrsolutions/idrcloudclient');

idrcloudclient.convert({
    endpoint: 'https://trial.idrsolutions.com/trial/' + idrcloudclient.JPEDAL,
    parameters: {
        input: idrcloudclient.UPLOAD,
        file: 'path/to/file.pdf',
        settings: '{"mode": "extractText", "type": "plainText"}',
        token: 'YOUR_TRIAL_TOKEN', // Token provided to you via e-mail
    },

    failure: function(e) {
        console.log(e);
    },
    progress: function() { },
    success: function(e) {
        console.log('Converted ' + e.downloadUrl);
    }
});

Extract Text from PDF using Python

Get started with the following steps:

  1. Sign up for your free trial token
  2. Ensure Python 3 (or higher) and pip are installed
  3. Import the client by running: pip install IDRCloudClient
  4. Run the example code below
from IDRSolutions import IDRCloudClient

client = IDRCloudClient('https://trial.idrsolutions.com/trial/' + IDRCloudClient.JPEDAL)
try:
    result = client.convert(
        input=IDRCloudClient.UPLOAD,
        file='path/to/file.pdf',
        token='YOUR_TRIAL_TOKEN', # Token provided to you via e-mail
        settings='{"mode": "extractText", "type": "plainText"}'
    )

    outputURL = result['downloadUrl']

    client.downloadResult(result, 'path/to/outputDir')

    if outputURL is not None:
        print("Download URL: " + outputURL)

except Exception as error:
    print(error)
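
The settings value in the examples above is a hand-written JSON string, which is easy to mis-quote. As a minimal sketch, the parameters could instead be assembled with Python's standard json module. The helper names build_settings and build_parameters and the IDR_TRIAL_TOKEN environment variable are hypothetical and not part of the IDRCloudClient package; only the keys (input, file, token, settings) and the values extractText and plainText are taken from the example code on this page.

```python
import json
import os


def build_settings(mode="extractText", output_type="plainText"):
    """Serialise the settings into the JSON string passed as `settings`."""
    return json.dumps({"mode": mode, "type": output_type})


def build_parameters(input_mode, file_path, token=None):
    """Collect the keyword arguments used by the convert call shown above."""
    # IDR_TRIAL_TOKEN is an assumed variable name, used here so the token
    # does not have to be hard-coded in the script.
    token = token or os.environ.get("IDR_TRIAL_TOKEN")
    if not token:
        raise ValueError("No token supplied and IDR_TRIAL_TOKEN is not set")
    return {
        "input": input_mode,
        "file": file_path,
        "token": token,
        "settings": build_settings(),
    }
```

With the client from the example above, this could be used as client.convert(**build_parameters(IDRCloudClient.UPLOAD, 'path/to/file.pdf')), assuming convert accepts these values as keyword arguments in the way the example shows.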

Extract Text from PDF using JavaScript

Get started with the following steps:

  1. Try out the online demo
  2. View the JavaScript client on GitHub to learn more