Extract Text from PDF in the Cloud
Extract Text from PDF using PHP, Ruby, C#, Node.js, Python or JavaScript
Extract Text from PDF with a simple cloud API
You might know JPedal as a PDF SDK for Java developers, but did you know you can also use it to extract text from PDF files from other languages such as PHP, Ruby, C#, Node.js, Python or JavaScript? We offer a monthly subscription for access to our cloud server. Alternatively, you can license JPedal to run on your own servers.
Host your own PDF Text Extraction API by deploying JPedal as a web application via Docker or a Java application server (such as Tomcat or Jetty). This provides a simple REST API which can be accessed from any language. JPedal also integrates with LibreOffice to provide a complete solution for rendering and extracting content from PDF, Word, PowerPoint and Excel.
We also have tutorials for deploying JPedal to a Java application server and for using Docker to host your own JPedal cloud API, as well as more detail on using the PDF Text Extraction API.
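The language clients shown later on this page hide the request cycle behind a single convert call. When calling a REST API of this kind directly, a common pattern is to submit the file and then poll until the job finishes. The following is a minimal sketch of the polling half only. It assumes the server reports a `state` field with the values `processed` and `error`; those names, and the `fetch_status` callable, are assumptions for illustration and are not confirmed by this page. Only the `downloadUrl` field appears in the examples here.

```python
import time


def wait_for_result(fetch_status, poll_interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll a job until it finishes and return the final status dict.

    fetch_status is a hypothetical callable that returns the current job
    status as a dict. The "state" values checked below are assumed names.
    """
    for _ in range(max_polls):
        status = fetch_status()
        state = status.get("state")
        if state == "processed":
            return status
        if state == "error":
            raise RuntimeError("Extraction failed: " + str(status.get("error", "unknown error")))
        sleep(poll_interval)  # wait before asking again
    raise TimeoutError("Extraction did not finish in time")


# Demonstration with canned responses instead of a real server.
responses = iter([
    {"state": "processing"},
    {"state": "processed", "downloadUrl": "https://example.com/output.zip"},
])
result = wait_for_result(lambda: next(responses), sleep=lambda seconds: None)
print(result["downloadUrl"])
```

Passing the status lookup in as a function keeps the loop independent of any particular HTTP library, so the same sketch can be exercised without network access.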
Quick Start Trial Guide
Step 1: Sign up for your free trial token
Step 2: Run our simple example code
PDF Text Extraction Cloud API Features
Extract Plain Text
Extract the textual content from PDF files as plain text with content encoding handled for you.
Extract Structured Text
If the PDF contains structured content, JPedal extracts it and converts it to XML.
Extract Wordlist
Extract the individual words on the page, each with the coordinates of its bounding box.
Simple
The REST API is easy to access from any language using our open source clients and simple example code.
Flexible
Extracting text from PDF is easy to integrate into even the most complex of systems.
Easy
Subscribe to access our cloud server, or alternatively host your own server using our Docker image.
Sign up for your free trial token
This is a 14-day free trial. No credit card is required.
Run the example code
Simply choose your required language and run the example code.
Extract Text from PDF using PHP
Get started with the following steps:
- Sign up for your free trial token
- Ensure PHP 5.6 (or higher) and Composer are installed
- Import the client by running:
composer require idrsolutions/idrsolutions-php-client
- Run the example code below
<?php
require_once __DIR__ . "/PATH/TO/vendor/autoload.php";
use IDRsolutions\IDRCloudClient;
$endpoint = "https://trial.idrsolutions.com/trial/" . IDRCloudClient::INPUT_JPEDAL;
$parameters = array(
    'token' => 'YOUR_TRIAL_TOKEN', // Token provided to you via e-mail
    'input' => IDRCloudClient::INPUT_UPLOAD,
    'file' => 'path/to/file.pdf',
    'settings' => '{"mode": "extractText", "type": "plainText"}'
);
$results = IDRCloudClient::convert(array(
    'endpoint' => $endpoint,
    'parameters' => $parameters
));
IDRCloudClient::downloadOutput($results, 'path/to/outputDir');
echo $results['downloadUrl'];
Extract Text from PDF using Ruby
Get started with the following steps:
- Sign up for your free trial token
- Ensure Ruby 2.0 (or higher) is installed
- Import the client by running:
gem install idr_cloud_client
- Run the example code below
require 'idr_cloud_client'
client = IDRCloudClient.new('https://trial.idrsolutions.com/trial/' + IDRCloudClient::JPEDAL)
conversion_results = client.convert(
  input: IDRCloudClient::UPLOAD,
  file: 'path/to/file.pdf',
  token: 'YOUR_TRIAL_TOKEN', # Token provided to you via e-mail
  settings: '{"mode": "extractText", "type": "plainText"}'
)
client.download_result(conversion_results, 'path/to/outputDir')
puts 'Converted: ' + conversion_results['downloadUrl']
Extract Text from PDF using C#
Get started with the following steps:
- Sign up for your free trial token
- Ensure .NET 2.0 (or higher) and NuGet are installed
- Import the client by running:
nuget install idrsolutions-csharp-client
- Run the example code below
using System;                      // Console, Exception
using System.Collections.Generic;  // Dictionary
using idrsolutions_csharp_client;
var client = new IDRCloudClient("https://trial.idrsolutions.com/trial/" + IDRCloudClient.JPEDAL);
try
{
    Dictionary<string, string> parameters = new Dictionary<string, string>
    {
        ["input"] = IDRCloudClient.UPLOAD,
        ["token"] = "YOUR_TRIAL_TOKEN", // Token provided to you via e-mail
        ["settings"] = "{\"mode\": \"extractText\", \"type\": \"plainText\"}",
        ["file"] = "path/to/file.pdf"
    };
    Dictionary<string, string> conversionResults = client.Convert(parameters);
    client.DownloadResult(conversionResults, "path/to/outputDir");
    Console.WriteLine("Converted: " + conversionResults["downloadUrl"]);
}
catch (Exception e)
{
    Console.WriteLine("File conversion failed: " + e.Message);
}
Extract Text from PDF using Node.js
Get started with the following steps:
- Sign up for your free trial token
- Ensure Node.js and NPM are installed
- Import the client by running:
npm install --save @idrsolutions/idrcloudclient
- Run the example code below
var idrcloudclient = require('@idrsolutions/idrcloudclient');
idrcloudclient.convert({
    endpoint: 'https://trial.idrsolutions.com/trial/' + idrcloudclient.JPEDAL,
    parameters: {
        input: idrcloudclient.UPLOAD,
        file: 'path/to/file.pdf',
        settings: '{"mode": "extractText", "type": "plainText"}',
        token: 'YOUR_TRIAL_TOKEN' // Token provided to you via e-mail
    },
    failure: function(e) {
        console.log(e);
    },
    progress: function() { },
    success: function(e) {
        console.log('Converted ' + e.downloadUrl);
    }
});
Extract Text from PDF using Python
Get started with the following steps:
- Sign up for your free trial token
- Ensure Python 3 (or higher) and pip are installed
- Import the client by running:
pip install IDRCloudClient
- Run the example code below
from IDRSolutions import IDRCloudClient
client = IDRCloudClient('https://trial.idrsolutions.com/trial/' + IDRCloudClient.JPEDAL)
try:
    result = client.convert(
        input=IDRCloudClient.UPLOAD,
        file='path/to/file.pdf',
        token='YOUR_TRIAL_TOKEN',  # Token provided to you via e-mail
        settings='{"mode": "extractText", "type": "plainText"}'
    )
    outputURL = result['downloadUrl']
    client.downloadResult(result, 'path/to/outputDir')
    if outputURL is not None:
        print("Download URL: " + outputURL)
except Exception as error:
    print(error)
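In each example above, the settings value is a JSON document passed as a string. Writing that string by hand is error-prone once quoting and escaping are involved (compare the C# example), so one option is to build it from a dictionary. The sketch below uses only the Python standard library. The `build_settings` helper is a hypothetical name introduced for illustration, and `plainText` is the only `type` value confirmed on this page.

```python
import json


def build_settings(text_type="plainText", mode="extractText"):
    """Return the JSON string to pass as the `settings` parameter.

    build_settings is a hypothetical helper, not part of the client.
    Only the "plainText" type is shown on this page; any other value
    should be checked against the API documentation before use.
    """
    return json.dumps({"mode": mode, "type": text_type})


settings = build_settings()
print(settings)  # {"mode": "extractText", "type": "plainText"}
```

The returned string can then be passed as the `settings` argument of `client.convert` in place of the hand-written literal.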
Extract Text from PDF using JavaScript
Get started with the following steps:
- Try out the online demo
- View the JavaScript client on GitHub to learn more