Storypad 4.0 - PDF content extraction engine
Storypad 4 is the evolution of an enterprise tool designed in 1999 primarily for newspapers. Storypad converts a page with multiple stories into individual stories, allowing each individual story to be further broken down into categories such as headline, standfirst, text, by-line, image and caption. The guidelines for selecting and breaking up the story are user definable as are the categories contained in each story.
Categories
Once the story has been identified and broken into components, this information can be submitted into a content management system or saved to file, allowing the story content to be re-used. Storypad provides an interface to the required method of saving.
The problem with extracting from PDFs is that they generally contain no structure - no end-of-line or paragraph markers, no spaces, no column or table information. They are essentially a series of draw commands that do not necessarily draw in any predefined order.
Algorithms
To work around the lack of structure in most PDF files, Storypad comes with a number of algorithms which allow the program to try to group like material together. For example, the algorithm "Vertical columns" attempts to group text in multiple columns correctly.
Creating an algorithm to deal with every situation is impossible. Instead, Storypad provides a selection of algorithms that deal with common situations. More importantly, Storypad allows you to "undo" an algorithm if it makes an error and to extract the information you need manually.
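As a rough illustration of what column grouping involves, the sketch below sorts text fragments by their left edge, starts a new column whenever the left edge jumps by more than a tolerance, and then orders each column top to bottom. It is not Storypad's "Vertical columns" algorithm: the `Fragment` and `ColumnGrouper` classes, the coordinate convention and the `tolerance` parameter are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** A positioned piece of text. Hypothetical type, not a Storypad or JPedal class. */
class Fragment {
    final double x;    // left edge of the fragment
    final double y;    // distance from the top of the page
    final String text;

    Fragment(double x, double y, String text) {
        this.x = x;
        this.y = y;
        this.text = text;
    }
}

class ColumnGrouper {
    /**
     * Sorts fragments by left edge, starts a new column whenever a fragment's
     * left edge is more than `tolerance` away from the first fragment of the
     * current column, then orders each column top to bottom.
     */
    static List<List<Fragment>> group(List<Fragment> fragments, double tolerance) {
        List<Fragment> byX = new ArrayList<Fragment>(fragments);
        byX.sort((Fragment a, Fragment b) -> Double.compare(a.x, b.x));

        List<List<Fragment>> columns = new ArrayList<List<Fragment>>();
        for (Fragment f : byX) {
            List<Fragment> last = columns.isEmpty() ? null : columns.get(columns.size() - 1);
            if (last != null && Math.abs(f.x - last.get(0).x) <= tolerance) {
                last.add(f);
            } else {
                List<Fragment> column = new ArrayList<Fragment>();
                column.add(f);
                columns.add(column);
            }
        }
        for (List<Fragment> column : columns) {
            column.sort((Fragment a, Fragment b) -> Double.compare(a.y, b.y));
        }
        return columns;
    }

    /** Joins the grouped text column by column, left to right. */
    static String readingOrder(List<Fragment> fragments, double tolerance) {
        StringBuilder sb = new StringBuilder();
        for (List<Fragment> column : group(fragments, tolerance)) {
            for (Fragment f : column) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(f.text);
            }
        }
        return sb.toString();
    }
}

class ColumnDemo {
    public static void main(String[] args) {
        // Fragments arrive in draw order, which need not match reading order.
        List<Fragment> page = new ArrayList<Fragment>();
        page.add(new Fragment(300, 10, "third"));
        page.add(new Fragment(50, 30, "second"));
        page.add(new Fragment(52, 10, "first"));
        page.add(new Fragment(301, 30, "fourth"));
        System.out.println(ColumnGrouper.readingOrder(page, 5.0));
    }
}
```

A fixed tolerance like this breaks down on indented paragraphs or on headlines that span columns, which is one reason an "undo" and manual extraction are still needed.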
The Storypad software operates under a GPL licence, and sits on top of any version of the JPedal developer libraries. JPedal has a dual commercial/GPL licence - if you abide by the terms of the GPL licence conditions it is free to use. If not, you need to purchase a licence.
Storypad Download
Storypad is an open, industrial strength PDF content extraction tool. It is designed to provide a framework to implement tailored extraction solutions, and ONE such example is also featured in the documentation.
To get the most out of Storypad, we recommend you read and follow ALL the following steps:
1. System requirements

Storypad requires Java version 1.5.x or greater to be installed on your system. Storypad does not currently work with Java 1.6.x. We recommend 128 MB of free physical RAM. Graphics-intensive PDF files may require more memory than this. Storypad will run on any machine that runs Java. The faster your machine, the better Storypad will run.
2. Download the latest version of the Storypad

Storypad is provided as a zipped directory of about 8 MB containing the files required to run it.
Storypad (latest Dev version) Updated 18 February 2008.
3. Download additional files

Download jar with CMaps for CID fonts. If you are accessing CID fonts, JPedal may require an additional jar to be installed.
Disclaimer
In no event shall the author, or any other party who may modify and/or redistribute this program and documentation, be liable for any commercial, special, incidental, or consequential damages arising out of the use or inability to use the program including, but not limited to, loss of data or data being rendered inaccurate or losses sustained by you or losses sustained by third parties or a failure of the program to operate with any other programs, even if you or other parties have been advised of the possibility of such damages.
Storypad Documentation
last updated 13 December 2007
The links below explain how to use some of the functionality available in Storypad. If you feel anything is missing, please email us.
Some screenshots include parts of PDFs containing pages which are copyright of News International and reproduced here with permission.
- Grouping - a 'by example' explanation of how to use Storypad's powerful grouping algorithms.
- Split spreads - explains how multiple pages can be opened, and content across pages can be grouped by the user
- Cut objects - explains how the user can break an object or grouping, both vertically and horizontally.
General questions
Storypad is designed as a GUI and batch tool to extract content from complex layouts. The level of automation possible will vary with the level of regularity in the layout.
Q. What are its unique selling points? Its flexibility and its open nature. Most extraction tools are very much black-box solutions. Storypad has been designed so it can be easily customised, automating where it is economically viable and technically possible, and providing a flexible GUI tool for human interaction in other situations. It is designed to provide a starting point which can be easily customised to fill multiple extraction requirements, and it runs on multiple platforms. Having the source code available for Storypad makes it much easier for clients to customise and maintain it to meet their requirements.
Q. What do I need to run Storypad? Storypad is written in Java and requires Java 1.4.x to be installed on your machine. Storypad does NOT currently work when run under Java 1.5. It requires some additional Java libraries which can be downloaded from the downloads page. It also requires a copy of jpedal.jar, which is a commercial library and can be purchased from http://www.jpedal.org - a demo copy can be downloaded from the downloads page.
Q. Are you continuing to develop Storypad? Yes.
The source code for Storypad can be downloaded, modified and recompiled under a GPL license. If you modify the code, we recommend you do it by generating patch files from the original source code so you can easily apply your changes to later releases.
Q. What if I need additional features in Storypad? Storypad is under active development with regular new features. We are also happy to quote on adding enhancements. As the source code for Storypad is available, you can also make your own changes.
Q. What platforms will it run on? Because it is written in pure Java it can run on any Operating System where Java is available. It was developed on several platforms.
Q. Where do I go with any further questions? If you have discovered a bug, fill out our bug report form and let us know. We provide support as a commercial service. You can email us for further details.
Q. Is there a manual I can download? Our Support page contains links to downloadable manuals.
It works on 90% of files we have tested it on. It is possible to produce poor quality PDFs. It is not an OCR tool and some PDF files do not physically contain enough data to extract any meaningful content. On good quality PDF files we have had very good results.
Q. Who is the cute little girl in the picture? That would be Isabella, taken in the heady days of the dot-com boom.
Licensing details
Q. What license options are available? Storypad is available under a GPL license. We can produce an LGPL version (which contains additional code and is generally kept more up to date), which is supplied to commercial customers. The GPL license is actually quite restrictive and has a number of requirements. You can find more information about the GPL on the Free Software Foundation website (www.gnu.org). If you do not wish to use the software under the GPL license, you will need an alternative license from us.
Q. Can I use it for educational purposes and non-commercial purposes? The GPL release can be freely used for any purpose if you stick to the GPL license.
There is no support under the terms of the GPL. If you require support you can purchase it from us. Email us for further information.
Extraction questions
Images are converted to sRGB and can be saved in several formats. Text is stored internally as Unicode and we have included options to save as XML or text. Font information is included in the XML.
Q. Can extraction be modified? Extraction is implemented through a Writeable interface, and the GPL code includes an example showing how to write to a flat file. We have sample code to make it write to databases, and it could also write across an internet connection.
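To give a feel for this kind of pluggable output, here is a minimal sketch of an output interface with a flat-file style implementation. The `StorySink` interface below is a hypothetical stand-in, not Storypad's actual Writeable interface: its method name, its use of a category-to-value map and the output layout are all assumptions, so check the Storypad source for the real signatures.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an output interface; NOT Storypad's real Writeable. */
interface StorySink {
    /** Receives one extracted story as category -> content pairs. */
    void write(Map<String, String> fields) throws IOException;
}

/** Writes each story as "category: value" lines followed by a blank line. */
class FlatFileSink implements StorySink {
    private final Writer out;

    FlatFileSink(Writer out) {
        this.out = out;
    }

    public void write(Map<String, String> fields) throws IOException {
        for (Map.Entry<String, String> field : fields.entrySet()) {
            out.write(field.getKey() + ": " + field.getValue() + "\n");
        }
        out.write("\n");
        out.flush();
    }
}

class SinkDemo {
    /** Renders a story to a string so the output can be inspected. */
    static String render(Map<String, String> story) {
        StringWriter buffer = new StringWriter();
        try {
            new FlatFileSink(buffer).write(story);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the categories in insertion order.
        Map<String, String> story = new LinkedHashMap<String, String>();
        story.put("headline", "Council approves new bridge");
        story.put("byline", "By A. Reporter");
        System.out.print(render(story));
    }
}
```

Because the sink only depends on `java.io.Writer`, passing a `FileWriter` would write to a file, and a database or network target could be added as another implementation of the same interface.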
Configuration questions
Q. Where is the configuration data stored? As a set of XML files in the configuration directory. If you delete this directory, a new default one will be created.
Q. Do the values in the token table always get selected? Yes, if you select an object category and the value is saved in the object you selected.
Q. I want to add more values to the table and delete some existing ones. Can I do that? Yes. To add more values, just click on add value and save it to the table. To delete a value, first select the value you wish to delete, then click on delete value.
Q. Will the token table save the values for each PDF? Yes, it is a global table.
GUI related questions
Refreshes the display and resets any objects selected on the pdf.
Q. What does the debug button do? Debug lets you see graphically how grouping has been applied to the page. It can show two different types of grouping. The first shows the objects grouped on the PDF. The second is only displayed if you have selected the Times Smart Grouping, and shows the Smart Groupings for the PDF.
Q. What does the categorize function actually do? Categorize lets the user define settings for fields using keywords. So if all your bylines started with "By ", you could get Storypad to set this field to Byline with the Categorize function.
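The idea of keyword-driven categorisation can be sketched as a small list of prefix rules checked in order. This is only an illustration under stated assumptions: `KeywordCategorizer`, its `addRule` method and the prefix-matching behaviour are hypothetical and are not taken from Storypad's Categorize implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical keyword categoriser; not Storypad's Categorize code. */
class KeywordCategorizer {
    // Maps a leading keyword to the category it selects, in rule order.
    private final Map<String, String> rules = new LinkedHashMap<String, String>();
    private final String defaultCategory;

    KeywordCategorizer(String defaultCategory) {
        this.defaultCategory = defaultCategory;
    }

    /** Registers a rule: text starting with `prefix` gets `category`. */
    KeywordCategorizer addRule(String prefix, String category) {
        rules.put(prefix, category);
        return this;
    }

    /** Returns the category of the first matching rule, or the default. */
    String categorize(String text) {
        String trimmed = text.trim();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (trimmed.startsWith(rule.getKey())) {
                return rule.getValue();
            }
        }
        return defaultCategory;
    }
}

class CategorizeDemo {
    public static void main(String[] args) {
        KeywordCategorizer categorizer =
                new KeywordCategorizer("Text").addRule("By ", "Byline");
        System.out.println(categorizer.categorize("By Jane Smith"));
        System.out.println(categorizer.categorize("Bypass opens next week"));
    }
}
```

Including the trailing space in the keyword ("By " rather than "By") keeps ordinary words such as "Bypass" from being treated as bylines.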
Q. Can I extract the image(s)? Yes, double click on the image(s) you want to extract. Then Ctrl double click will display the Field Data dialogue so you can save the image to your file system.
Q. Can I create more than one story on a PDF? You can link several items together to make separate stories, but only one story can be extracted at a time. To show all the stories on the PDF, select the Times Smart Grouping.
Grouping related questions
Q. How does grouping actually work? A PDF consists of essentially unstructured text. Storypad attempts to turn this into useful content by applying two strategies. First, the Object grouping routines attempt to convert the fragments into discrete objects. Then the higher-level routines attempt to link them together. Because layouts differ in conflicting ways, we have tried to create a generic set of grouping algorithms which can be tailored to specific situations. The built-in graphical debugging tools make this very easy for regular layouts. Technically, grouping is implemented via a Groupable interface, and sample implementations are included in the GPL code which may work 'out of the box' or form the basis for modifications.
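The first stage, turning fragments into discrete objects, might look something like the sketch below, which joins text runs on the same line when the horizontal gap between them is small. The `GroupingStrategy` interface is a hypothetical stand-in for Groupable, whose real signature is not shown here, and `TextRun`, `wordGap` and `maxGap` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** A run of text as drawn on the page. Hypothetical type. */
class TextRun {
    final double x, y, width;
    final String text;

    TextRun(double x, double y, double width, String text) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.text = text;
    }
}

/** Hypothetical stand-in for a grouping interface; NOT Storypad's Groupable. */
interface GroupingStrategy {
    List<TextRun> group(List<TextRun> runs);
}

/**
 * Joins runs that share a y value and sit within maxGap of each other.
 * A gap of at least wordGap is treated as a missing space.
 * Assumes runs on one line report exactly the same y.
 */
class SameLineGrouping implements GroupingStrategy {
    private final double wordGap;
    private final double maxGap;

    SameLineGrouping(double wordGap, double maxGap) {
        this.wordGap = wordGap;
        this.maxGap = maxGap;
    }

    public List<TextRun> group(List<TextRun> runs) {
        List<TextRun> sorted = new ArrayList<TextRun>(runs);
        sorted.sort(Comparator.comparingDouble((TextRun r) -> r.y)
                .thenComparingDouble(r -> r.x));

        List<TextRun> result = new ArrayList<TextRun>();
        TextRun current = null;
        for (TextRun r : sorted) {
            double gap = current == null ? 0 : r.x - (current.x + current.width);
            if (current != null && Double.compare(r.y, current.y) == 0 && gap <= maxGap) {
                // Same line and close enough: extend the current object.
                String joined = current.text + (gap >= wordGap ? " " : "") + r.text;
                current = new TextRun(current.x, current.y, r.x + r.width - current.x, joined);
            } else {
                if (current != null) result.add(current);
                current = r;
            }
        }
        if (current != null) result.add(current);
        return result;
    }
}

class GroupingDemo {
    public static void main(String[] args) {
        List<TextRun> runs = new ArrayList<TextRun>();
        runs.add(new TextRun(36, 10, 25, "votes"));
        runs.add(new TextRun(0, 10, 20, "Coun"));
        runs.add(new TextRun(200, 10, 30, "Sport"));
        runs.add(new TextRun(20, 10, 12, "cil"));
        for (TextRun object : new SameLineGrouping(2, 10).group(runs)) {
            System.out.println(object.text);
        }
    }
}
```

Keeping the strategy behind an interface means a layout-specific implementation can replace the generic one without touching the code that calls it, which is the kind of tailoring the answer above describes.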
Q. What is the difference between merging and linking? When two objects are merged together, they are physically transformed into ONE new object. Linking keeps the objects separate but establishes a link between them so that Storypad treats them as ONE object. Linked objects can easily be broken back into the original objects.
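The distinction can be illustrated with a small data-structure sketch: merging builds a new object from the content of two others, while linking only records which objects belong together, so it can be undone. The `PageObject`, `Merger` and `LinkedGroup` classes are hypothetical and are not Storypad's internal types.

```java
import java.util.ArrayList;
import java.util.List;

/** An extracted object on the page. Hypothetical type. */
class PageObject {
    final String text;

    PageObject(String text) {
        this.text = text;
    }
}

class Merger {
    /** Merging: builds ONE new object; the originals are not kept inside it. */
    static PageObject merge(PageObject a, PageObject b) {
        return new PageObject(a.text + " " + b.text);
    }
}

/** Linking: members stay separate but are presented as one object. */
class LinkedGroup {
    private final List<PageObject> members = new ArrayList<PageObject>();

    void link(PageObject object) {
        members.add(object);
    }

    /** The group's content, read as if it were a single object. */
    String text() {
        StringBuilder sb = new StringBuilder();
        for (PageObject member : members) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(member.text);
        }
        return sb.toString();
    }

    /** Breaks the link and hands back the original, unchanged objects. */
    List<PageObject> unlink() {
        List<PageObject> originals = new ArrayList<PageObject>(members);
        members.clear();
        return originals;
    }
}

class MergeLinkDemo {
    public static void main(String[] args) {
        PageObject a = new PageObject("Storm hits");
        PageObject b = new PageObject("coast");

        System.out.println(Merger.merge(a, b).text);

        LinkedGroup group = new LinkedGroup();
        group.link(a);
        group.link(b);
        System.out.println(group.text());
        System.out.println(group.unlink().size());
    }
}
```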
Q. Can you alter the Times Smart grouping to suit our newspaper? Yes, we can on a commercial basis. Or you can modify the code yourself.
Q. What about simpler layouts? Storypad is aimed at complex structures. JPedal has a number of grouping options - it is recommended you look at those.