Java code example
We provide a simple example in the jar that searches a provided PDF for a given term. The example is basic: it searches the entire PDF for a single term. The class also has additional constructors that add functionality such as searching a single page and/or a specific area of the PDF page.
Automated PDF Text search example
- FindTextInRectangle will find locations of a search term within a PDF.
A note on co-ordinates
The examples use PDF co-ordinates, which start at the bottom left of the page and run up the page. This is the opposite of Java co-ordinates, which start at the top left and run down the page.
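As a minimal sketch of the conversion between the two systems, the helper below flips the y axis using the page height. The class PdfCoords and its methods are hypothetical and not part of the library, and the sketch assumes an unrotated page whose origin is at 0,0.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of the library. */
final class PdfCoords {
    private PdfCoords() {}

    /** Converts a PDF y value (origin bottom left, running up) to a Java y value (origin top left, running down). */
    static float toJavaY(float pdfY, float pageHeight) {
        return pageHeight - pdfY;
    }

    /**
     * Converts a PDF rectangle given by two corners into Java-style
     * {x, y, width, height}, where x,y is the top-left corner.
     * Assumes no page rotation and a page origin of 0,0.
     */
    static float[] toJavaRect(float x1, float y1, float x2, float y2, float pageHeight) {
        float left = Math.min(x1, x2);
        float top = pageHeight - Math.max(y1, y2); // highest PDF y becomes the Java top edge
        float width = Math.abs(x2 - x1);
        float height = Math.abs(y2 - y1);
        return new float[] {left, top, width, height};
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // assumed page height in points (US Letter)
        float[] rect = toJavaRect(100f, 700f, 200f, 680f, pageHeight);
        System.out.println(Arrays.toString(rect)); // prints [100.0, 92.0, 100.0, 20.0]
    }
}
```

Pages with rotation or a non-zero crop box origin would need extra handling that this sketch leaves out.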
Interactive PDF search in the PDF Viewer
The built-in PDF Viewer offers powerful PDF search capabilities as standard.
The interactive PDF search function allows you to search either the current page or the entire PDF document for occurrences of a word or a phrase. Using an Options setting, it supports the three GUI search layouts that Adobe has offered in releases of Acrobat. This functionality can also be accessed from your own code.
Performing a search from within the Viewer class is done via the method Object executeCommand(int commandID, Object args), using Commands.FIND as the int value. For details of how to perform a search from the Viewer class, please check the tutorial on how to access PDF Viewer functions.
The search results can be retrieved using the method getSearchResults() in the Viewer class. The results are returned as a SearchList. If the method is called partway through a search, the results found up to that point are returned and the search continues. To check whether the search has completed, call getStatus() on the SearchList and compare the result with the following values.
- public final static int NO_RESULTS_FOUND = 1;
- public final static int SEARCH_COMPLETE_SUCCESSFULLY = 2;
- public final static int SEARCH_INCOMPLETE = 4;
- public final static int SEARCH_PRODUCED_ERROR = 8;
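A status check along these lines might look like the sketch below. The class SearchStatusDemo and its methods are hypothetical; the constants simply copy the values listed above so that the sketch compiles without the library. In real code the status would come from getStatus() on the SearchList.

```java
/** Hypothetical illustration; the constants mirror the status values listed above. */
final class SearchStatusDemo {
    static final int NO_RESULTS_FOUND = 1;
    static final int SEARCH_COMPLETE_SUCCESSFULLY = 2;
    static final int SEARCH_INCOMPLETE = 4;
    static final int SEARCH_PRODUCED_ERROR = 8;

    private SearchStatusDemo() {}

    /** Returns true once the search is no longer running. */
    static boolean isFinished(int status) {
        return status != SEARCH_INCOMPLETE;
    }

    /** Maps a status value to a short message for logging or display. */
    static String describe(int status) {
        switch (status) {
            case NO_RESULTS_FOUND:
                return "no results found";
            case SEARCH_COMPLETE_SUCCESSFULLY:
                return "search complete";
            case SEARCH_INCOMPLETE:
                return "search still running";
            case SEARCH_PRODUCED_ERROR:
                return "search failed";
            default:
                return "unknown status " + status;
        }
    }

    public static void main(String[] args) {
        int status = SEARCH_INCOMPLETE; // stand-in for the value returned by getStatus()
        System.out.println(describe(status) + ", finished=" + isFinished(status));
        // prints: search still running, finished=false
    }
}
```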
PDF Search Access from PdfGroupingAlgorithms
All the searching and text extraction algorithms are located in the PdfGroupingAlgorithms class. The public facing methods in this class that can be used for searching are findText, findMultipleTermsInRectangle, and findMultipleTermsInRectangleWithMatchingTeasers.
Each of these methods takes an integer value describing the type of search to conduct. This value is either a single value from the SearchType class or a combination of several of them. These values are:
- public final static int DEFAULT = 0;
- public final static int WHOLE_WORDS_ONLY = 1;
- public final static int CASE_SENSITIVE = 2;
- public final static int FIND_FIRST_OCCURANCE_ONLY = 4;
- public final static int MUTLI_LINE_RESULTS = 8;
- public final static int HIGHLIGHT_ALL_RESULTS = 16;
- public final static int USE_REGULAR_EXPRESSIONS = 32;
These values can be combined using the bitwise OR operator (|). For example:
int searchType = SearchType.WHOLE_WORDS_ONLY | SearchType.CASE_SENSITIVE;
The PDF search methods can find results split across multiple lines when the SearchType.MUTLI_LINE_RESULTS value is set. findTextInRectangleAcrossLines will find results across lines regardless.
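The flag arithmetic described above can be sketched as follows. The class SearchTypeDemo and its has helper are hypothetical; the constants copy the SearchType values listed above (including their original spelling) so that the sketch compiles without the library.

```java
/** Hypothetical illustration; the constants mirror the SearchType values listed above. */
final class SearchTypeDemo {
    static final int DEFAULT = 0;
    static final int WHOLE_WORDS_ONLY = 1;
    static final int CASE_SENSITIVE = 2;
    static final int FIND_FIRST_OCCURANCE_ONLY = 4;
    static final int MUTLI_LINE_RESULTS = 8;
    static final int HIGHLIGHT_ALL_RESULTS = 16;
    static final int USE_REGULAR_EXPRESSIONS = 32;

    private SearchTypeDemo() {}

    /** Returns true if the given flag is set in searchType. */
    static boolean has(int searchType, int flag) {
        return (searchType & flag) != 0;
    }

    public static void main(String[] args) {
        // Each value is a distinct power of two, so OR-ing them never collides.
        int searchType = WHOLE_WORDS_ONLY | CASE_SENSITIVE;
        System.out.println(searchType);                          // prints 3
        System.out.println(has(searchType, CASE_SENSITIVE));     // prints true
        System.out.println(has(searchType, MUTLI_LINE_RESULTS)); // prints false

        // Add a flag later with |=, remove one with & ~flag.
        searchType |= MUTLI_LINE_RESULTS;
        searchType &= ~CASE_SENSITIVE;
        System.out.println(searchType);                          // prints 9
    }
}
```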
Overview of PDF Searching Methods in PdfGroupingAlgorithms
float[] findText(int x1, int y1, int x2, int y2, String[] terms, int searchType)
Algorithm to find an array of terms within a given set of coordinates, x1, y1, x2, y2, using the given search type, described above. It returns the coordinates of the found text as an array of floats. Each result consists of the four coordinates used to define a rectangle and a fifth value; when the fifth value is -101, the next result is a continuation of this one.
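One way to consume such a result is sketched below. It assumes the array is laid out in consecutive groups of five floats (x1, y1, x2, y2, marker), which is an interpretation of the description above and not a confirmed layout. The class FindTextResults and its group method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; assumes results arrive in groups of five floats. */
final class FindTextResults {
    static final int STRIDE = 5;            // x1, y1, x2, y2, marker (assumed layout)
    static final float CONTINUES = -101f;   // marker value described in the text above

    private FindTextResults() {}

    /**
     * Groups the flat coordinate array into matches. Each match is a list of
     * rectangles {x1, y1, x2, y2}; a match has more than one rectangle when
     * the marker says the next rectangle continues it.
     */
    static List<List<float[]>> group(float[] coords) {
        List<List<float[]>> matches = new ArrayList<>();
        List<float[]> current = new ArrayList<>();
        for (int i = 0; i + STRIDE <= coords.length; i += STRIDE) {
            current.add(new float[] {coords[i], coords[i + 1], coords[i + 2], coords[i + 3]});
            boolean continues = coords[i + 4] == CONTINUES;
            if (!continues) {
                matches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            matches.add(current); // keep a trailing continuation that was never closed
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample data: one single-line match, then one match split over two lines.
        float[] sample = {
            10f, 700f, 60f, 688f, 0f,
            400f, 650f, 500f, 638f, -101f,
            10f, 636f, 40f, 624f, 0f
        };
        List<List<float[]>> matches = group(sample);
        System.out.println(matches.size() + " matches");              // prints 2 matches
        System.out.println(matches.get(1).size() + " rectangles");    // prints 2 rectangles
    }
}
```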
List findMultipleTermsInRectangle(int x1, int y1, int x2, int y2, final int rotation, int page_number, String[] terms, boolean orderResults, int searchType, SearchListener listener)
Algorithm to find multiple text terms in a given rectangle x1, y1, x2, y2 on page_number. If orderResults is true, the returned list is ordered so that the resulting rectangles follow a logical order descending down the page; if false, the rectangles for each term are grouped together. A listener (an implementation of SearchListener) is required so that searching can be cancelled. It returns a List of objects that can contain a combination of Rectangle arrays and single Rectangle objects describing the locations of found text.
SortedMap findMultipleTermsInRectangleWithMatchingTeasers(int x1, int y1, int x2, int y2, final int rotation, int page_number, String[] terms, int searchType, SearchListener listener)
Algorithm to find multiple text terms in a given rectangle x1, y1, x2, y2 on page_number with matching teasers. A listener (an implementation of SearchListener) is required so that searching can be cancelled. It returns a SortedMap containing a collection of objects, which are a combination of Rectangle arrays and single Rectangle objects describing the location of found text, each mapped to a String which is the matching teaser.