
Importing Data from PDF Files

Happy Business Spreadsheets has developed a free Excel program that extracts PDF data and imports it into Excel. It can be downloaded and used without restriction.

There is a common need to extract and import specific data from PDF files into Excel. Since Excel does not natively support reading PDF content, a utility is needed to convert the PDF content into a form that Excel can use. Several commercial applications accomplish this; however, it is often the case that only specific data needs to be imported from multiple PDF files into one structured format.

We created such an application by using VBA code in conjunction with an open source PDF to Text conversion utility, which can be found at Foolabs.

[Download the free PDF data import Excel program here]

The program requires the conversion utility (included in the download) and all PDF files to reside in the same directory as the Excel application. The text or data to extract is defined in the Control sheet by specifying start text, end text and multiple replacement routines with wildcard support. This gives the flexibility to obtain comparable data from multiple PDF files based on patterns, independent of the differing structures of the PDF files.
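
As an illustration only, the start text / end text / replacement idea can be sketched in a few lines of Python. This is not the program's VBA code: the function names `extract_between` and `apply_replacements` are hypothetical, the sample text is made up, and the wildcard behaviour (`*` for any run of characters, `?` for a single character) is an assumption about how such rules might work.

```python
import re


def extract_between(text, start_text, end_text):
    """Return the text between the first start_text and the next end_text.

    Returns "" when either marker is missing (an assumed convention).
    """
    start = text.find(start_text)
    if start == -1:
        return ""
    start += len(start_text)
    end = text.find(end_text, start)
    if end == -1:
        return ""
    return text[start:end].strip()


def apply_replacements(value, pairs):
    """Apply (find, replace) pairs in order.

    In the find text, * matches any run of characters and ? matches one
    character (assumed wildcard semantics, not taken from the program).
    """
    for find_text, replace_text in pairs:
        # Escape regex metacharacters, then re-enable the two wildcards.
        regex = re.escape(find_text).replace(r"\*", ".*?").replace(r"\?", ".")
        # A function replacement keeps replace_text literal (no backslash rules).
        value = re.sub(regex, lambda m: replace_text, value, flags=re.DOTALL)
    return value.strip()


# Made-up plain text, as a PDF-to-text conversion might produce it.
sample = "Invoice No: 12345\nTotal (incl. tax): 1,250.00 USD\n"
raw = extract_between(sample, "Total", "USD")
clean = apply_replacements(raw, [("(*):", ""), (",", "")])
```

Here `raw` still contains the bracketed note and the thousands separator; the two replacement pairs strip them, which is the kind of clean-up the replacement routines are meant for.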

As many extraction rules as required can be set in order to create a table of information imported by extraction rule and PDF file name. Information on how to set up rules is available within the Excel application with a help icon and cell comments. The VBA code is commented and open for modification.
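
A rough Python sketch of how such a table might be assembled is shown below. The layout (one row per PDF file, one column per rule), the function names `between` and `build_table`, and the sample data are all assumptions for illustration; the actual workbook may arrange its output differently.

```python
def between(text, start_text, end_text):
    """Text between the first start_text and the following end_text, else ""."""
    _, found_start, rest = text.partition(start_text)
    value, found_end, _ = rest.partition(end_text)
    return value.strip() if found_start and found_end else ""


def build_table(texts_by_file, rules):
    """Build rows: a header, then one row per file with one cell per rule.

    texts_by_file maps a file name to its converted plain text.
    rules is a list of (rule_name, start_text, end_text) tuples.
    """
    rows = [["File"] + [name for name, _, _ in rules]]
    for file_name, text in texts_by_file.items():
        cells = [between(text, start, end) for _, start, end in rules]
        rows.append([file_name] + cells)
    return rows


# Made-up converted text for two hypothetical PDF files.
texts = {
    "a.pdf": "Name: Alice\nTotal: 10 EUR\n",
    "b.pdf": "Name: Bob\n",
}
rules = [("Name", "Name:", "\n"), ("Total", "Total:", "EUR")]
table = build_table(texts, rules)
```

A rule that finds nothing in a file simply leaves that cell empty, so every file still gets a full row.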

Improvements or new features for the code are welcome; please post them here so that we can update the download version for everyone's benefit.
 Excel Business Forums Administrator
Replies (51 to 60 of 88)
  • Should they be left blank or filled with the previous match for that pattern?
As in the previous versions, if the new file doesn't find any match it would appear as blank. I think that's a good approach, since a missing item often means it is "blank".
  • What determines whether a whole set should be deemed ready for output? At the moment, each time the first pattern is found, we look to see if all patterns have been found already, output the results, reset the cache and continue in the file. Maybe this should be the last pattern.
I guess that the first pattern could work: if it finds a new first pattern or EOF, it considers the record complete and starts a new one.
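To make the suggestion concrete, here is a minimal Python sketch of that grouping rule (not the tool's VBA). It assumes the matches arrive as (pattern name, value) pairs in file order; the name `group_records` and the choice to skip anything found before the first pattern are my own assumptions.

```python
def group_records(matches, pattern_names):
    """Group ordered (pattern_name, value) matches into records.

    A new record starts each time the first pattern is seen; the record in
    progress is closed at that point or at end of input. Patterns with no
    match inside a record are left blank ("").
    """
    first = pattern_names[0]
    records = []
    current = None
    for name, value in matches:
        if name == first:
            if current is not None:
                records.append(current)  # previous record is complete
            current = dict.fromkeys(pattern_names, "")
        if current is None:
            continue  # assumption: ignore matches before the first record
        current[name] = value
    if current is not None:
        records.append(current)  # end of file closes the last record
    return records


matches = [
    ("Name", "Alice"), ("Phone", "123"),
    ("Name", "Bob"),
    ("Name", "Carol"), ("Phone", "789"),
]
records = group_records(matches, ["Name", "Phone"])
```

With this rule Bob's record is output with a blank Phone rather than inheriting Alice's number.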

 Alexandre
Hello!
Thanks for this program, it runs great. But the downloadable version doesn't seem to be the newest version as described in the comments. Could anybody verify this? And if it is not, is the newest version still available?

Also one more question:
How much has been changed in the Xpdf program? Or does the program rely on the original, unedited version?

Thanks in advance,
Johhny!
Thanks for your reply!
I was looking through the comments, and it was mentioned that the pipe character has been replaced with ^^^. That does not seem to be the case for the version mentioned above (even though this is not very difficult to change).

So I thought it was not the updated version, and was afraid it might be missing some other functions.

Thanks!
Really great tool. My case is a little different: I need to rename lots of PDF files based on their first line. How should I modify the code to use the beginning of the PDF as the START point, since the first line is not always the same? Thank you.
Thanks again.
And yes, there is more text on that same line after the %, i.e. the % is not at the end of the line.
Could you please elaborate on where to insert this formula? I appreciate it.
Never mind, I see what you're saying. But is there a way to do it without this formula? It would save me a step.

Please let me know. Thanks
Hi, this is very neat and quite useful. Nice work.

I was wondering, is there some way to make this work with interactive PDF forms? The idea is to create a fillable form (I use Foxit Phantom) and distribute it to users in the ubiquitous PDF format. They fill it in and return it, and then we extract the data into an Excel sheet.

This would be a great addition if possible, especially because the great thing about your tool is that it skips the need for the user to convert the file to .xls manually. I cannot seem to find any way to do this without having my end user first convert to .xls and then import. When the forms number in the thousands, that's just impractical.

Any help would be greatly appreciated.
 "In all things, be men"
You can attach a sample PDF, with details of the desired extraction, by replying to the notification email for this post, and we can look into it.
 Excel Business Forums Administrator
There are several issues to address here so let's cover them individually.
  1. The date formatting from number ranges. One way is to retain some common preceding text such as "Range:". If this is not possible, we can add replacement pairs for the digits 1 to 9 that add an apostrophe, which stops Excel converting the text to a date.
  2. Extra spaces. We can remove multiple spaces by using replacement pairs that specify a double space to be found and a single space as the replacement.
  3. The name of the school. Sometimes we can get the initial text in a row by specifying common text that occurs at the end of the row above. Sometimes this can be further above, with wildcard replacement pairs to remove everything in between. Text formatting is irrelevant, as the conversion produces plain text before it is interrogated.
  4. Personnel. It is usually best to first extract as much text as needed and then clean it using rules defined in the replacement pairs. Sometimes we need to be quite creative here.
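The first two points can be pictured with a small Python sketch (for illustration only, not the tool's VBA; the function names and sample values are hypothetical). It relies on the fact that Excel treats a cell entry beginning with an apostrophe as text, and it repeats the double-space replacement until no double spaces remain, which is one way to handle longer runs of spaces.

```python
def collapse_spaces(value):
    """Replace double spaces with single spaces until none are left."""
    while "  " in value:
        value = value.replace("  ", " ")
    return value


def protect_from_date_conversion(value):
    """Prefix an apostrophe when the value starts with a digit.

    A range such as 1-3 would otherwise risk being read by Excel as a date;
    values with preceding text such as "Range:" are left unchanged.
    """
    return "'" + value if value[:1].isdigit() else value


school = collapse_spaces("Lincoln   High    School")
grades = protect_from_date_conversion("1-3")
labelled = protect_from_date_conversion("Range: 1-3")
```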
I hope that this helps.
 Excel Business Forums Administrator
The output data is emptied before each extraction process simply under the assumption that the data has already been used. This can either be turned off by commenting out the VBA code that does this, or data can be copied out to a separate workbook for saving after each process is run. 
 Excel Business Forums Administrator