Home > Forum Home > Developing Business Administration Solutions > Importing Data from PDF Files Share

Importing Data from PDF Files

Excel Help for Importing Data From Pdf Files in Developing Business Administration Solutions


Forum TopicPost Reply Login

Importing Data From Pdf Files

Rate this:
(4.2/5 from 24 votes)
Happy Business Spreadsheets has developed a free Excel program to extract and import PDF data into Excel which can be downloaded and used without restriction.

There is a common need to extract and import specific data from PDF files into Excel. Since Excel does not natively support the reading of PDF content, utilities are needed to convert the PDF file content for the Excel format. Several commercial applications accomplish this; however it is often the case where only specific data is required to be imported from multiple PDF files into one structured format.

We created such an application by using VBA code in conjunction with an open source PDF to Text conversion utility, which can be found at Foolabs.

[Download the free PDF data import Excel program here]

The program relies on the conversion utility (included in the download) and all PDF files to reside in the same directory as the Excel application. Text or data to extract are defined in the Control sheet by specifying start text, end text and multiple replacements routines with wildcard support. This enables flexibility to obtain comparable data from multiple PDF files based on patterns independent of different PDF file structures.

As many extraction rules as required can be set in order to create a table of information imported by extraction rule and PDF file name. Information on how to set up rules is available within the Excel application with a help icon and cell comments. The VBA code is commented and open for modification.

Any improvements or new features to the code are welcome to be posted here so that we can update the download version to the benefit of everyone.
 Excel Business Forums Administrator
 Posted by on
 
Replies - Displaying 71 to 80 of 88Order Replies By: Most Recent | Chronological | Highest Rated
Oops
Rate this:
(3/5 from 1 vote)
From the PDF on Google drive, it seems that you need to have "PERSONAL INFORMATION" as the start text and "STRENGTHS AND ABILITIES" as the end text which would get everything in between. You can also then replace these strings in the replacement pairs.
 Excel Business Forums Administrator
 Posted by on
Grateful
Rate this:
(3/5 from 1 vote)
Dear Admin,

I did follow your suggestion but it's still empty. Now I am sharing with you my rule definition:

Start Text End Text Replacement Pairs
PERSONAL INFORMATION  STRENGTHS AND ABILITIES  PERSONAL INFORMATION,|STRENGTHS AND ABILITIES,
 
I also share the excel file with you, so you can review:

https://drive.google.com/file/d/0B822d4OBtHm5ckN2Z3FzUnNNX2s/edit?usp=sharing

For your advice - thanks a lot.

neo. 
 Posted by on
Oops
Rate this:
(3/5 from 1 vote)
The error is due to the file extension in upper case. The file name is test4.PDF. It works if we save the files as test4.pdf or change the VBA code to lower case the file extension:
VBA Code:
'get collection of pdf files
While strFile <> ""   
If LCase(Right(strFile, 4)) = ".pdf" Then
        colFiles.Add strFile   
End If  
strFile = Dir
Wend 
 Excel Business Forums Administrator
 Posted by on
Sad Dear admin,

I already changed the file name to lowercase but I am sorry...the result is still empty.

For your advice.

Thank you.

neo. 
 Posted by on
Confused
Rate this:
(3/5 from 1 vote)
Hi,
I would need tow help here.
1) For me replacement text is not working fine.
For Eg: My start text is :Receiving Office Engagement Number" which is simlier in most of the PDF .but few PDF have as "Receiving office codeblock" or "Engagement Number" SO it is not piking for those PDFs. How do i put the replacements ?

2)My PDF has Number is multiples palce.
Eg: Number ;ABC123 is in several place in my PDF si the out put is giving 3 to 4 lines.So i would need it to pick only once. How do i change the code ?

Regards,
Vijeth
 Posted by on
Oops
Rate this:
(4.3/5 from 3 votes)
In response to this latest post:
  1. If there are multiple start texts with similar pattern, we can just specify multiple rows for extraction rules each with different start text.
  2. To limit the extraction we need to accommodate all cases in the end text (perhaps including a space) and multiple replace pairs with wild cards * to remove any unwanted content. 
 Excel Business Forums Administrator
 Posted by on
Confused
Rate this:
(3/5 from 1 vote)
Its Reflecting like this ,
A B C
1
IE0100110681 Invoice Number IE0100110681
IE0100110681 Invoice Number IE0100110681 Total Due 7,250.00
IE0100120921 Invoice Number: IE0100120921
IE0100120921 Invoice Number: IE0100120921
2
3 So Now what i need to do ? I dont need it to Come in two Line... Could you please aso help me whcih a eg if Input ?
 Posted by on
Confused
Rate this:
(3/5 from 1 vote)
In this case it looks like "Total Due" should be the start text and the end text a new line. Otherwise, if you want the invoice number then use that as a pattern and then multiple wild card replacements to remove everything between that and the amount.
 Excel Business Forums Administrator
 Posted by on
Confused
Rate this:
(3/5 from 1 vote)
thats a real nice program! congrats i want to make my own website (something like http://exceltopdf.org/ ) maybe you can give me some tips on how this works :D
 Posted by on
Confused
Rate this:
(3/5 from 1 vote)
Is it possible to amment the script to allow multiple columns to be extracted from the pdf. e.g. invoice lines?
 Posted by on
 Displaying page 8 of 9 

Excel templates and solutions matched for Importing Data from PDF Files:

Solutions: Export MapPoint Waypoints Survey Data Analysis