Excel Help for Importing Data from PDF Files

Importing Data from PDF Files

(4.2/5 from 24 votes)

	Business Spreadsheets has developed a free Excel program that extracts and imports PDF data into Excel. It can be downloaded and used without restriction.

	There is a common need to extract and import specific data from PDF files into Excel. Since Excel does not natively read PDF content, a utility is needed to convert the PDF content into a format that Excel can use. Several commercial applications accomplish this; however, it is often the case that only specific data needs to be imported from multiple PDF files into one structured format. We created such an application using VBA code in conjunction with an open source PDF to text conversion utility, which can be found at Foolabs.

	[Download the free PDF data import Excel program here]

	The program requires the conversion utility (included in the download) and all PDF files to reside in the same directory as the Excel application. The text or data to extract is defined in the Control sheet by specifying start text, end text and multiple replacement routines with wildcard support. This gives the flexibility to obtain comparable data from multiple PDF files based on patterns, independent of the differing PDF file structures. As many extraction rules as required can be set in order to create a table of information imported by extraction rule and PDF file name.

	Information on how to set up rules is available within the Excel application through a help icon and cell comments. The VBA code is commented and open for modification. Any improvements or new features for the code are welcome to be posted here so that we can update the download version for the benefit of everyone.
	Excel Business Forums Administrator
	Posted by Excel Helper on 17 Jan 2011
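
For readers who want to see the start text / end text idea from the post above in code, here is a minimal sketch. It is not the workbook's own routine: `ExtractBetween` is a hypothetical helper name, and the case-insensitive matching and trimming of the result are assumptions made for illustration. Wildcards and replacement pairs are not handled here.

```vba
' Illustrative sketch only. ExtractBetween is a hypothetical helper,
' not a procedure taken from the downloadable workbook.
Function ExtractBetween(ByVal content As String, ByVal startText As String, _
                        ByVal endText As String) As String
    Dim startPos As Long, endPos As Long

    ' Find the first occurrence of the start text
    ' (case-insensitive matching is an assumption).
    startPos = InStr(1, content, startText, vbTextCompare)
    If startPos = 0 Then Exit Function

    ' Move past the start text itself.
    startPos = startPos + Len(startText)

    ' Find the first end text that follows the start text.
    endPos = InStr(startPos, content, endText, vbTextCompare)
    If endPos = 0 Then Exit Function

    ' Return whatever lies between the two markers, without outer spaces.
    ExtractBetween = Trim$(Mid$(content, startPos, endPos - startPos))
End Function
```

For example, `ExtractBetween("Date 07/20/2008 Depth :00 m", "Date", "Depth")` would return the date string. An empty string is returned when either marker is missing.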

Replies - Displaying 1 to 10 of 88

	Nice Tool.. Little Help
	(4.5/5 from 4 votes)
	Excellent tool and very handy, but I have one small issue: how is it handled when the source/destination text appears many times? Example: Date 07/20/2008 Depth :00 m. When I try to grab the date text between "Date" and "Depth", it gives me the wrong data because I have a Depth field somewhere else in the same document. Any input will be appreciated.
	Posted by sunildm4u on 02 Feb 2011

Appending Results

(4.3/5 from 3 votes)

To be clear, for appending results, we need to remove the clearing of the content and set the start row for output to the next available row below existing content.

Precisely, we replace this code:
VBA Code:

Call clearoutput
mrow = Range("outstart").Row + 1

With this code:
VBA Code:

mrow = Range("B50000").End(xlUp).Row + 1

Excel Business Forums Administrator

Posted by Excel Helper on 06 Aug 2013

	Multiple text patterns
	(4.3/5 from 3 votes)
	In response to the latest post: if there are multiple start texts with a similar pattern, we can simply specify multiple extraction rule rows, each with a different start text. To limit the extraction, the end text needs to accommodate all cases (perhaps by including a space), and multiple replacement pairs with the * wildcard can remove any unwanted content.
	Excel Business Forums Administrator
	Posted by Excel Helper on 01 Jun 2014

	Unique text only
	(4/5 from 4 votes)
	At this stage, the program finds the last instance of the text match in the PDF files. The VBA code could be modified to save all matches to memory and then output them. To extract a specific instance of the text, you need to be explicit with the start and end text by providing as much unique text as possible, so that the program does not confuse it with other text in the PDF file. In this case, there may be other text before or after "Depth" that can be specified in order to extract the correct instance.
	Excel Business Forums Administrator
	Posted by Excel Helper on 02 Feb 2011
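
The idea of saving all matches to memory mentioned in the post above could look roughly like the sketch below. This is an assumption about one possible approach, not the code in the workbook: `ExtractAllBetween` is a hypothetical name, and it simply repeats the search after each end text and collects every hit in a `Collection`.

```vba
' Illustrative sketch only. ExtractAllBetween is a hypothetical helper,
' not a procedure taken from the downloadable workbook.
Function ExtractAllBetween(ByVal content As String, ByVal startText As String, _
                           ByVal endText As String) As Collection
    Dim matches As New Collection
    Dim startPos As Long, endPos As Long, searchFrom As Long

    searchFrom = 1
    ' Empty markers would match everywhere, so they are skipped.
    If Len(startText) > 0 And Len(endText) > 0 Then
        Do
            ' Look for the next start text from the current position.
            startPos = InStr(searchFrom, content, startText, vbTextCompare)
            If startPos = 0 Then Exit Do
            startPos = startPos + Len(startText)

            ' Look for the end text that follows it.
            endPos = InStr(startPos, content, endText, vbTextCompare)
            If endPos = 0 Then Exit Do

            ' Keep the text between the markers and continue after the end text.
            matches.Add Trim$(Mid$(content, startPos, endPos - startPos))
            searchFrom = endPos + Len(endText)
        Loop
    End If

    Set ExtractAllBetween = matches
End Function
```

The returned collection holds the matches in the order they appear in the text, so the last item corresponds to the "last instance" behaviour described above.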

	New Version for Multiple Instances of Data
	(4/5 from 2 votes)
	It seems that a frequently required solution is to extract multiple instances of the same text pattern from one or more PDF files. The logic behind this is slightly different from the original setup, which attempts to line up multiple data for the same pattern from multiple PDF files. The new version, which can be downloaded from the original link above, adds a second table of results in the Output sheet that lists multiple instances of data matched within and across the PDF files processed. This approach retains the benefit of the original consolidation approach while adding support for multiple data instances within each file. Another added feature is the ability to retain the text files generated for each PDF file, to assist with pattern matching setup and for alternative use of the content. The new version has cell comments with detailed information on the logic. Please post your feedback here so that we can continue to improve this free and open source solution for importing PDF data into Excel.
	Excel Business Forums Administrator
	Posted by Excel Helper on 19 Feb 2012

	Separator characters
	(4/5 from 2 votes)
	OK. The separation character string used in the code to build the arrays of matching text is '^^^', on the assumption that this is unlikely to be found within the content itself. The pipe character '|' is still used to specify replacement pairs, as this is independent of the source text extraction and is used only to define find and replace patterns in order to clean the output as necessary. I hope that this clarifies things, and if you have any ideas for improvement then we can look to implement them.
	Excel Business Forums Administrator
	Posted by Excel Helper on 22 Feb 2013
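
To illustrate the two separators described in the post above, here is a small sketch. It assumes that matches are joined into one string with '^^^' and that a replacement pair is written as `find|replace`; the exact layout used in the Control sheet may differ. The names `MATCH_SEP`, `SplitMatches` and `ApplyReplacePair` are hypothetical, and wildcard handling is left out.

```vba
' Illustrative sketch only. These names are hypothetical and the
' "find|replace" layout of a replacement pair is an assumption.
Const MATCH_SEP As String = "^^^"

' Turn a joined string of matches back into an array of matches.
Function SplitMatches(ByVal joined As String) As Variant
    SplitMatches = Split(joined, MATCH_SEP)
End Function

' Apply one replacement pair to a piece of extracted text.
Function ApplyReplacePair(ByVal text As String, ByVal pair As String) As String
    Dim parts() As String
    parts = Split(pair, "|")

    If UBound(parts) >= 1 Then
        ' parts(0) is the text to find, parts(1) is its replacement.
        ApplyReplacePair = Replace(text, parts(0), parts(1))
    Else
        ' No pipe found, so the text is returned unchanged.
        ApplyReplacePair = text
    End If
End Function
```

Because the pipe only appears in the rules and '^^^' only in the joined matches, the two separators do not interfere with each other in this sketch.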

	Tabulated data in PDF files
	(4/5 from 2 votes)
	If data within the PDF files is tabulated, we need to extract the data by rows and then use the Text to Columns function in Excel in order to replicate the table structure. To extract rows we need to specify the start and end text. The end text is simple, as it will be a new line, specified with the [new line] option. The start text is more complicated, as the extraction will return whatever is found after the starting text. We can specify the first column text and replace it afterward if there is a common pattern. Alternatively, if a common pattern exists in the last column of the previous row or text, then we can use that with [new line] to extract the entire subsequent row.
	Excel Business Forums Administrator
	Posted by Excel Helper on 08 Jul 2013
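
As a rough alternative to running Text to Columns by hand on each extracted row, the split could be done in code. The sketch below assumes that the converted text separates table columns with two or more spaces while values themselves contain at most single spaces; that will not hold for every PDF. `SplitRowIntoColumns` is a hypothetical helper name.

```vba
' Illustrative sketch only. SplitRowIntoColumns is a hypothetical helper
' and assumes columns are separated by two or more spaces.
Function SplitRowIntoColumns(ByVal rowText As String) As Variant
    Dim cleaned As String
    cleaned = Trim$(rowText)

    ' Shrink every run of three or more spaces down to exactly two.
    Do While InStr(cleaned, "   ") > 0
        cleaned = Replace(cleaned, "   ", "  ")
    Loop

    ' Each remaining double space marks a column boundary.
    SplitRowIntoColumns = Split(cleaned, "  ")
End Function
```

The resulting array could then be written across one worksheet row, for example with `Range("A1").Resize(1, UBound(cols) + 1).Value = cols`, where `cols` holds the function result and the target cell is only an example.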

	Yes Sir Spaces do matter
	(4/5 from 2 votes)
	Thanks for the tip. I actually went back and counted all the spaces between the % and the following number, 10.0. I found 7 spaces, so I inserted them in the 'Replacement Pairs' and that seems to fix the issue. Thanks again.
	Posted by bunjaby on 09 May 2014

	Column Controls?
	(4/5 from 2 votes)
	Hi, I love the file and it works for the PDFs that you gave as an example. I am trying to use it for PDFs of bank statements. I have another file with macros that reconciles bank activity to GL ledger activity and marks anything that reconciles with an x, leaving only the reconciling items. The issue I have is that we only get bank statement activity in PDF format. Your file and code would work perfectly if I could use the control text tab to enter the column headings as the text to search for in the PDF, and have it drop in all the text details found below each column heading. I am not sure if this is possible, since it searches for text and the details are on the lines below that text. Any suggestions?
	Anthony
	Posted by amartin575 on 05 Dec 2012

	Multiple Entries
	(4/5 from 2 votes)
	Hi! Congrats on this great tool! I'd like to know how to modify the code so that one could get multiple data from a single PDF. Example:
	---- New Contact ----
	[Start Text:Name] data collected [End Text]
	[Start Text:Phone] data collected [End Text]
	---- New Contact ----
	[Start Text:Name] data collected [End Text]
	[Start Text:Email] data collected [End Text]
	[Start Text:Address] data collected [End Text]
	Note that each client may have different fields, but the current control page is already taking care of the various fields. It would be necessary to add a check for a separator ("New Contact" in this case). Is this possible, or would new code need to be developed? Thanks from Brasil
	Alexandre
	Posted by oteacher on 07 Jan 2012
