Importing Data From PDF Files
Business Spreadsheets has developed a free Excel program that extracts and imports PDF data into Excel. It can be downloaded and used without restriction.

There is a common need to extract specific data from PDF files and import it into Excel. Since Excel does not natively read PDF content, a utility is needed to convert the PDF content into a format Excel can use. Several commercial applications do this; however, it is often the case that only specific data needs to be imported from multiple PDF files into one structured format. We created such an application using VBA code in conjunction with an open source PDF-to-text conversion utility, which can be found at Foolabs.

[Download the free PDF data import Excel program here]

The program requires the conversion utility (included in the download) and all PDF files to reside in the same directory as the Excel application. The text or data to extract is defined in the Control sheet by specifying start text, end text and multiple replacement routines with wildcard support. This gives the flexibility to obtain comparable data from multiple PDF files based on patterns, independent of differences in PDF file structure. As many extraction rules as required can be set in order to create a table of information imported by extraction rule and PDF file name. Information on how to set up rules is available within the Excel application through a help icon and cell comments.

The VBA code is commented and open for modification. Any improvements or new features are welcome to be posted here so that we can update the download version for everyone's benefit.
Excel Business Forums Administrator | ||
Posted by Excel Helper on |
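The start text / end text rules described in the opening post can be pictured with a minimal Python sketch. This is not the program's VBA code: the function names, the sample text and the wildcard behaviour (`*` matches any run of characters, `?` a single character, shortest match first) are all assumptions made for illustration, and the input is assumed to be plain text already produced by a PDF-to-text converter.

```python
import re

def extract_between(text, start_text, end_text):
    """Return the text found between start_text and end_text, or None."""
    start = text.find(start_text)
    if start == -1:
        return None
    start += len(start_text)
    end = text.find(end_text, start)
    if end == -1:
        return None
    return text[start:end].strip()

def apply_replacements(value, replacements):
    """Apply (pattern, replacement) pairs in order.

    Assumed wildcard rules: '*' matches any run of characters
    (shortest match), '?' matches exactly one character.
    """
    for pattern, replacement in replacements:
        # Build a regular expression from the wildcard pattern.
        regex = "".join(
            ".*?" if ch == "*" else "." if ch == "?" else re.escape(ch)
            for ch in pattern
        )
        value = re.sub(regex, replacement, value)
    return value

# Hypothetical converted PDF text and one extraction rule.
converted = "Invoice No: 1042\nTotal: 369 EUR (pending)\nDue: 2010-03-01\n"
raw = extract_between(converted, "Total:", "\n")
amount = apply_replacements(raw, [(" (*)", ""), (" EUR", "")])
```

Returning `None` when either marker is missing keeps "not found" distinct from an empty value, which matters once several rules are combined into one table row.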
Replies - Displaying 11 to 20 of 88
Thanks! If I get any new ideas, I will come back to ask or to propose them.
Posted by zlattko on |
One way to reformat this as columns is to use the multiple instance data table as the source for a pivot table and then lay out the results in the desired fashion for analysis. Any ideas would be appreciated.
Excel Business Forums Administrator | |
Posted by Excel Helper on |
I think it will be much easier to manage the multiple data if they are shown in the order in which they are found.

For example, my PDF contains: Bank Account, Transaction Amount, Transaction Description, Name/Address... and right now the results are grouped by keyword:

Transaction Amount, Transaction Amount, Transaction Amount, ...
Bank Account, Bank Account, Bank Account, ...
etc.

If they were shown in the order in which they are found, I could use a reference (in my case, Bank Account) to manage them. I think it's clear what I mean?
Posted by zlattko on |
The problem with the pivot table is that if any record has a missing field, you can't tell which field belongs to which record.

Sample data:

Customer Enquire
Name: John Doe
Phone: 555-5555
Email: [email protected]
Question: What time is it?

Customer Enquire
Name: Jane Doe
Phone: 666-6666
Question: Whats 2+2?

Customer Enquire
Name: Larry Moe
Email: [email protected]
Question: To be or not to be?

Note that there could be missing fields, like Phone or Email. In this case we would get these rows: 3 names, 2 phones, 2 emails and 3 questions, and it would be hard to tell who has a phone, an email or both. If one could use a special keyword, like 'Customer Enquire', as a record separator, maybe the script could treat each record as being a different file.
Alexandre | |
Posted by oteacher on |
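The record-separator idea from the post above could look roughly like the following Python sketch. It is only an illustration of the suggestion, not the program's VBA code: `parse_records` is a hypothetical helper, the `Field: value` line layout is taken from the sample data, and the e-mail addresses are placeholders.

```python
import re

def parse_records(text, separator, fields):
    """Split text on a separator keyword and extract each field per record.

    A missing field is stored as None, so every record keeps all columns
    and it stays clear which values belong together.
    """
    records = []
    # The first chunk is whatever precedes the first separator; skip it.
    for chunk in text.split(separator)[1:]:
        record = {}
        for field in fields:
            # Capture everything after "Field:" up to the end of that line.
            pattern = rf"^{re.escape(field)}:[ \t]*(.*)$"
            match = re.search(pattern, chunk, re.MULTILINE)
            record[field] = match.group(1).strip() if match else None
        records.append(record)
    return records

sample = """Customer Enquire
Name: John Doe
Phone: 555-5555
Email: john@example.com
Question: What time is it?
Customer Enquire
Name: Jane Doe
Phone: 666-6666
Question: Whats 2+2?
Customer Enquire
Name: Larry Moe
Email: larry@example.com
Question: To be or not to be?
"""

records = parse_records(
    sample, "Customer Enquire", ["Name", "Phone", "Email", "Question"]
)
```

Each record becomes one row with a fixed set of columns, so Jane Doe's missing e-mail and Larry Moe's missing phone show up as empty cells instead of shifting the other values.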
We are currently taking a fresh approach to the code based on these observations. The new code attempts to do away with the multiple instance table and to list multiple content found in the combined table, as before. The issue is when one or more of the patterns is not found for any set of matches:
| |
Excel Business Forums Administrator | |
Posted by Excel Helper on |
Rate this: (3/5 from 1 vote)
| |
Alexandre | |
Posted by oteacher on |
In PDF:

Bank Account 1111111
Transaction Amount 369 EUR
Description 789456123
Name/Address Paris, 86 AV.
-------------------------------------------
Bank Account 22222222
Transaction Amount 258EUR
Description 147258369
Name/Address Amsterdam, 86 AV.
-------------------------------------------
Bank Account 333333333
Transaction Amount 147 EUR
Description 321456849
Name/Address Berlin, 86 AV.
-------------------------------------------
Bank Account 444444444
Transaction Amount 789 EUR
Description 7539514862
Name/Address
-------------------------------------------

OUTPUT => spreadsheet:

As you can see, we always have a reference that is always different, and the code should look something like this: do while the bank account is new, run the code we already have (for multiple data), modified if possible to show the results as in the table above.

After doing this, I think you (and we) will have a very good product, which will be helpful to a lot of people.
Posted by zlattko on |
One thing to note is that the beginning text for each replicated pattern should be first in the list of patterns. This has been specified in the cell comment for Start Text. This all worked in the tests, but your feedback will again be greatly appreciated.
Excel Business Forums Administrator | |
Posted by Excel Helper on |
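The rule that the first pattern in the list marks the start of each repeated record can be sketched in Python as below. This is a guess at the general idea rather than the program's actual VBA logic: `rows_from_matches` is a hypothetical helper, and the matches are assumed to arrive as (field, value) pairs in the order they appear in the converted text.

```python
def rows_from_matches(matches, fields):
    """Group (field, value) matches, listed in the order found, into rows.

    A new row starts whenever the first field in `fields` appears,
    so fields that are missing from a record stay empty in that row.
    """
    first = fields[0]
    rows = []
    for field, value in matches:
        if field == first or not rows:
            rows.append({name: "" for name in fields})
        rows[-1][field] = value
    return rows

fields = ["Bank Account", "Transaction Amount", "Description", "Name/Address"]
matches = [
    ("Bank Account", "333333333"),
    ("Transaction Amount", "147 EUR"),
    ("Description", "321456849"),
    ("Name/Address", "Berlin, 86 AV."),
    ("Bank Account", "444444444"),
    ("Transaction Amount", "789 EUR"),
    ("Description", "7539514862"),
]
rows = rows_from_matches(matches, fields)
```

The second row keeps an empty Name/Address cell, which is why the reference field has to be one that is present in every record.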
The results are displayed inline. The records are correctly identified. The fields work OK with Latin characters like áéõç... and special characters like $, x², (), : (that's a big plus).

I've found a bug: if one of the fields has a pipe "|" in its value, the script seems to shift fields.

Sample:
Field1 abc
Field2 123|456
Field3 xyz

Result:

The 19/02 version displayed 123|456 as two separate lines in the Multiple Instance Data table, while in "Combined Last Instances" it is displayed correctly as "123|456".
Alexandre | |||||||||||||
Posted by oteacher on |
We have changed this separator to the three characters ^^^ in the updated download, on the assumption that this string is highly unlikely to appear in the content. We look forward to your testing results.
Excel Business Forums Administrator | |
Posted by Excel Helper on |
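The pipe bug and the ^^^ fix can be reproduced with a small Python sketch. How the program stores values internally is not shown in this thread, so the sketch simply assumes that field values are joined into one delimited string and split again later; `pack`, `unpack` and `pack_checked` are hypothetical names.

```python
def pack(values, sep):
    """Join field values into one string using the given separator."""
    return sep.join(values)

def unpack(packed, sep):
    """Split a packed string back into field values."""
    return packed.split(sep)

def pack_checked(values, sep):
    """Like pack(), but refuse values that already contain the separator."""
    for value in values:
        if sep in value:
            raise ValueError(f"value {value!r} contains separator {sep!r}")
    return sep.join(values)

values = ["abc", "123|456", "xyz"]

# With "|" the middle value is split in two and later fields shift.
shifted = unpack(pack(values, "|"), "|")

# With "^^^" the same values survive the round trip unchanged.
intact = unpack(pack(values, "^^^"), "^^^")
```

A longer separator only makes a collision less likely. A check such as `pack_checked` reports the problem instead of silently shifting fields if the separator ever does appear in the content.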