How to Extract Data from PDF Forms Using Parserr

If you work in a human resources, purchasing, accounting or administrative role, you’re probably very familiar with compiling information from at least one of the following documents:

• Invoices & receipts
• Purchase & sales orders
• Shipping & delivery orders
• Bank statements
• Weekly & monthly reports
• Employee forms

Extracting data from these PDF forms and scanned documents into a usable format is probably one of the basic requirements of your job, and one of the most frustrating.

But in today’s world of automation technology and instant gratification, you shouldn’t have to do that anymore. And you don’t have to!

Not only can you automate this process for a batch of documents, you can set it up so that you’ll never have to do it for similar documents in the future. With Parserr, you simply send an email, and the rest of the work is done for you.

Ready to learn how? Let’s get started!
 

Step 1: Sign up for Parserr

Parserr’s goal is to help you automate your business and eliminate unnecessary manual data entry through email and document parsing. To get started, simply head to our homepage, enter your email and we’ll walk you through the rest.
 

Step 2: Send an email with your sample PDF attached

Once you’ve registered, you’ll be taken to a screen similar to the one below, where you’ll be given a randomly generated email address unique to your account.
 

 
Parserr is primarily email parsing software, so documents are uploaded via email. Use your preferred email client to send an email, with your sample document attached, to the address assigned to your account. Once it’s sent, Parserr will detect it automatically and move you to the next stage.

The document you attach now will only be used as a sample. You’ll need to send it again at the end of this process if you’d like to parse it along with your other documents.
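Any email client works for this step, but if you would rather script it, the sketch below shows one way to build an email with a PDF attached using Python's standard library. The addresses, the file name and the placeholder PDF bytes are all made up for illustration; substitute your own sender address and the unique address Parserr gives you.

```python
from email.message import EmailMessage


def build_sample_email(sender, parserr_inbox, pdf_bytes, filename="sample-invoice.pdf"):
    """Build an email message with a single PDF attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = parserr_inbox
    msg["Subject"] = "Sample invoice"
    msg.set_content("Sample PDF attached.")
    # Attach the PDF as application/pdf so it is treated as a file attachment.
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=filename
    )
    return msg


# Hypothetical addresses and placeholder bytes, for illustration only.
message = build_sample_email(
    "you@example.com",
    "your-unique-address@example.com",
    b"%PDF-1.4\n% placeholder bytes\n",
)

# Sending is left out on purpose. With your own mail server details it would
# look roughly like: smtplib.SMTP(host, port).send_message(message)
```

Building the message separately from sending it makes it easy to check the attachment before anything leaves your machine.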
 

Step 3: Tell Parserr what you plan to do

Parserr allows you to export the data it parses in many ways. Along with a standard Excel or CSV format, you can also connect Parserr with Zapier to give you an even wider range of options for export, such as Google Sheets, MailChimp, Google Calendar and Slack.

To facilitate a smoother process, Parserr will ask what part of the email you would like to extract from and what third party application you’ll be extracting to. Simply choose your reply from the dropdown menu options provided and click “Next” to move on.

One of the most common ways users compile parsed data is via a spreadsheet, so we’re showing you how to compile the data from your parsed PDFs into a Google Sheets spreadsheet.
 

Step 4: Add your first rule

Next, you’ll be taken to the Rules section, where you’ll be setting your first rule. First, select the Attachment attribute. This tells Parserr that you’ll be parsing data from your email attachment, not the body of the email itself.
 

 
Scroll down and click the plus sign to add your rule. You’ll see a pop-up window, asking you to select the type of Rule you wish to set. Choose Files as the rule category, and select “Extract single line of text from PDF”, as shown below. The other option for PDF files is to extract all the text from a single page or all pages of your document.
 

 
Next, you’ll be taken to a page showing your document. We’re using an invoice for this example.

Draw a rectangle over the data you wish to capture. Notice in this example, it’s highlighted right after “Invoice #” because that’s where the Invoice number will appear on every invoice from this company.

Then we give the selection a name below, and click “Confirm selection”.
 

 
You’ll then be taken back to the previous page, which will be updated with the parsed data and the Rule name. Verify its accuracy and click “Save Rule”.
 

 
Your saved rule will now look like the image below on the Rules tab. From here, you can edit the rule, duplicate it or delete it.
 
We could move forward to the next stage of parsing with this, but that’s if we only required one piece of data from each PDF document. Chances are, you’ll have other data that you wish to capture as well.
 

 
To add more Rules for capturing data, simply click the “Add Rule” button and repeat this step as many times as needed, until you’ve made a Rule for each line or page of data you wish to capture.

Below, we have created three rules for the three items we want to capture from our invoices – invoice number, the total amount due and the due date.
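If it helps to picture what a rectangle-based rule does, here is a rough conceptual sketch in Python. It is not Parserr's actual implementation: the Word and Rule classes, the coordinates and the invoice values are all invented for illustration. The idea is simply that each rule has a name and a rectangle, and it captures whatever text falls inside that rectangle on the page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word on the page with the position of its top-left corner."""
    text: str
    x: float
    y: float


@dataclass
class Rule:
    """A named rectangle; hypothetical stand-in for a rule drawn on the PDF."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def apply(self, words):
        # Keep only the words positioned inside the rectangle, left to right.
        inside = [
            w for w in words
            if self.left <= w.x <= self.right and self.top <= w.y <= self.bottom
        ]
        inside.sort(key=lambda w: w.x)
        return " ".join(w.text for w in inside)


# Made-up page content: labels on the left, values to the right of them.
words = [
    Word("Invoice", 400, 50), Word("#", 450, 50), Word("INV-1001", 470, 50),
    Word("Due", 400, 70), Word("date:", 430, 70), Word("2024-03-31", 470, 70),
    Word("Total", 400, 500), Word("due:", 440, 500), Word("$1,250.00", 470, 500),
]

# Three rules, each covering the area right after its label.
rules = [
    Rule("Invoice Number", 465, 45, 560, 55),
    Rule("Due Date", 465, 65, 560, 75),
    Rule("Total Amount Due", 465, 495, 560, 505),
]

parsed = {rule.name: rule.apply(words) for rule in rules}
```

This also shows why the rectangle is drawn right after the label rather than over it: the label stays the same on every invoice, while the value next to it is the part worth capturing.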
 

 
Now, we’re just about ready for parsing, but we need to do one last thing…
 

Step 5: Set up your third party application

Remember Step 3, where we chose to export our parsed data to Google Sheets? Well, it’s time to set up that spreadsheet before we proceed. (Note: What you choose at the beginning of this process will determine the instructions given at this stage and whether you’ll need to create a new file. Some will require this step, others will not.)

In a separate tab (keeping Parserr open), go to the third party application and create a new file. In the case of Google Sheets, you can add headings that will correspond with the Rule Names you’ve listed in Parserr. Here’s our sample sheet prior to parsing.
 

 
Now, let’s get back to Parserr to complete the third party integration and parse our data.
 

Step 6: Integrate your third party application account

At the top of the Rules screen (where we left off), you will see a prompt to send your data to the integration tool you’ve chosen. Click the link, then log into or set up your account and connect it to Parserr. In some cases, Parserr will pull in the most recent files on that account and ask which you’d like to use to store your data.

As shown below with Google Sheets, Parserr lets you select the spreadsheet and worksheet. It also lets you map each Parserr Rule Name to the Google Sheets column headings you just created.

Once this is complete, you can click “Save changes” and we’ll move on to the final step, parsing your data and exporting it directly to your third party tool!
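The mapping between Rule Names and column headings can be pictured as a simple lookup. The Python sketch below is only an illustration of that idea, not how Parserr or Google Sheets works internally; the rule names, headings and values are made up, and a CSV buffer stands in for the spreadsheet.

```python
import csv
import io


def to_row(parsed, column_map, headings):
    """Arrange parsed values in the order of the sheet's column headings."""
    row = {heading: "" for heading in headings}
    for rule_name, heading in column_map.items():
        # A rule with no parsed value leaves its column empty.
        row[heading] = parsed.get(rule_name, "")
    return [row[heading] for heading in headings]


# Hypothetical column headings, in the order they appear in the sheet.
headings = ["Invoice #", "Amount Due", "Due Date"]

# Hypothetical mapping: rule name -> column heading.
column_map = {
    "Invoice Number": "Invoice #",
    "Total Amount Due": "Amount Due",
    "Due Date": "Due Date",
}

# Made-up values, as if parsed from one emailed invoice.
parsed = {
    "Invoice Number": "INV-1001",
    "Due Date": "2024-03-31",
    "Total Amount Due": "$1,250.00",
}

# Write a header line and one data row to an in-memory CSV.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headings)
writer.writerow(to_row(parsed, column_map, headings))
```

Because the row is built from the headings rather than from the rules, the order in which you created your rules does not matter; each value lands under the heading it is mapped to.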
 

 

Step 7: Parse your data!

It’s time! The final thing you need to do is send an email, with the PDF you want to parse attached, to the Parserr email address you used earlier. (If you ever forget it, it’s always at the top right of your screen in Parserr.)

Parserr will then process your email automatically and send the data straight to your third party application. Here’s the same Google Sheet from above, that has been automatically updated with an email we sent to Parserr.
 

 
And there you have it!

For each additional PDF file you have, simply email it and Parserr will take care of the rest.
 

Bonus: Automate the whole process

If you really want to automate this process entirely, you can set up auto-forwarding in your email account so that any email you specify is forwarded to your Parserr inbox without you ever getting involved. You literally won’t have to lift a finger!

Here are the auto-forwarding instructions for Gmail, Yahoo and Outlook.
 

Now it’s your turn!

Ready to give Parserr a test run? Set up your free account right now and use this guide to help you get started with extracting data from your PDFs.

Or if you want to eliminate all that, simply contact us once you’ve registered your account and we’ll set things up for you. You can get back to business and leave the boring and mundane PDF parsing to us. After all, we’re really good at it.