How to Extract Data from PDF (and Other Attachments) to Excel

Let’s face it, an email inbox isn’t the most organized place in the world.

If you’re a solopreneur, blogger, or small business owner, you probably have a consolidated mailbox for everything business-related, which means you’ll be dealing with a torrent of receipts, direct inquiries, and other notifications from the services you use.

You may also get a couple of personal messages and social media notifications here and there.

Fortunately, there is a way to easily cut through the clutter and get the data you need. With Parserr, you can automatically extract useful information from emails and export it in a structured format.

You can also use Parserr to obtain data from attachments, such as invoices, bills, and financial documents.

Sounds neat, right? We’ll show you how to do this in a jiffy.

But first, a little introduction.

What is Parserr?

Parserr is email parsing software that lets users define “rules” for extracting data.

In the initial “Quick Setup,” you will be shown the email address of your parsing account. This is where emails must be forwarded in order for the parsing rules to apply.


For more control over your parsing workflow, you can forward emails manually in bulk at regular intervals. Alternatively, you can enable automatic forwarding in your email service provider.

To enable automatic forwarding, follow the guide for your email provider:


Yahoo Mail


Zoho Mail

Getting Started with Parserr

Now that you have an idea of how Parserr works, it’s time to set up the rules for extracting data from PDF attachments.

If you haven’t already, do check out this post on how to set up your Parserr account for the first time.

Basically, you need to send a “test email” that matches the type of email you want to process. This will allow you to develop a set of parsing rules that can be used for similar emails in the future.

Since we want to create parsing rules for PDF data, it’s imperative that we use a sample email with an actual attachment. But what if you already completed the Quick Setup before?

Don’t worry, you should still be able to designate a new sample email from the Parserr dashboard. Simply send a new email to your parsing account address, look for it in the “Incoming Emails” section, and click the “Make this sample email?” link.


Creating Your Parsing Rules

Once your sample email is ready, you can start creating new parsing rules by navigating to the “Rules” page. Here, you may also edit, copy, and delete any active rules you may have previously created.


After clicking “Add Rule,” you will be asked to select the email attribute you want to extract data from.

Go ahead and select “Attachments” to tell Parserr that it should look for data in PDF documents.


How do you know if Parserr detected the attachment in your sample email? Just scroll down and check the “Initial data” field.

A full preview of the attached PDF document should be visible here.


Now that the initial PDF data is retrieved, you need to add another “Rule Step” that tells Parserr exactly what to extract. For this, simply click the green “plus” sign below the initial data field.


In the “Rule Category” drop-down menu, select “Files” to view the rule steps applicable to PDF documents.


At this point, you will be presented with two options:

1. Extract a Single Line of Text

If you only need a specific line of information from the PDF, then the “Extract single line of text from PDF” rule is definitely the more convenient choice.


This launches the PDF Data Extraction tool, which lets you manually select the area of the document that you want to extract. Simply use your mouse to move and resize the selection box and click “Confirm Selection.”



After adding your rule step, inspect the “Content extracted from PDF” field to make sure you obtained the data you want. If everything’s in order, give your parsing rule a name before you save it.
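To get a feel for what selecting an area means in practice, here is a minimal Python sketch of region-based extraction. It is not Parserr's implementation. The `Word` class, the coordinates, and the sample values are all made up for illustration, and the sketch simply keeps the words whose position falls inside a selection box.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word on a page plus the x/y position of its top-left corner (hypothetical model)."""
    text: str
    x: float
    y: float


def words_in_box(words, left, top, right, bottom):
    """Return the words whose position lies inside the box, joined in reading order."""
    inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
    # Sort top-to-bottom first, then left-to-right within a line.
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


# Made-up page content: two lines of a statement.
page = [
    Word("Statement", 50, 100),
    Word("Number:", 120, 100),
    Word("INV-0042", 180, 100),
    Word("Total:", 50, 140),
    Word("$250.00", 120, 140),
]

# A selection box drawn around the first line only.
selection = words_in_box(page, left=40, top=90, right=250, bottom=110)
print(selection)  # Statement Number: INV-0042
```

Notice that the result depends entirely on where the text sits on the page. If a sender moves the statement number to a different spot, the same box captures something else.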


2. Extract Entire Pages

The PDF Data Extraction tool is handy if you want a parsing workflow up and running in no time. However, it could present issues if you’re receiving documents from multiple senders that use different formats.

That’s why it’s sometimes better to extract the text from an entire PDF before refining it with additional rule steps.


With the “Extract page text from PDF” rule, every detail in a PDF document is initially captured.


You’ll then be able to refine the data with the traditional rule categories.

For example, if you want to extract the statement number, add a new rule step and select “Find content you need.”


In the “Get row containing text” field, enter the keyword you want Parserr to look for and click “Update.” The refined data should now contain only the line of text that includes your keyword.


To make sure only the data itself will be extracted, add a “Search & Replace” rule category and enter the keyword into the “find text” field. Leave the “replace with” field empty to remove the keyword from the result.
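If it helps to see the logic behind these two rule steps, here is a rough Python sketch of the same idea: find the row that contains a keyword, then strip the keyword out. This is not Parserr's code. The function names and the sample statement text are assumptions made up for the example.

```python
def get_row_containing(text, keyword):
    """Return the first line of text that contains keyword, or an empty string."""
    for line in text.splitlines():
        if keyword in line:
            return line.strip()
    return ""


def search_and_replace(text, find, replace_with=""):
    """Replace every occurrence of find, then trim leftover whitespace."""
    return text.replace(find, replace_with).strip()


# Made-up page text standing in for the output of a full-page extraction.
page_text = """ACME Supplies
Statement Number: INV-0042
Total Due: $250.00"""

row = get_row_containing(page_text, "Statement Number:")
value = search_and_replace(row, "Statement Number:")

print(row)    # Statement Number: INV-0042
print(value)  # INV-0042
```

Because the lookup is based on a keyword rather than a position on the page, the same two steps keep working even if the line moves around from one document to the next.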


Finally, give your rule a name and click “Save Rule” to activate it in your parsing workflow.

There you have it: a parsing rule that automatically extracts data from PDF attachments. The next step is to export this data into an Excel document.

Exporting Parsed Data

The first way to export Parserr data into an Excel document is to open the “Export” page and generate a downloadable file.

Remember to select the right format and date before you click “Download Now.” For security purposes, the report will be sent to your private email rather than downloaded directly in your browser.


Another way to export your Parserr data to Excel is through a Zapier integration, which can be configured on the “Integrations” page.


This will bring up the Zap configuration window where you can complete the steps necessary to use the integration. This includes selecting your Parserr inbox and connecting your Microsoft account.


After configuring the Zapier integration, here’s what your Parserr data should look like in your Excel document:


Why choose the second method over the first? With a Zapier integration, your Parserr data will be sent to Excel Online, which is a cloud-based service that can be accessed with your Microsoft account.

Instead of generating a new offline document every time you export your data, the integration only adds a new row to an existing Excel document. This means you only need to keep one copy of your Parserr spreadsheet at a time.
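The difference between the two approaches is easy to picture with a small Python sketch. This is not how Zapier or Excel Online works under the hood. A local CSV file stands in for the spreadsheet, and the `append_row` helper, the column names, and the sample values are all assumptions for illustration.

```python
import csv
import tempfile
from pathlib import Path


def append_row(path, row, header):
    """Append one row to a CSV file, writing the header first if the file is new or empty."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerow(row)


# Demo in a temporary folder so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    sheet = Path(tmp) / "parsed_data.csv"
    header = ["statement_number", "total"]

    # Each newly parsed email adds one row to the same file.
    append_row(sheet, ["INV-0042", "250.00"], header)
    append_row(sheet, ["INV-0043", "99.50"], header)

    with sheet.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(rows)
```

However many emails come in, there is still only one file to keep track of, and it simply grows by one row each time.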


Manual data entry can be stressful, especially the part where you have to comb through your emails for pertinent business data.

With Parserr, data from PDF attachments can be sent straight to your spreadsheet or a third-party service. Apart from saving time, automating this workflow also reduces the risk of human error.

Like what you read? You might be interested in this post where we discuss how to use Parserr to extract invoice data from emails.