In the digital age, businesses can never run out of ways to streamline their processes and maximize their productivity.
Automation, for example, brings about countless ways to make time-consuming work a walk in the park. Even certain marketing campaigns, such as lead generation and remarketing, can be oiled up with the right automation tools.
Statistics show that the business process automation (BPA) market is set to grow to $12.7 billion by 2021, up from only $8.2 billion this year. This goes to show that businesses are finally realizing that automation is not just a buzzword; it could be the key to unlocking a much higher level of efficiency in their organizations.
In this post, however, we will be focusing on a specific type of automation that can help businesses and professionals save a great deal of time from clerical work.
We all know how tedious it can be to capture invoice data for accounting and resource planning purposes.
Not only is manual data entry inefficient, particularly for undermanned organizations, it also puts your invoice data management at risk of human error.
Dissecting Your Solutions
Before we get to the nitty-gritty of automating the process of invoice data capture, you need to get to know the tool we’ll use. To truly appreciate our recommended solution, let’s take a quick look at the strategies businesses currently use.
As of 2018, there are several ways businesses can extract and manage data from invoices:
Optical Character Recognition
First of all, optical character recognition (OCR) is often used by enterprises that receive mostly paper-based invoices, although there are digital applications of this technique. A typical setup involves a photoelectric device operated with software to scan and identify characters from printed documents.
EDI or XML
Some vendors may send customers invoice data using the electronic data interchange (EDI) or extensible markup language (XML) format. Although these systems are often accurate and secure, they may require a considerable investment in network and infrastructure, which isn’t ideal for smaller businesses that don’t place large volumes of orders.
Outsourcing Data Entry Services
Although this option doesn’t exactly qualify as a form of automation, outsourcing your invoice processing would essentially provide similar benefits. Just like EDI or XML solutions, outsourcing your data entry would only be viable if your company receives a large number of invoices.
Digital PDF Capture
Lastly, processing invoices in PDF form is perhaps the most cost-efficient way to automate data capture, especially when done in the cloud. An email parsing service combines this with OCR-based technology to offer businesses a flexible option when it comes to invoice data capture.
Based on what you’ve just read, can you guess what strategy will be featured in this post?
That’s right — digital data capture with an email parsing service like Parserr.
Apart from the low cost of acquiring the software, using an email parser also doesn’t require any infrastructure upgrades. It also doesn’t force you to rely on a third-party data entry service provider, which is perfect if you are concerned with the possibility of your sensitive data being compromised.
What are Email Parsers?
If you’re not familiar with email parsing software like Parserr, there’s just one thing you need to know:
Email parsers have the ability to extract certain pieces of information from different email parts, be it the main body, the sender’s address, or even attachments.
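To make this concrete, here is a minimal Python sketch of what pulling data from different email parts can look like. It uses only Python’s standard `email` package and is not how Parserr itself is implemented; the sender address, subject, and attachment name are made-up sample values.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def summarize_email(raw: str) -> dict:
    """Pull the sender, plain-text body, and attachment names out of a raw email."""
    msg = message_from_string(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content().strip() if body_part is not None else ""
    return {
        "sender": str(msg["From"]),
        "body": body,
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


# Hypothetical sample message, built here so the sketch runs on its own.
sample = EmailMessage()
sample["From"] = "billing@example.com"
sample["To"] = "you@example.com"
sample["Subject"] = "Invoice 0001"
sample.set_content("Please find your invoice attached.")
sample.add_attachment(
    b"%PDF-1.4 placeholder bytes",
    maintype="application",
    subtype="pdf",
    filename="invoice-0001.pdf",
)

summary = summarize_email(sample.as_string())
print(summary)
```

The point is simply that the sender, the body, and each attachment are separate, addressable parts of a message, which is what lets a parser target one and ignore the rest.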
Suppose you’re a freelancer who only receives invoices from a handful of contacts.
Typically, your entire process would revolve around you combing through your inbox, downloading attached PDF invoices, and manually extracting pertinent data. That’s hours of work down the drain when you should be pouring your efforts into your core business.
With the workflow we’re about to discuss in this post, you can say goodbye to your hectic practices and put everything on autopilot. Here’s how it’s done:
Step 1: Setting Up Auto-Forwarding
When configuring your email parser, a crucial first step is to set up automatic forwarding on your email account.
Keep in mind that email parsing services don’t invade your inbox with a browser add-on or any other piece of software that must be installed on your computer. Rather, they only work on emails that are forwarded to a separate parsing account.
To configure automatic forwarding on your account, check out the following guides depending on your primary email service provider:
- Turn on automatic forwarding in Outlook on the web
- Automatically forward Gmail messages to another account
- Automatic email forwarding in Yahoo Mail
If you utilize the mail functionality supplied by your web host, the best course of action is to contact their support team or look for existing guides on the subject.
For those who don’t have a dedicated email account for invoices, you’ll need to create filters that automatically pick out the kind of messages you want to be forwarded to your email parsing account.
Let’s say, for the sake of this guide, that you use Gmail for your business emails. To set up your filter, simply select the email you’d like to use as a sample.
Doing so should reveal a new toolbar with a “Settings” button where you can configure a new filter for your message. In Gmail’s case, the button you’re looking for should have the usual triple-dot icon.
From there, click on “Filter messages like these” to proceed to the next step.
On the filter configuration window, the usual route is to specify the senders whom you expect to receive invoices from. There are also other options that will enable you to filter out the right messages.
For example, Gmail gives you the option to create a filter for messages that contain attachments or a specific keyword.
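If it helps to think of the filter as logic rather than a form, the criteria described above boil down to a yes/no test on each incoming message. The Python sketch below is only an illustration of that idea, assuming every criterion you fill in must match; `should_forward`, `make_message`, and the sample addresses are hypothetical names, and nothing here talks to Gmail.

```python
from email.message import EmailMessage
from email.utils import parseaddr


def should_forward(msg, senders, keyword=None, need_attachment=True):
    """Return True when a message meets every configured filter criterion."""
    sender = parseaddr(str(msg.get("From", "")))[1].lower()
    if sender not in {s.lower() for s in senders}:
        return False
    if need_attachment and not any(True for _ in msg.iter_attachments()):
        return False
    if keyword:
        body_part = msg.get_body(preferencelist=("plain",))
        body = body_part.get_content() if body_part is not None else ""
        searchable = (str(msg.get("Subject", "")) + "\n" + body).lower()
        if keyword.lower() not in searchable:
            return False
    return True


def make_message(sender, subject, body, pdf_name=None):
    """Build a small sample message (hypothetical data for this sketch)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(body)
    if pdf_name:
        msg.add_attachment(
            b"%PDF-1.4 placeholder bytes",
            maintype="application",
            subtype="pdf",
            filename=pdf_name,
        )
    return msg


vendors = {"billing@example.com"}
invoice = make_message(
    "Acme Billing <billing@example.com>", "Invoice 0001", "Invoice attached.", "invoice-0001.pdf"
)
reminder = make_message("billing@example.com", "Payment reminder", "Your invoice is overdue.")
newsletter = make_message("news@example.org", "Weekly update", "Nothing to pay this week.")

for message in (invoice, reminder, newsletter):
    print(message["Subject"], should_forward(message, vendors, keyword="invoice"))
```

Only the first message passes: the reminder comes from a known sender but has no attachment, and the newsletter comes from an unknown sender.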
Upon clicking “Create Filter,” you will be presented with a list of actions that you can trigger whenever an applicable email makes it to your inbox. Since you need to forward this to your Parserr inbox, you’ll have to select the “Forward it” option for your filter or click “add forwarding address” if you haven’t used automatic forwarding before.
The “Add a forwarding address” step is straightforward and should take less than a minute to complete. What you need to prepare beforehand is your Parserr account, which will provide you with the email address you should use as a forwarding address.
To help you understand this, here is a quick rundown of the Parserr “Quick Setup” process.
Step 2: Configuring Your Parserr Account
You don’t need to spend hours on Parserr to know that ease of use is one of its main advantages.
Upon creating your account, you will be prompted to send your first sample email to your Parserr inbox. The tool will then show you the designated address for your parsing account, which you can then use when creating a forwarding address in your Gmail account.
To move on to the next step, you need to forward a sample email that is representative of the invoice messages you normally receive.
With Parserr, remember that you can also extract data from invoices that are included within the email body. But for now, let’s set up a parsing workflow that extracts invoice data from PDF attachments.
After forwarding your sample email, give Parserr a few seconds to detect your message and update the page. The next step will require you to specify the part of the email that contains the invoice.
Since you used a PDF invoice in your sample email, select “Attachments” from the drop-down menu and click “Next.”
On the next page, you will be asked if you use either Zapier or Microsoft Flow — both of which are robust automation tools that will allow you to integrate your Parserr data with hundreds of third-party services and platforms.
For example, a popular way to use Parserr is to automatically extract the email address of senders and funnel them directly to an email marketing platform like MailChimp. To see the full details of how this process works, you can check out this older post on automating data extraction from inquiries.
If you don’t use either Zapier or Microsoft Flow, just go ahead and select “No, none of them” from the drop-down menu. Don’t worry — you can still use any of these integration tools with Parserr through the “Integrations” section.
Parserr should automatically detect any PDF attachment in your sample email. If this is the case, the Quick Setup will ask you to specify the document’s type, be it an HR document, a sales order, a bank statement, and so on.
The default choice is “Invoice/Receipt,” which is exactly what we need to select.
To complete the Quick Setup, you can pre-configure how you want Parserr to process your extracted data. This involves identifying any third-party application where you intend to send the information, from an email marketing tool to a CRM platform.
We can keep things simple and select “I just want to download it periodically from Parserr.” Just like selecting an integration tool, this is one of the decisions that can be made at a later time.
Step 3: Creating Parsing Rules
After the Quick Setup, Parserr will automatically fire up the parsing rule creation page. This is where you will spend most of your time constructing your automated invoice data capture workflow.
The interface does a great job of streamlining the parsing rule creation experience and minimizing any form of visual clutter. Everything you need to construct even the most complicated parsing workflows is consolidated into this page, such as the initial data preview, rule details editor, and a couple of handy tips for beginners.
At the top of the page, you can choose the email component that contains the information you need to extract. Since we sent the invoice as a PDF attachment in our sample email, click on “Attachments” and wait for Parserr to refresh the “Initial Data” field.
In this guide, we used a free invoice template from Freshbooks — a cloud-based accounting software supported by Parserr as a receiving application.
Here’s how it should appear on the “Initial Data” field once it’s loaded.
If you can’t see a preview of your invoice on the “Initial Data” field, double-check to see if you used the right sample email with the correct attachment. This can be done by navigating to the “Incoming Emails” section, locating your sample email, and clicking the “clip” icon found on the left-hand side.
Once you get your sample email right and have the correct preview of your invoice on the “Initial Data” field, you’re ready to start creating rule steps.
Put simply, parsing rules tell the software which data to extract and which to ignore. There are several parsing rule categories that offer various ways to do this, such as finding specific lines that contain keywords and removing empty lines.
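One way to picture a parsing rule is as a chain of small steps, each taking text in and handing narrower text to the next. The Python sketch below illustrates that idea with the two steps mentioned above. It is a conceptual model only, not Parserr’s code; the function names and the sample text are invented for this example.

```python
from typing import Callable, Iterable

# A rule step is modeled as a function from text to (usually shorter) text.
Step = Callable[[str], str]


def remove_empty_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.splitlines() if line.strip())


def keep_lines_with(keyword: str) -> Step:
    """Build a step that keeps only lines containing the keyword (case-insensitive)."""
    def step(text: str) -> str:
        return "\n".join(
            line for line in text.splitlines() if keyword.lower() in line.lower()
        )
    return step


def apply_rule(text: str, steps: Iterable[Step]) -> str:
    """Run each step in order, feeding the output of one into the next."""
    for step in steps:
        text = step(text)
    return text


# Hypothetical sample text standing in for the "Initial Data" preview.
raw = "Invoice 0001\n\nDue date: 2018-06-30\n\nThank you for your business\n"
result = apply_rule(raw, [remove_empty_lines, keep_lines_with("due date")])
print(result)
```

Because each step only sees the output of the one before it, the order of the steps matters, which is worth keeping in mind when you stack rule steps later in this guide.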
However, if your sample email included an attachment and you’ve been following this guide to the letter, you can only use one rule category: Files.
Under this rule category, there are still a handful of parsing rules you can use to extract the information you need from attachments.
Below is a brief breakdown of these parsing rules along with a few use cases for some of them:
Extract Page Text from PDF
“Extract page text from PDF” is the most straightforward parsing rule that involves email attachments.
It uses optical character recognition (OCR) to translate text from a PDF document into raw data. In doing so, you gain access to the rest of the parsing rule categories Parserr has to offer.
Take note that this is the most feasible parsing rule to use if you intend to extract data from invoices sent by different senders. We will delve deeper into this rule later in this post.
Extract Single Line of Text from PDF
The next parsing rule — “Extract single line of text from PDF” — can be used to directly extract text data from any document.
Theoretically, it should be able to perform what the previous rule is designed for. But instead of scraping off every bit of text from a PDF document, you get to visually select the area that you want to extract — removing everything else outside of your selection box.
This is done through Parserr’s built-in PDF extraction tool.
Although this process is significantly faster and easier than the “Extract page text from PDF” rule, it does have a significant drawback.
If you expect to receive invoices from multiple companies, chances are they don’t use the same template. This presents a problem for the “Extract single line of text from PDF” rule because, once you target an area that must be scanned for text using the PDF Extraction tool, Parserr will always extract data from that spot for all emails.
The only workaround for this issue is to use the PDF Extraction tool to capture data from the entire page — in which case, it would be more efficient and reliable to use the previous parsing rule.
Extract Tables from PDF with Column Markers
This is yet another useful parsing rule for PDF invoices.
When selected, Parserr will launch a different version of the PDF Extraction tool that allows you to mark the columns of a table. Just be sure to position the selection box and highlight the table on your sample email.
As with the previous rule, you shouldn’t use this one if you expect to receive invoices from different companies that use different invoice templates.
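To see why fixed column markers and mixed templates don’t get along, it can help to model the markers as character positions on a line of text. This is a simplified assumption made for illustration (the real tool works visually on the PDF page), and the rows, widths, and the `split_by_markers` helper are all hypothetical.

```python
from typing import List, Sequence


def split_by_markers(line: str, markers: Sequence[int]) -> List[str]:
    """Cut one table row into cells at the given character positions."""
    cuts = sorted(markers)
    starts = [0, *cuts]
    ends = [*cuts, None]  # None lets the last cell run to the end of the line
    return [line[start:end].strip() for start, end in zip(starts, ends)]


# Template A: columns start at characters 0, 20, 26, and 36.
rows = [
    ("Web design", "10", "50.00", "500.00"),
    ("Hosting", "1", "120.00", "120.00"),
]
template_a = [f"{d:<20}{q:<6}{r:<10}{t}" for d, q, r, t in rows]
markers = [20, 26, 36]
parsed = [split_by_markers(line, markers) for line in template_a]
print(parsed)

# Template B: same data, narrower columns, so the same markers cut mid-cell.
template_b_line = f"{'Hosting':<12}{'1':<4}{'120.00':<8}{'120.00'}"
misaligned = split_by_markers(template_b_line, markers)
print(misaligned)
```

The markers that split the first template cleanly cut straight through the cells of the second, which is the same failure you would see when a second vendor’s table sits in a different place on the page.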
Extract Checkbox Value from PDF Form
“Extract checkbox value from PDF Form” is a unique tool that’s intended for use in survey forms, application forms, and other documents that may contain checkboxes that senders need to fill in.
As far as invoices go, the real-world uses of this parsing rule are quite limited. Still, it could be the only option you have for very specific types of invoices that also reiterate the terms or details of your subscription, purchases, and so on.
This is one of the many parsing rules that take advantage of Parserr’s visual PDF Extraction tool. As always, you simply need to locate and highlight the area of the checkbox so that Parserr can determine whether the sender set it as true or false.
Extract Radio Button Value from PDF Form
Next up, “Extract radio button (multiple choice) value from PDF Form” functions similarly to the previous rule. The only difference is that, rather than extracting checkbox values, it is used to detect radio buttons, which are typically used in multiple choice forms.
Get Attachment Name and ID
Lastly, there are two remaining parsing rules you can use for emails with PDF attachments: “Get attachment name” and “Get attachment ID.”
These parsing rules do exactly what their names suggest: they obtain information that can be used to identify PDF documents. While this may not seem crucial, it does add an extra layer of organization to your invoice data management.
Step 4: Narrowing Down Your Data
As mentioned above, the rule “Extract page text from PDF” is the best fit for an automated invoice data capture system because it unlocks the rest of the rule categories in Parserr.
Instead of directly highlighting an area of the document using the PDF Extraction tool, this rule scrapes off every bit of readable information on a PDF document and translates it into pure text data.
For example, using this rule step on our sample email would yield the following extracted content:
In some cases, the initial data may look disorganized because Parserr removes the document’s formatting and page layout. Most of the time, fortunately, the tool neatly arranges all data into separate lines.
With the layout stripped, you will need to use the “Find all text after” and “Find all text before” rule steps in tandem to extract the specific piece of data you need. To make things simpler and more accurate, however, you can restore the original layout of the text by selecting “Keep layout” from the drop-down menu to the right of “Extract page text from PDF.”
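For a feel of how the “after” and “before” steps work in tandem, here is a small Python sketch that keeps only the text sitting between two markers. It is an approximation under stated assumptions: the helper names and the sample line are invented, and returning an empty string when a marker is missing is a choice made for this sketch rather than documented Parserr behavior.

```python
def find_all_text_after(text: str, marker: str) -> str:
    """Return everything after the first occurrence of marker ('' if absent)."""
    _, found, rest = text.partition(marker)
    return rest if found else ""


def find_all_text_before(text: str, marker: str) -> str:
    """Return everything before the first occurrence of marker ('' if absent)."""
    head, found, _ = text.partition(marker)
    return head if found else ""


# Hypothetical line of layout-free text from an invoice.
flat = "Invoice Number: 0001 Date of Issue: 2018-06-30"

after = find_all_text_after(flat, "Invoice Number:")
invoice_number = find_all_text_before(after, "Date of Issue").strip()
print(invoice_number)
```

The first step trims away everything up to the label, and the second trims away everything from the next label onward, leaving only the value in between.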
Using the document’s original layout, we can easily look for the right information from lines that contain specific keywords, such as “tax,” “subtotal,” and “total.”
Suppose we want to extract the total amount you owe from the PDF invoice. When adding a new rule, simply choose “Find rows containing certain text” from the “Find content you need” rule category.
Upon clicking “Save,” Parserr will update the parsing rule configuration page and add a new field labeled “Get row containing text.” All you have to do is enter the keyword “Total” and click “Update.”
This will drastically narrow down the extracted content from your parsing rule.
Take note that multiple instances of this rule step can be used to filter out lines that may not contain the information you need. In the example above, you may notice that Parserr also extracted the row of column headers that contains the words “Invoice Total.”
We can instruct Parserr to weed out this row by adding another rule step that only gets rows with a dollar symbol, which removes rows that don’t contain any monetary values whatsoever.
At this point, we can use the “Find all text after” rule step using the “Total” keyword to extract only the actual figures. It’s also advisable to check “Case sensitive” to omit any data from rows that contain terms like “Subtotal” or any other irrelevant references to a “total” amount.
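The three rule steps above can be approximated offline in a few lines of Python, which also shows why the “Case sensitive” box matters. Treat this as a sketch under assumptions: the invoice text is made up, the function names are hypothetical, and it assumes “Find all text after” cuts at the first match in the remaining text, which may differ from how Parserr actually behaves.

```python
def find_rows_containing(text: str, needle: str, case_sensitive: bool = False) -> str:
    """Keep only the rows that contain needle."""
    def matches(line: str) -> bool:
        if case_sensitive:
            return needle in line
        return needle.lower() in line.lower()
    return "\n".join(line for line in text.splitlines() if matches(line))


def find_all_text_after(text: str, marker: str, case_sensitive: bool = False) -> str:
    """Return everything after the first match of marker ('' if there is none)."""
    haystack = text if case_sensitive else text.lower()
    target = marker if case_sensitive else marker.lower()
    index = haystack.find(target)
    return "" if index == -1 else text[index + len(marker):]


# Hypothetical extracted text with the layout kept.
extracted = "\n".join([
    "Description        Rate     Qty    Invoice Total",
    "Web design         $50.00   10     $500.00",
    "Subtotal           $500.00",
    "Tax                $50.00",
    "Total              $550.00",
])

step1 = find_rows_containing(extracted, "Total")  # header, Subtotal, and Total rows
step2 = find_rows_containing(step1, "$")          # drops the header row
amount = find_all_text_after(step2, "Total", case_sensitive=True).strip()
loose = find_all_text_after(step2, "Total", case_sensitive=False).strip()
print(amount)
```

With case sensitivity on, the lowercase “total” inside “Subtotal” is skipped and only the final figure remains. With it off, the match lands inside “Subtotal” and the result drags the subtotal amount along with it.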
To help you understand this further, here’s what your settings could look like:
The next and final step is to give your parsing rule a name and click “Save Rule” at the bottom of the page.
That’s it — you have successfully extracted raw data from a PDF invoice using Parserr.
Of course, you need to repeat the steps above with different rule steps to obtain additional information from your invoice. This could be the company’s name, an address, taxes, shipping fees, and so on.
Finally, extracting invoice data from the message’s body attribute is an entirely different story — albeit one that would be over in a matter of minutes.
Thanks to intelligent character recognition technology, Parserr can provide you with a set of pre-made rules you can use upon sending your sample email. You can click here for an in-depth post on how this works. If you found this post helpful, don’t forget that Parserr comes with a trial period. Try it for free today.