In our last newsletter, we discussed the importance of “knowing your numbers”, the foundation of effective cost management. But for many organizations, turning that idea into action is a real challenge. One of the biggest obstacles? Simply getting your data into a usable format, something you can analyze and turn into real insights.
In my experience, the most valuable data often comes from invoices. They contain detailed descriptions of purchases, item numbers, and transaction-level insights that are gold for cost analysis. The catch? Most of this data is locked away in PDF files.
For many of my clients, this means I’m regularly collecting and extracting data from stacks of PDF invoices, mostly on a quarterly basis.
For a new project where we are trying to get an initial view of the spend, this often means converting thousands of PDF invoices into a consolidated digital format.
It’s not ideal, but it’s where the most actionable insights tend to live. Purchase Order (PO) data can help, too — but it’s often patchy or incomplete, and usually requires a fair bit of cleanup before it's useful.
Where to Begin? Start with Your Suppliers
If you’re not sure where to start, begin with your largest suppliers. Many can provide a consolidated download of your transactions in Excel or CSV format — and that’s a real win. With smaller suppliers you may not have that option, but it’s always worth asking. You’d be surprised what they can offer when prompted.
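If several suppliers do send CSV downloads, each one will usually arrive with its own column headings. As a rough sketch of how those files might be pulled into a single view of spend, the Python below maps each supplier's headings onto common names and totals the spend per supplier. The supplier names, column headings and sample rows are all made up for illustration; real exports will differ and will need their own mapping.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical mapping from each supplier's column headings to common names.
COLUMN_MAPS = {
    "Supplier A": {"Item No": "item_number", "Description": "description", "Amount": "amount"},
    "Supplier B": {"SKU": "item_number", "Product": "description", "Line Total": "amount"},
}


def consolidate(exports):
    """Merge per-supplier CSV text into one list of rows with common column names."""
    rows = []
    for supplier, csv_text in exports.items():
        mapping = COLUMN_MAPS[supplier]
        for record in csv.DictReader(io.StringIO(csv_text)):
            row = {"supplier": supplier}
            for source_name, common_name in mapping.items():
                row[common_name] = record[source_name].strip()
            # Strip currency symbols and thousands separators before converting.
            row["amount"] = Decimal(row["amount"].replace("$", "").replace(",", ""))
            rows.append(row)
    return rows


def spend_by_supplier(rows):
    """Total the amount column for each supplier."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[row["supplier"]] += row["amount"]
    return dict(totals)


# Made-up sample data standing in for two supplier downloads.
exports = {
    "Supplier A": (
        "Item No,Description,Amount\n"
        'A-100,Copy paper,"$1,250.00"\n'
        "A-200,Toner,$310.50\n"
    ),
    "Supplier B": (
        "SKU,Product,Line Total\n"
        "B-7,Safety gloves,89.90\n"
    ),
}

rows = consolidate(exports)
totals = spend_by_supplier(rows)
for supplier, total in totals.items():
    print(supplier, total)
```

Amounts are held as Decimal rather than float so that currency totals are not affected by rounding. In practice the same result can be reached in Excel; the point is simply that a small mapping per supplier is often all it takes to line the files up.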
That said, in about 90% of the projects I work on, we still end up going back to the PDF invoice data. It’s more work upfront, but well worth the effort in the long run.
A Game-Changing Tool: Docparser
A few years ago, I faced the same challenge: piles of PDF invoices and no easy way to extract the data. I tried outsourcing the task offshore, but I tended to spend more time fixing the results than the outsourcing saved me.
That’s when a colleague suggested I give Docparser a go. It is a tool that extracts structured data from PDFs (and other formats). It’s not perfect by any means, but it’s powerful, especially with electronic PDFs and consistent invoice layouts. Best of all, their support team is responsive and helpful; in my case, I usually get a response overnight.
Once the rules have been set up for a particular client, it’s just a case of uploading the documents, and a batch of 300–400 invoices can be extracted in around five minutes. For me it’s a massive boost in productivity, and a real game changer for anyone who needs to pull specific data out of large volumes of documents.
Need a Hand? Let’s Talk
If your data is stuck in PDFs or you’re not sure where to begin, you’re not alone — and it’s not the end of the road. There are tools and strategies that can help. Whether you want to tackle it yourself or get some support, I’m always happy to have a chat or roll up my sleeves and help you get started.

Grant Morrow
Principal Consultant
+61 415 203 575