How to Extract Data from PDF to Excel
Getting useful data trapped inside a PDF into an Excel spreadsheet is a classic headache. Whether it's a financial statement, an inventory list from a vendor, or a downloaded analytics report, you have the numbers you need, but they're locked in a format built for viewing, not analyzing. This guide walks you through several methods, from a quick copy-and-paste to the most powerful (and free) feature already built into Excel.
First, Understand Your PDF: The Two Types
Before you start, it helps to know what kind of PDF you're dealing with, as this determines which method will work best. Broadly, there are two types:
- Digitally-Created (Vector) PDFs: These are the best-case scenario. When you export a document from Word, Excel, or Google Docs to PDF, the text and numbers are stored as actual data. You can select the text, and it's recognized by the computer. Extraction is much cleaner.
- Image-Based (Raster) PDFs: These are essentially pictures of a document, often created by a scanner. You can't select individual words or numbers because the PDF just sees one big image. You'll need Optical Character Recognition (OCR) technology to extract data from these, which we'll cover.
An easy way to check is to try highlighting a sentence with your cursor. If you can highlight the text precisely, it's a digitally-created PDF. If your cursor draws a big blue box over the text, it's likely an image.
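If you have a whole folder of PDFs to sort, the same highlight test can be approximated in a script. The sketch below is only illustrative: it assumes the third-party pypdf library is installed, and the function names and the 20-character threshold are hypothetical choices, not a standard. It treats a PDF as "probably scanned" when most pages yield almost no extractable text.

```python
from typing import Iterable


def looks_scanned(page_texts: Iterable[str], min_chars_per_page: int = 20) -> bool:
    """Return True when most pages carry little or no extractable text.

    The threshold is an arbitrary illustration value, not a standard.
    """
    texts = list(page_texts)
    if not texts:
        return True
    sparse = sum(1 for t in texts if len((t or "").strip()) < min_chars_per_page)
    return sparse / len(texts) > 0.5


def pdf_looks_scanned(path: str) -> bool:
    """Apply the heuristic to a real file.

    Assumption: pypdf is installed (pip install pypdf).
    """
    from pypdf import PdfReader  # imported lazily so the helper above works without it

    reader = PdfReader(path)
    return looks_scanned(page.extract_text() or "" for page in reader.pages)
```

A result of True suggests the file needs OCR before any of the methods below will give clean results. A scan that already has a text layer will report False.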
Method 1: The Quick-and-Dirty Copy and Paste
This is the first thing everyone tries, and sometimes, for simple tables, it actually works. It's best for quick jobs where formatting isn't a huge concern, but be prepared for a bit of cleanup.
Step-by-Step Instructions:
- Open your PDF file and find the table or data you want to extract.
- Using your mouse, click and drag to select all the data in the table.
- Copy the data (right-click and select "Copy" or use the keyboard shortcut Ctrl+C or Cmd+C on a Mac).
- Open a new Excel spreadsheet.
- Click on cell A1 and paste the data (right-click and use "Paste" or press Ctrl+V or Cmd+V).
Best For:
- Very simple, clean tables in a digitally-created PDF.
- When you just need the raw numbers fast and can clean them up manually.
Common Problems:
This method often fails spectacularly. You're likely to encounter one of these issues:
- All Data in One Column: Your beautifully structured table from the PDF gets pasted into a single, long column in Excel.
- Messy Formatting: You might get odd spacing, random line breaks, and other formatting artifacts that make the data unusable without extensive tidying.
- Numbers Pasted as Text: Excel might not recognize your numbers as numbers, which means you can't run any calculations on them.
If you face these issues, don't waste hours cleaning it up. Move on to a better method.
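If the paste is only slightly broken, a few lines of Python can do the tidying for you. This is a minimal sketch using only the standard library. It assumes the columns in the pasted text are separated by tabs or by two or more spaces, which is common but not guaranteed, and the function names are made up for illustration. It splits each line into cells and converts number-like text such as "1,250.50", "$300", or "(45)" into real numbers.

```python
import re

# Matches plain, comma-grouped, currency-prefixed, or parenthesized numbers.
_NUMBER = re.compile(r"^\(?-?[$€£]?\d[\d,]*(?:\.\d+)?\)?$")


def split_row(line):
    """Split one pasted line into cells on tabs or runs of 2+ spaces."""
    parts = re.split(r"\t+|\s{2,}", line.strip())
    return [cell.strip() for cell in parts if cell.strip()]


def to_number(cell):
    """Convert number-like text to a float; return anything else unchanged."""
    text = cell.strip()
    if not _NUMBER.match(text):
        return cell
    # Accounting-style parentheses and a leading minus both mean negative.
    negative = text.startswith("(") or "-" in text
    value = float(re.sub(r"[^\d.]", "", text))
    return -value if negative else value


def clean_pasted(text):
    """Turn a block of pasted text into rows of cells, skipping blank lines."""
    return [
        [to_number(cell) for cell in split_row(line)]
        for line in text.splitlines()
        if line.strip()
    ]
```

Rows cleaned this way can be written out with Python's csv module and opened in Excel. If the columns in your paste are separated by single spaces only, this approach cannot tell where one cell ends and the next begins, and one of the later methods is a better fit.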
Method 2: Let Microsoft Word Do the Conversion
This might sound strange, but Microsoft Word has a surprisingly good PDF conversion engine built right in. It can often interpret the structure of a PDF document better than a direct copy-paste. You can use Word as an intermediary to get the data into a table format that Excel understands.
Step-by-Step Instructions:
- Open Microsoft Word.
- Go to File > Open and select your PDF file.
- Word will show a pop-up message saying it will convert your PDF into an editable Word document. Click "OK". This process can take a moment for large or complex files.
- Once the document opens, Word will have done its best to convert the PDF content, often turning PDF tables into editable Word tables.
- Find the table you need, select it, and copy it.
- Open Excel and paste the table. The formatting from Word tables usually translates much more cleanly into separate cells in Excel.
Best For:
- Well-structured PDFs where direct copy-pasting fails.
- Image-based PDFs or scans. Word's converter has basic OCR capabilities, so it can sometimes recognize text in scanned document tables.
Method 3: The Most Powerful Method You Already Have - Excel's Power Query
For anyone who does this regularly, this is the definitive answer. Power Query (also called "Get & Transform Data") is a free, incredibly powerful data processing tool built directly into modern versions of Excel. Power Query itself ships with Excel 2016 and later for Windows, but the "From PDF" connector used below requires a newer build, such as Excel for Microsoft 365 on Windows. If you don't see "From PDF" in the menu, your version doesn't include it. The connector can connect to a PDF directly, identify tables, and let you clean and shape the data before it ever hits your spreadsheet.
Step-by-Step Instructions:
- Get Your Data: Open Excel and navigate to the "Data" tab on the ribbon. In the "Get & Transform Data" section, click "Get Data" > "From File" > "From PDF".
- Select Your File: A file browser will open. Locate and select the PDF file you want to extract data from and click "Import".
- The Navigator Window: Excel will analyze the PDF and open a 'Navigator' window. This is the heart of Power Query's import process. On the left side, you'll see a list of all the tables and pages Power Query was able to identify within the PDF. You can click on each one to see a preview on the right. Usually, the icons with a blue bar at the top are properly formatted tables, while the others represent entire pages.
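- Load or Transform: Select the table that contains the data you need. You now have two options: click "Load" to send the table straight into a worksheet as-is, or click "Transform Data" to open the Power Query Editor, where you can remove columns, fix headers, and change data types before loading.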
- Use Your Data: Once loaded, your data is in a native Excel table. Even better, your query is live. If the source PDF is updated, you can just go to the Data tab and click "Refresh All", and Excel will automatically pull in the new data without you repeating the steps.
Best For:
- Almost every situation. It's perfect for complex PDFs with multiple tables.
- Recurring reports where you need to get data from a similarly-formatted PDF every week or month. You set it up once, and then just refresh.
- Data that needs cleaning and shaping before being used in your final analysis.
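For readers who prefer a script to the Excel interface, the same "find tables, shape them, load them" idea can be sketched in Python. This is not how Power Query works internally. It is a rough equivalent under stated assumptions: the third-party pdfplumber library is installed, the first non-empty row of each table holds the headers, and the function names are hypothetical. The output is a CSV file that Excel can open.

```python
import csv


def shape_table(raw_rows):
    """Drop empty rows, trim cells, and use the first remaining row as headers."""
    cleaned = []
    for row in raw_rows:
        cells = [(cell or "").strip() for cell in row]  # extracted cells may be None
        if any(cells):
            cleaned.append(cells)
    if not cleaned:
        return []
    header, *body = cleaned
    return [dict(zip(header, row)) for row in body]


def pdf_tables_to_csv(pdf_path, csv_path):
    """Extract every detected table from a PDF and write the rows to one CSV.

    Assumption: pdfplumber is installed (pip install pdfplumber).
    Returns the number of data rows written.
    """
    import pdfplumber  # imported lazily so shape_table works without it

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                records.extend(shape_table(table))
    if not records:
        return 0
    # utf-8-sig helps Excel detect the encoding; columns follow the first table,
    # and fields that only appear in later tables are ignored.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=list(records[0].keys()),
            restval="",
            extrasaction="ignore",
        )
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Like a Power Query refresh, the script can be rerun whenever a new version of the report arrives. Table detection is heuristic in both tools, so check the first few rows of the result against the PDF.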
Method 4: Using Online Converter Tools
If you have a one-off task and don't want to mess with Excel features, a handful of online tools can convert a PDF to an Excel file with a single click. Websites like Smallpdf, iLovePDF, and others will do the conversion for you. You upload your PDF, they process it, and you download the resulting .xlsx file.
Best For:
- Very quick, simple, one-time jobs.
- Situations where you don't have access to modern versions of Excel or Word.
An Important Word of Caution:
Do not upload sensitive data to these free online tools. Financial records, employee information, customer lists, and proprietary company data should never be uploaded to a third-party website with unclear security and privacy policies. For sensitive information, always stick to offline methods like Power Query or Adobe Acrobat.
For Scanned (Image-Based) PDFs: Adobe Acrobat Pro
If your PDF is a scan, none of the above methods will work well without Optical Character Recognition (OCR), which is technology that "reads" the text from an image. The industry standard tool for this is Adobe Acrobat Pro (the paid version, not the free Reader).
When you open a scanned document in Acrobat Pro, it will often automatically detect that it's an image and prompt you to run character recognition. Otherwise, you can use the "Scan & OCR" tool. Once text recognition is complete, a new layer of selectable text is placed over the image. Afterwards, you can use Acrobat's 'Export' feature to save the data to an Excel workbook, or use the copy/paste and Power Query methods mentioned earlier, which should now work.
Final Thoughts
Getting your data out of a PDF doesn't have to be a manual nightmare. For quick-and-dirty jobs, copy-paste might do the trick. For anything more serious, Excel's Power Query is a game-changer, transforming a repetitive, error-prone task into an automated, refreshable process. By choosing the right method, you can stop fighting with PDFs and start analyzing your data.
The hours spent extracting data from static files like PDFs mirror the larger pain of manual reporting. While Power Query solves the PDF issue, businesses still manually export CSVs from platforms like Google Analytics, Shopify, and Facebook Ads, then try to stitch everything together in a spreadsheet. At Graphed, we built our tool to eliminate that busywork entirely. We let you connect your live data sources in seconds so you can use simple, natural language to build dashboards and get insights instantly. No more wrestling with files, just answers in real time.