PDF to Excel conversion gets your data out of the PDF — but the result is often raw extracted data that needs some work before it is truly useful. Depending on the source PDF, you might have numbers stored as text, dates in inconsistent formats, duplicate header rows, extra whitespace in cells, or columns that need splitting or merging.
This guide covers the most practical Excel techniques for cleaning, formatting, and enriching data after PDF conversion — turning raw extracted tables into structured datasets you can actually analyze.
PDFs frequently embed extra leading, trailing, or internal spaces in text values — especially in description and name columns. These invisible characters cause problems with lookups (VLOOKUP will fail if the lookup value has a trailing space), sorting, and de-duplication.
The TRIM function removes all leading and trailing spaces, and reduces any multiple internal spaces to a single space:
=TRIM(A2)
Apply TRIM to all text columns in a helper column, then paste the values back over the original column: copy the TRIM column, select the original column, use Paste Special → Values to replace the original data with the cleaned values.
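If you clean the extracted data in a script rather than in the worksheet, the same idea can be sketched in Python. This is a minimal approximation of what TRIM does, not Excel's own implementation; the function name `excel_trim` and the sample values are made up for illustration.

```python
import re

def excel_trim(text: str) -> str:
    """Approximate Excel's TRIM: collapse runs of internal spaces to one
    and strip leading/trailing spaces. Like TRIM, this only touches the
    regular space character (code 32), not non-breaking spaces."""
    return re.sub(r" {2,}", " ", text).strip(" ")

# Hypothetical description values as they might come out of a PDF.
rows = ["  ALBERT HEIJN  ", "KPN   Mobile", "Rent"]
cleaned = [excel_trim(value) for value in rows]
```

As with the helper column in Excel, the cleaned list is kept separate from the original values until you have checked it.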
Dates imported from PDFs often appear as plain text — "15 Jan 2026" or "01/15/2026" — rather than as Excel date values. This prevents date-based calculations, chronological sorting, and PivotTable grouping by month or quarter.
The DATEVALUE function converts a text date into an Excel date serial number:
=DATEVALUE(A2)
After converting, format the resulting column as a date (right-click → Format Cells → Date) to display in your preferred format. If DATEVALUE fails due to an unexpected date format, use the Text to Columns wizard (Data tab) with the Date format option to parse the column.
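For a scripted equivalent, the sketch below tries a short list of text formats in turn and also shows how a date maps to an Excel serial number. The two formats are assumptions taken from the examples above, the function names are illustrative, and `%b` depends on the system locale, so month abbreviations are expected in English here.

```python
from datetime import date, datetime

# Assumed candidate formats; extend this tuple to match your PDFs.
FORMATS = ("%d %b %Y", "%m/%d/%Y")
EXCEL_EPOCH = date(1899, 12, 30)  # day zero of Excel's 1900 date system

def parse_text_date(text):
    """Return a date for the first format that matches, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next candidate format
    return None

def excel_serial(d):
    """Serial number Excel stores for dates from March 1900 onward."""
    return (d - EXCEL_EPOCH).days
```

Returning `None` for unparseable cells, instead of guessing, makes it easy to list the rows that still need manual attention.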
PDFs that repeat column headers on every page (common in bank statements and financial reports) leave those headers behind as data rows after conversion. One quick way to remove them all at once is to filter the first column for the header text, delete the visible rows, and then clear the filter.
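For readers who script this step, here is a minimal Python sketch of the same idea: keep the first row as the header and drop any later row that repeats it. It assumes the table is already loaded as a list of rows (each a list of strings); the function name is hypothetical.

```python
def drop_repeated_headers(rows):
    """Keep the first row as the header and drop later rows that repeat it."""
    if not rows:
        return []
    header = [cell.strip() for cell in rows[0]]
    # Compare after stripping so stray spaces do not hide a repeated header.
    body = [row for row in rows[1:]
            if [cell.strip() for cell in row] != header]
    return [rows[0]] + body
```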
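Currency values with symbols (€, $, £), thousands separators, or parentheses for negatives are often stored as text by Excel. To convert them to proper numbers, one option is to strip the currency symbol with SUBSTITUTE and convert the result with VALUE, for example =VALUE(SUBSTITUTE(A2, "€", "")), provided the separators in the text match your regional settings.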
After conversion, apply the appropriate number format (right-click → Format Cells → Number or Currency).
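The same conversion can be sketched in Python. This example assumes comma thousands separators and a dot as the decimal point, and treats parentheses as a negative sign; the function name `parse_amount` is illustrative and the list of symbols would need adjusting for other currencies or locales.

```python
from decimal import Decimal, InvalidOperation

def parse_amount(text):
    """Convert text such as "$1,234.50" or "(€99.00)" to a Decimal.
    Assumes comma thousands separators and a dot decimal point."""
    cleaned = text.strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    for junk in ("(", ")", "€", "$", "£", ",", " ", "\u00a0"):
        cleaned = cleaned.replace(junk, "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None  # leave unparseable cells for manual review
    return -value if negative else value
```

Decimal is used rather than float so that monetary totals do not pick up binary rounding errors.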
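If the conversion merged two or more columns together (for example, a full name where first and last name should be separate, or a date and time combined in one cell), use Text to Columns on the Data tab to split them, choosing Delimited and the character that separates the parts, such as a space or comma.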
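In a script, the equivalent of a delimited split is a one-time split on the separator. The sketch below is a simplification with a hypothetical function name: it splits only at the first occurrence, so a multi-part last name stays together.

```python
def split_once(value, delimiter=" "):
    """Split a cell into two parts at the first delimiter.
    The second part is empty if the delimiter is not present."""
    first, _, rest = value.strip().partition(delimiter)
    return first, rest.strip()

first_name, last_name = split_once("Jan de Vries")
day, time_of_day = split_once("01/15/2026 14:30")
```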
Once your data is clean, you can enrich it immediately. A common next step after extracting bank statements or expense reports is categorizing each row. Create a lookup table on a second sheet listing keywords and their categories:
| Keyword | Category |
|---|---|
| ALBERT HEIJN | Groceries |
| KPN | Utilities |
|  | Software |
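Then match the keywords against each description. XLOOKUP's wildcard mode (match_mode 2) only accepts wildcards in the lookup value, not in the lookup array, so it cannot directly test which keyword is contained in a longer description. A more practical approach is to use SEARCH to test every keyword against the description and return the category of the first one found:
=XLOOKUP(TRUE, ISNUMBER(SEARCH(Lookup!$A$2:$A$4, A2)), Lookup!$B$2:$B$4, "Other")
Here A2 holds the description, Lookup!A2:A4 the keywords, and Lookup!B2:B4 the categories; adjust the ranges to your sheet. Make sure no keyword cell is blank, because SEARCH treats an empty keyword as a match for every description. In Excel versions without dynamic arrays, a helper column with SEARCH per keyword achieves the same result.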
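The keyword matching can also be sketched in Python. The dictionary below reuses the two complete rows of the lookup table; the function name, the "Other" default, and the sample descriptions are assumptions for illustration.

```python
# Keyword -> category, mirroring the lookup table.
CATEGORIES = {"ALBERT HEIJN": "Groceries", "KPN": "Utilities"}

def categorize(description, categories=CATEGORIES, default="Other"):
    """Return the category of the first keyword found in the description.
    Matching is case-insensitive; blank keywords are skipped."""
    text = description.upper()
    for keyword, category in categories.items():
        if keyword and keyword.upper() in text:
            return category
    return default
```

Because the first match wins, put more specific keywords earlier in the dictionary when two keywords could both appear in one description.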
Once your data is clean and categorized, a PivotTable gives you a summary in seconds: select the data, choose Insert → PivotTable, then drag, for example, Category to Rows and Amount to Values.
The result is an instant spending summary, income breakdown, or transaction report — built from data that started as a locked PDF table just minutes ago.
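The core of such a summary is a grouped sum. A minimal Python sketch, assuming each row has already been reduced to a (category, amount) pair with amounts as Decimal values; the function name is hypothetical:

```python
from collections import defaultdict
from decimal import Decimal

def total_by_category(rows):
    """Sum amounts per category, like Category in Rows and Sum of Amount in Values."""
    totals = defaultdict(Decimal)  # missing categories start at Decimal("0")
    for category, amount in rows:
        totals[category] += amount
    return dict(totals)
```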
Before using converted financial data for any important purpose, always verify that the totals match. Add a SUM formula at the bottom of each amount column and compare it against the total shown in the original PDF. If the numbers match, the extraction is very likely complete and accurate. If they do not match, look for missing rows, duplicate rows, or values stored as text, which SUM silently skips.
This simple validation step takes less than a minute and gives you confidence that the data is complete.
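A scripted version of this check might look like the sketch below. It assumes converted amounts are Decimal values and that anything still stored as text arrives as a str; the function name and return shape are illustrative choices, not a fixed convention.

```python
from decimal import Decimal

def check_total(values, expected_total):
    """Sum the numeric cells and compare against the total from the PDF.
    Also returns the cells still stored as text, which would be left
    out of the sum just as Excel's SUM skips text values."""
    numeric = [v for v in values if isinstance(v, Decimal)]
    text_cells = [v for v in values if isinstance(v, str)]
    total = sum(numeric, Decimal("0"))
    return total == expected_total, total, text_cells
```

When the check fails, the list of text cells is usually the first place to look.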
PDF extraction sometimes introduces invisible characters that cause formulas to fail or VLOOKUP matches to break. The two most useful functions for this are:
=TRIM(A2) to remove extra spaces from text values.
=CLEAN(A2) to remove non-printable characters, or combine it with TRIM: =TRIM(CLEAN(A2)).
If you are still getting VLOOKUP mismatches after TRIM and CLEAN, the culprit is often a non-breaking space (character code 160), which neither function removes. Replace it with a regular space before trimming, so that TRIM can then remove it: =TRIM(CLEAN(SUBSTITUTE(A2, CHAR(160), " "))).
Once you have clean values in a helper column, copy and paste as values over the original column, then delete the helper column.
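The order of operations matters here: non-breaking spaces have to become regular spaces before trimming, otherwise they survive at the ends of the value. A minimal Python sketch of the combined cleanup, with a hypothetical function name and the assumption that only codes 0 to 31 and 160 need handling:

```python
import re

def clean_cell(text):
    """Approximate TRIM(CLEAN(SUBSTITUTE(text, CHAR(160), " ")))."""
    text = text.replace("\u00a0", " ")        # non-breaking space -> space
    text = re.sub(r"[\x00-\x1f]", "", text)   # CLEAN drops codes 0-31
    return re.sub(r" {2,}", " ", text).strip(" ")  # TRIM behaviour
```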
Converting your data range to an Excel Table (Ctrl+T) is one of the best habits to develop. Tables provide several advantages over plain ranges:
For example, with structured references, instead of =SUM(D2:D500), you can write =SUM(Table1[Amount]), which is much more readable and less error-prone.
Name your table something descriptive (like "Transactions_2025" or "InvoiceData") in the Table Design tab. This makes it easy to reference across workbooks and in Power Query.
Convert your PDF tables to Excel first — then use these tips to clean and analyze your data.