Blog › Data Cleaning Techniques in Excel

Data Cleaning Techniques in Excel: From Messy to Analysis-Ready

Data Quality • 9 min read • March 2026

Data professionals estimate that 60% to 80% of analysis time is consumed by data preparation and cleaning — not the analysis itself. This proportion is frustratingly high, but it reflects reality: data from the real world arrives inconsistently formatted, incompletely filled, and structured for entry rather than analysis. Whether the source is a PDF conversion, a database export, a web scrape, or a manually maintained spreadsheet, cleaning is almost always required.

These techniques address the most common data quality problems encountered in business data work, with particular attention to issues that arise when working with PDF-converted data.

1. Remove Extra Whitespace with TRIM

Whitespace is the most common hidden data quality problem. Leading spaces, trailing spaces, and double spaces within strings all look the same visually but cause VLOOKUP mismatches, COUNTIF undercounts, and join failures. The TRIM function removes all leading and trailing spaces and replaces internal multiple spaces with single spaces.

Apply TRIM to any text column: =TRIM(A2)

After conversion from PDF, text cells frequently contain extra spaces because the PDF extraction picks up spacing characters that separate text elements on the page. Run TRIM on all text columns as the first cleaning step. Once cleaned, copy-paste as values to replace the formula with the clean text.

For non-breaking spaces (ASCII character 160, common in some PDFs), TRIM does not remove them. Use =TRIM(SUBSTITUTE(A2, CHAR(160), " ")) to replace non-breaking spaces with regular spaces first.

2. Standardize Text Case with UPPER, LOWER, PROPER

Inconsistent capitalization — "ACME Corp", "Acme corp", "acme CORP" — creates duplicates in grouped analyses that appear as separate entities. Standardize with one of three functions:

  • =UPPER(A2) — all capitals, good for codes and identifiers
  • =LOWER(A2) — all lowercase, useful for email addresses and normalized comparison
  • =PROPER(A2) — title case (first letter of each word capitalized), appropriate for names and company names

Choose the format that matches your standard and apply consistently. After applying the function, paste as values to convert formulas to clean text, then delete the original column.

3. Convert Text Numbers to Numeric Values

Numbers stored as text — indicated by the green triangle warning in the cell corner — cannot be summed, averaged, or compared numerically. This happens frequently with PDF-converted data where currency symbols, commas, or percentage signs are embedded in the extracted values.

Several approaches depending on the cause:

  • Quick fix for small datasets: Select the column, click the yellow warning diamond, choose "Convert to Number"
  • Currency symbols: Find & Replace to remove the symbol (€, $, £), then Convert to Number
  • Commas as thousands separators: Find & Replace "," with "" (nothing) to remove commas, then Convert to Number
  • Formula approach: =VALUE(SUBSTITUTE(SUBSTITUTE(A2,"€",""),",","")) removes the euro symbol and commas and converts in one step

After conversion, verify by applying a SUM function to the column. Text numbers return 0 in SUM; if your SUM matches expectations, the conversion was successful.

4. Standardize Date Formats

Dates from PDF conversions can arrive in almost any format: 15/03/2026, 03-15-2026, 15-Mar-26, March 15 2026, 20260315. Excel may recognize some of these as dates and store them as date serial numbers; others it will leave as text strings that cannot be sorted chronologically or used in date calculations.

Test whether Excel has recognized a date by formatting the cell as a number. A number like 46095 is a valid Excel date serial number; text like "15/03/2026" has not been recognized as a date.

For unrecognized date text strings, use DATEVALUE to convert them. DATEVALUE works on strings that look like dates in any standard format. For non-standard formats (like YYYYMMDD), parse with string functions:

=DATE(LEFT(A2,4), MID(A2,5,2), RIGHT(A2,2)) converts YYYYMMDD string to an Excel date.

Once dates are recognized as Excel dates, format consistently (Format Cells → Date → choose your preferred display format) and they will sort correctly in any analysis.

5. Remove Duplicates Systematically

Duplicate rows corrupt aggregations and lead to double-counting. Excel's built-in Remove Duplicates (Data tab) handles straightforward cases, but business data often requires more nuanced duplicate detection:

  • Exact duplicates: Data tab → Remove Duplicates, select all columns. Removes rows where every column matches.
  • Duplicate by key field: Remove Duplicates selecting only the key column (e.g., Invoice Number) — keeps the first occurrence when any duplicate key is found.
  • Fuzzy duplicates (similar but not identical): Use a helper column with a normalized form of the key field (TRIM, UPPER, SUBSTITUTE spaces removed) and then remove duplicates based on that helper column.

Before removing duplicates, always check a sample of what will be removed. True duplicates should be identical; if the records differ in substantive ways, they may not be genuine duplicates and should be investigated before deletion.

6. Fill Blank Cells in Merged Groups

PDF tables with grouped rows often have blank cells in the group label column — the label appears once at the top of a group and the remaining rows in the group have blank cells. After conversion, these blank cells prevent filtering and grouping from working correctly.

Fill down technique: select the column, use Go To Special (Ctrl+G → Special → Blanks) to select all blank cells. In the formula bar, type =A2 (the cell above the first blank) and press Ctrl+Enter. This fills all blank cells with the value from the row above. Paste as values to convert formulas to static text.

This is one of the most common cleanup tasks for PDF-converted data and takes less than a minute with this technique even on large datasets.

7. Split Combined Columns

PDF extraction sometimes merges data that should be in separate columns — first and last name in one cell, address components in one column, product code and description combined. Split these using:

  • Text to Columns (Data tab): Split on a delimiter (comma, space, dash) or at a fixed character position. Best for consistent patterns.
  • Flash Fill (Ctrl+E): Show Excel the pattern in the first cell; it fills the rest automatically. Best for pattern-based extraction without a consistent delimiter.
  • String formulas: LEFT, RIGHT, MID, FIND, SEARCH for precise control over extraction when patterns are consistent but complex.

8. Validate Data with Conditional Formatting

After cleaning, apply conditional formatting to validate the result visually before analysis. Create rules that highlight cells that still fail quality checks:

  • Blank cells in required columns: =ISBLANK(A2) — highlight red
  • Numbers below zero where only positive values expected: =A2<0 — highlight orange
  • Dates in the future where historical data expected: =A2>TODAY() — highlight yellow
  • Text in numeric columns: =NOT(ISNUMBER(A2)) — highlight red

Any highlighted cells need investigation before the data is used in analysis. This visual quality check catches issues that would otherwise silently corrupt results.

9. Document Your Cleaning Steps

Data cleaning decisions are substantive analytical choices — particularly decisions about how to handle ambiguous cases, outliers, and missing data. Document these decisions in a separate "Data Notes" sheet in the workbook.

Record: the original source and date of extraction, each cleaning step applied, any cells that were manually corrected and why, any records excluded and the reason for exclusion, and the date the cleaning was performed and by whom. This documentation enables reproducibility, supports audit trails, and helps future analysts understand why the data looks the way it does.

Get Clean Data Faster Starting with the Source

PDF to Excel conversion that preserves table structure gives you better raw material for cleaning. Try our converter for free.

Convert PDF to Excel for Free

Related Articles and Guides