Guides › Fix Broken Tables After PDF Conversion

How to Fix Broken Tables After PDF to Excel Conversion

Advanced Guide • 8 min read • Updated March 2026

Why Tables Sometimes Come Out Wrong

Even the best PDF to Excel converters occasionally produce output that needs some cleanup. This is not a failure of the tool — it is an inherent consequence of the PDF format, which does not store table structure explicitly. The converter has to infer which pieces of text belong to which rows and columns based on their positions on the page, and this inference can go wrong when PDFs are formatted in unusual ways.

The good news is that most conversion problems fall into a small number of predictable categories, each with a straightforward fix. This guide covers the most common issues and how to resolve them quickly in Excel.

Problem 1: All Data in One Column

What it looks like: Instead of separate columns for each field, all the text from a row appears jammed together in column A, with nothing in columns B, C, or D.

Why it happens: This typically occurs with whitespace-formatted tables — PDFs where columns are separated by spaces rather than actual table borders. The converter treats the entire row as a single text object rather than multiple column values.

How to fix it:

  1. Select the column containing the combined text
  2. Go to the Data tab and click "Text to Columns"
  3. Choose "Delimited" if there is a consistent separator (space, comma, pipe character)
  4. Or choose "Fixed Width" to manually set column break points by clicking on the data preview
  5. Click Finish — Excel will split the content into separate columns

If the column breaks are inconsistent — some rows have different spacing than others — you may need to use Excel formulas (LEFT, MID, RIGHT, FIND) to extract specific portions of each cell.

Problem 2: Columns Shifted or Values in Wrong Column

What it looks like: Values are present but appear in the wrong column — amounts ending up in the description column, or dates appearing in the balance column.

Why it happens: Occurs when a row in the PDF has a longer-than-usual description that overlaps with the horizontal range of the next column. The converter may assign the long text to the wrong column bucket.

How to fix it:

  1. Identify affected rows by sorting on the column that should contain amounts — rows with text in an amount column will sort to the top or bottom
  2. For a small number of affected rows, manually move the values to the correct columns
  3. For a larger number, use a formula to identify and correct systematic shifts: if every row with a description longer than 50 characters has a shift, use an IF formula to detect and reposition values

Problem 3: Numbers Stored as Text

What it looks like: Numbers are present but right-aligned text rather than left-aligned numbers, SUM formulas return 0, and a small green triangle appears in the top-left corner of each cell.

Why it happens: When the PDF contains currency symbols (€, $, £), thousands separators (. or ,), or other non-numeric characters mixed with the number, Excel stores the entire value as text.

How to fix it:

  1. Select all affected cells in the column
  2. Press Ctrl+H to open Find & Replace
  3. In "Find what," enter the currency symbol or separator to remove (e.g., €)
  4. Leave "Replace with" empty and click Replace All
  5. Repeat for any other non-numeric characters (commas used as thousands separators, etc.)
  6. Select the cleaned cells, click the green triangle warning, and choose "Convert to Number"

If parentheses are used for negative numbers — (1,250.00) — replace ( with a minus sign - and remove the closing ) before converting to numbers.

Problem 4: Rows Split Incorrectly Across Multiple Excel Rows

What it looks like: A single transaction or data row in the PDF appears as two or three rows in Excel, with only the description in the first row and values (date, amount) in the following rows.

Why it happens: Long text in one cell of the PDF wraps to a second line, and the converter treats the wrapped line as a separate row.

How to fix it:

  1. Identify the pattern — typically, continuation rows have an empty value in the date column
  2. For each pair of rows, concatenate the description columns: in a helper column, use =IF(A3="", A2&" "&B3, A3) to merge the text
  3. Filter rows where the date column is empty, note their row numbers, and delete them after copying any additional description text to the main row
  4. For large volumes of affected rows, a macro can automate this: find rows with empty date cells, append their description to the previous row, and delete the empty row

Problem 5: Missing Rows or Pages

What it looks like: The Excel output has fewer rows than the PDF, or rows from certain pages appear to be missing entirely.

Why it happens: This can happen if a table on a particular page is formatted very differently from the others — for example, a summary page with a different column layout — and the converter does not detect it as part of the same table. It can also happen if the PDF contains embedded images on certain pages.

How to fix it:

  1. Check all sheets in the Excel workbook — missing rows may have been extracted as a separate table on a different sheet
  2. Count the rows in the Excel output and compare against the PDF — note which specific page numbers seem to be missing data
  3. Open the PDF and check those specific pages for any unusual formatting or image elements
  4. If a page contains an image-based table (embedded as a picture rather than text), that data must be entered manually or via an OCR tool

Problem 6: Duplicate Header Rows

What it looks like: The column headers (Date, Description, Amount, etc.) appear not just at the top of the table, but repeated in the middle of the data at regular intervals — typically at the start of each page's worth of data.

Why it happens: PDFs often print column headers on every page for readability. The converter faithfully extracts these repeated headers as data rows.

How to fix it:

  1. Add a filter to the table (Data → Filter)
  2. Filter the date column for values that equal the header text (e.g., "Date" or "Transaction Date")
  3. Select all visible filtered rows
  4. Delete them — this removes only the header rows, not the actual data
  5. Clear the filter to see the clean data

Preventing Problems at the Source

The most effective way to minimize post-conversion cleanup is to use the best possible source PDF. Key tips:

  • Use PDFs exported directly from Excel, accounting software, or report generators — these produce clean, predictable table structures
  • Avoid PDFs that were printed from a screen (print-to-PDF of a webpage, for example) — the formatting is often irregular
  • Prefer PDFs with visible table borders over whitespace-formatted tables — bordered tables convert with much higher accuracy
  • If you control the PDF creation, format number columns consistently — use the same format for all values (no mixing of 1,000 and 1000, for example)

Try the Converter

Upload your PDF and get a structured Excel file — then use this guide if any cleanup is needed.

Start Converting for Free

More Guides