How to OCR a PDF: Extract Text from Scanned Documents
A scanned PDF is essentially a collection of images: you can see the text, but you can't select, copy, or search it. OCR (Optical Character Recognition) converts those images into actual text, making your scanned documents searchable, copyable, and editable. Here's how to do it, mostly with free tools.
First: Is Your PDF Actually Scanned?
Not all PDFs need OCR. Try selecting text in your PDF:
- Text highlights when you click and drag? → Your PDF already has text. No OCR needed. You can copy/paste directly.
- Nothing happens when you try to select? → It's a scanned/image PDF. You need OCR to extract the text.
- Some pages are selectable, others aren't? → It's a mixed PDF. You need OCR for the scanned pages only.
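You can script the selection test above when you have many files to check. The sketch below assumes you have already extracted each page's text with a PDF library. The commented usage relies on pypdf's `PdfReader`, which is an assumption because the article doesn't name a library. The function names and the 20-character threshold are hypothetical. With that threshold, a page containing only a stray page number still counts as image-only.

```python
from typing import List, Sequence

# Hypothetical threshold: pages with fewer non-whitespace characters than this
# are treated as image-only (a lone page number shouldn't count as "text").
MIN_CHARS = 20


def has_text_layer(page_text: str, min_chars: int = MIN_CHARS) -> bool:
    """True if the extracted text looks like a real, selectable text layer."""
    return sum(not ch.isspace() for ch in page_text) >= min_chars


def classify_pdf(page_texts: Sequence[str], min_chars: int = MIN_CHARS) -> str:
    """Return 'text', 'scanned', or 'mixed' from per-page extracted text."""
    if not page_texts:
        raise ValueError("PDF has no pages")
    flags = [has_text_layer(t, min_chars) for t in page_texts]
    if all(flags):
        return "text"      # already selectable: no OCR needed
    if not any(flags):
        return "scanned"   # image-only: OCR every page
    return "mixed"         # OCR only the pages without text


def pages_needing_ocr(page_texts: Sequence[str], min_chars: int = MIN_CHARS) -> List[int]:
    """1-based page numbers that have no usable text layer."""
    return [i for i, t in enumerate(page_texts, start=1)
            if not has_text_layer(t, min_chars)]


# Example usage, assuming pypdf is installed (pip install pypdf):
#   from pypdf import PdfReader
#   reader = PdfReader("input.pdf")
#   texts = [page.extract_text() or "" for page in reader.pages]
#   print(classify_pdf(texts), pages_needing_ocr(texts))
```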
Method 1: Screenshot Pages and Use Online OCR
The simplest free approach for a few pages:
- Open your PDF and take a screenshot of each page (or save pages as images)
- Open Tools Oasis Image to Text
- Upload each page image and extract the text
- Copy the extracted text into your document
Best for: 1-5 pages where you need the text content (not the PDF formatting). Completely free, completely private — nothing leaves your browser.
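Instead of taking screenshots, you can save pages as images with Poppler's pdftoppm command-line tool. This is an assumption: pdftoppm isn't part of Tools Oasis and must be installed separately. It renders each page at a resolution you choose, so you can match the 300 DPI recommendation in the tips below instead of being limited to your screen's resolution. The helper names are hypothetical. The flags are pdftoppm's own: -r sets DPI, -png selects the output format, -gray produces grayscale output, and -f/-l set the first and last page.

```python
import subprocess
from typing import List, Optional


def pdftoppm_command(pdf_path: str, out_prefix: str, dpi: int = 300,
                     first_page: Optional[int] = None,
                     last_page: Optional[int] = None,
                     grayscale: bool = True) -> List[str]:
    """Build a pdftoppm command that writes one numbered PNG per page."""
    cmd = ["pdftoppm", "-r", str(dpi), "-png"]
    if grayscale:
        cmd.append("-gray")  # grayscale output: smaller files, usually fine for OCR
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd += [pdf_path, out_prefix]
    return cmd


def render_pages(pdf_path: str, out_prefix: str, **options) -> None:
    """Run pdftoppm; raises CalledProcessError if it exits with an error."""
    subprocess.run(pdftoppm_command(pdf_path, out_prefix, **options), check=True)
```

For example, render_pages("scan.pdf", "page", first_page=1, last_page=5) writes numbered PNG files whose names start with "page-", which you can then upload to the Image to Text tool one at a time.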
Method 2: Google Drive (Free, Cloud-Based)
Google Drive has a hidden OCR feature that most people don't know about:
- Upload your scanned PDF to Google Drive
- Right-click the PDF and select "Open with" > "Google Docs"
- Google automatically runs OCR and converts the PDF into a Google Doc with editable text
- Review the text and fix any OCR errors
Pros: Free, handles multi-page PDFs, reasonably accurate. Cons: File is uploaded to Google's servers, formatting is often lost, images may appear distorted in the doc.
Method 3: Adobe Acrobat (Best Quality, Paid)
Adobe Acrobat Pro is the gold standard for PDF OCR:
- Open the scanned PDF in Acrobat Pro
- Go to Tools > Scan & OCR > Recognize Text
- Select pages to process and choose your language
- Acrobat adds an invisible text layer over the scanned images
The result is a PDF that looks identical to the original but is fully searchable and selectable. This is the best approach for archiving important documents.
Method 4: Free Desktop Software
OCRmyPDF (Free, Open Source)
For batch processing on your own computer, OCRmyPDF is excellent:
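- Install via command line: pip install ocrmypdf (OCRmyPDF also needs Tesseract and Ghostscript installed on your system; pip doesn't install them)
- Run: ocrmypdf input.pdf output.pdf
- It adds a text layer to each page while preserving the original images
- For a mixed PDF, add --skip-text so pages that already have text are left alone instead of stopping the run with an error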
OCRmyPDF uses Tesseract under the hood and supports 100+ languages. It's the best free option for processing many PDFs.
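To batch-process a whole folder, a small Python wrapper around the ocrmypdf command is enough. This sketch assumes that ocrmypdf, Tesseract, and any requested Tesseract language packs are installed. The helper names are hypothetical. It passes three OCRmyPDF options: -l takes the languages, joined with +. --skip-text leaves pages that already have text untouched, which matters for mixed PDFs. --deskew straightens slightly rotated scans.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def ocrmypdf_command(src: Path, dst: Path, languages: Sequence[str] = ("eng",),
                     skip_text: bool = True, deskew: bool = True) -> List[str]:
    """Build one ocrmypdf invocation for a single input/output pair."""
    cmd = ["ocrmypdf", "-l", "+".join(languages)]  # e.g. "eng+deu"
    if skip_text:
        cmd.append("--skip-text")  # don't abort on pages that already have text
    if deskew:
        cmd.append("--deskew")     # straighten slightly rotated scans before OCR
    cmd += [str(src), str(dst)]
    return cmd


def batch_ocr(in_dir: str, out_dir: str, **options) -> List[str]:
    """OCR every *.pdf in in_dir into out_dir; return names of files that failed."""
    src_dir, dst_dir = Path(in_dir), Path(out_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    # Note: the "*.pdf" pattern is case-sensitive on most systems.
    for src in sorted(src_dir.glob("*.pdf")):
        result = subprocess.run(ocrmypdf_command(src, dst_dir / src.name, **options))
        if result.returncode != 0:  # keep going; report failures at the end
            failed.append(src.name)
    return failed
```

For example, batch_ocr("scans", "searchable", languages=("eng", "deu")) processes every PDF in a folder named scans. Calling the CLI as a subprocess keeps the sketch independent of OCRmyPDF's Python API (ocrmypdf.ocr), which accepts the same options as keyword arguments.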
NAPS2 (Free)
NAPS2 (Not Another PDF Scanner) is a free tool that combines scanning and OCR. It started as a Windows program, and current versions also run on macOS and Linux. If you're scanning documents directly, NAPS2 can apply OCR during the scan process and create searchable PDFs automatically.
After OCR: Optimize Your PDF
OCR can increase your PDF's file size. The text layer itself is small, but some tools re-encode the page images or convert the file to PDF/A along the way. After processing:
- Use Tools Oasis PDF Compressor to reduce the file size while keeping the searchable text
- If you have multiple scanned documents, use Tools Oasis PDF Merge to combine them into a single organized file
Tips for Better PDF OCR Results
- Scan at 300 DPI minimum — Lower resolutions significantly reduce OCR accuracy. 300 DPI is the standard; 600 DPI is better for small text.
- Scan in grayscale — Unless you need color, grayscale scans are usually just as readable for OCR and produce much smaller files than color scans.
- Straighten skewed pages — Most OCR tools handle slight skew, but significantly rotated or tilted pages reduce accuracy.
- Clean the scanner glass — Dust and smudges create visual noise that can confuse OCR.
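- Set the correct language — Tell the OCR tool which language the document uses. For documents that mix languages, specify the primary language, or list several if your tool supports it (Tesseract and OCRmyPDF accept combinations such as -l eng+fra), though each extra language slows processing.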
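The 300 DPI rule can also tell you whether a scan you've already received is good enough to OCR as-is. When a page is a single image filling the whole page, its effective resolution is the image's pixel width divided by the page width in inches. PDF page sizes are measured in points, at 72 points per inch. The sketch below assumes you already know both numbers; Poppler's pdfimages -list, for instance, reports image dimensions. The function names are hypothetical.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF page sizes are expressed in points (1/72 inch)


def effective_dpi(px_width: int, px_height: int,
                  page_width_pt: float, page_height_pt: float) -> Tuple[float, float]:
    """Horizontal and vertical DPI of an image that fills the whole page."""
    dpi_x = px_width / (page_width_pt / POINTS_PER_INCH)
    dpi_y = px_height / (page_height_pt / POINTS_PER_INCH)
    return dpi_x, dpi_y


def meets_ocr_minimum(px_width: int, px_height: int, page_width_pt: float,
                      page_height_pt: float, minimum: float = 300) -> bool:
    """True if both directions reach the recommended minimum resolution."""
    return min(effective_dpi(px_width, px_height,
                             page_width_pt, page_height_pt)) >= minimum


# US Letter is 612 x 792 pt (8.5 x 11 in):
#   2550 x 3300 px -> 300 DPI (meets the minimum)
#   1275 x 1650 px -> 150 DPI (rescan at a higher resolution if you can)
```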
Frequently Asked Questions
Will OCR preserve my PDF's original layout?
Tools like Adobe Acrobat and OCRmyPDF add an invisible text layer over the original images, preserving the visual layout perfectly. Google Drive's method, however, tries to convert the layout into a document format, which often breaks formatting.
How accurate is PDF OCR?
For cleanly scanned printed text at 300+ DPI, you can typically expect 97-99% character accuracy. Poor scans, handwriting, or unusual fonts reduce accuracy, sometimes sharply. Always proofread important documents after OCR.
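One common way to express an accuracy figure like this is character accuracy. It is 1 minus the edit distance between the OCR output and a correct transcription, divided by the length of the correct text. The plain-Python sketch below illustrates the calculation; the function names are hypothetical and don't come from any OCR tool. It also shows why proofreading still matters. Take a page of about 2,000 characters, an illustrative figure for a dense typed page: at 97% accuracy, about 60 characters are still wrong.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """1 - edit distance / reference length, floored at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1 - levenshtein(reference, ocr_output) / len(reference))


def expected_errors(num_chars: int, accuracy: float) -> int:
    """Approximate number of wrong characters at a given accuracy."""
    return round(num_chars * (1 - accuracy))
```

For example, OCR output that misreads one "l" as "1" in "hello world" scores 10/11, or about 91%. Short snippets are a harsh test; measure on full pages for a realistic number.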
Can I OCR a password-protected PDF?
You need to remove the password protection first. If you have the password, open the PDF, remove the restriction, save it, then apply OCR.
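If you have the password, you can also do the removal step from the command line with qpdf. This is an assumption: qpdf is a separate open-source tool that the article doesn't mention. Its --decrypt option writes an unencrypted copy, and --password supplies the password needed to open the file. The wrapper names below are hypothetical. A password passed on the command line may be visible to other users of the same machine, for example in the process list.

```python
import subprocess
from typing import List


def qpdf_decrypt_command(src: str, dst: str, password: str = "") -> List[str]:
    """Build a qpdf command that writes an unencrypted copy of src to dst."""
    cmd = ["qpdf"]
    if password:
        # Needed when the PDF requires a password to open. Files that only
        # restrict printing or copying can usually be decrypted without one.
        cmd.append(f"--password={password}")
    cmd += ["--decrypt", src, dst]
    return cmd


def decrypt_pdf(src: str, dst: str, password: str = "") -> None:
    """Run qpdf; raises CalledProcessError if qpdf exits nonzero (e.g. wrong password)."""
    subprocess.run(qpdf_decrypt_command(src, dst, password), check=True)
```

You can then run your OCR tool of choice on the unlocked copy.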