PDF Data Extractor - Automated Batch Processing | Anton Langbruttig

PDF Data Extractor

Automated batch processing tool built for a client to extract structured data from PDFs.

Description

> Batch processes multiple PDF files from a single folder
> Extracts text with layout preservation using pdfplumber
> Uses regex pattern matching to identify and parse dates
> Handles multi-line entries by combining split data
> Cleans and normalizes extracted data
> Outputs structured Excel spreadsheet with pandas
> 99% time reduction compared to manual data entry
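
The steps above could be sketched roughly as follows. This is a minimal illustration, not the client's actual code: the date format (DD.MM.YYYY), the column names, and the function names `parse_entries` and `extract_folder` are all assumptions made for the example.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed date format (DD.MM.YYYY); the real project's patterns are not documented.
DATE_RE = re.compile(r"^\s*(\d{2}\.\d{2}\.\d{4})\b\s*(.*)$")


def parse_entries(text):
    """Turn extracted page text into a list of {date, description} dicts.

    A line starting with a valid date opens a new entry; any other non-empty
    line is treated as a continuation of the previous entry (multi-line case).
    Lines that appear before the first dated line are ignored.
    """
    entries = []
    for line in text.splitlines():
        match = DATE_RE.match(line)
        date = None
        if match:
            try:
                date = datetime.strptime(match.group(1), "%d.%m.%Y").date()
            except ValueError:
                date = None  # looks like a date but is not one, e.g. 99.99.2024
        if date is not None:
            entries.append({"date": date.isoformat(), "description": match.group(2)})
        elif line.strip() and entries:
            entries[-1]["description"] += " " + line.strip()
    # Normalize: collapse whitespace runs left by layout-preserving extraction.
    for entry in entries:
        entry["description"] = re.sub(r"\s+", " ", entry["description"]).strip()
    return entries


def extract_folder(folder, output_path):
    """Process every PDF in `folder` and write all entries to one Excel file."""
    import pdfplumber  # third-party
    import pandas as pd  # third-party; writing .xlsx relies on openpyxl

    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        with pdfplumber.open(pdf_path) as pdf:
            for page in pdf.pages:
                # extract_text can return None or "" for pages without text
                text = page.extract_text(layout=True) or ""
                for entry in parse_entries(text):
                    rows.append({"file": pdf_path.name, **entry})
    frame = pd.DataFrame(rows, columns=["file", "date", "description"])
    frame.to_excel(output_path, index=False)
```

A call such as `extract_folder("invoices", "output.xlsx")` (hypothetical paths) would then produce one row per dated entry. Keeping the parsing in a plain function that takes text makes it possible to check the regex and multi-line handling without any PDF files at hand.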

Technologies Used

Python, pdfplumber, pandas, Regular Expressions, OpenPyXL