PDF Data Extractor
Automated batch processing tool built for a client to extract structured data from PDFs.
Description
- > Batch processes multiple PDF files from a single folder
- > Extracts text with layout preservation using pdfplumber
- > Uses regex pattern matching to identify and parse dates
- > Handles multi-line entries by combining split data
- > Cleans and normalizes extracted data
- > Outputs a structured Excel spreadsheet with pandas
- > 99% time reduction compared to manual data entry
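
The date matching, multi-line merging, and cleanup steps listed above could look roughly like the sketch below. It is a minimal illustration, not the client tool's actual code: the MM/DD/YYYY date format, the rule that each entry starts with a date, and the names DATE_AT_START, merge_records, and normalize are all assumptions made for the example. It uses only the standard library and works on plain text lines, such as those obtained by splitting the text pdfplumber extracts from a page.

```python
import re
from datetime import datetime

# Assumed format: every entry begins with a date such as 03/15/2024.
DATE_AT_START = re.compile(r"^\s*(\d{1,2}/\d{1,2}/\d{4})\s+(.*)$")


def merge_records(lines):
    """Group raw text lines into records, one per leading date."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        match = DATE_AT_START.match(line)
        if match:
            # A line with a leading date starts a new entry.
            records.append({"date": match.group(1), "text": match.group(2)})
        elif records:
            # No leading date: treat the line as a continuation
            # of the previous entry and append it.
            records[-1]["text"] += " " + line.strip()
    return records


def normalize(record):
    """Convert the date to ISO format and collapse repeated whitespace."""
    parsed = datetime.strptime(record["date"], "%m/%d/%Y")
    return {
        "date": parsed.date().isoformat(),
        "text": " ".join(record["text"].split()),
    }


# Hypothetical sample lines, with one entry split across two lines.
sample = [
    "03/15/2024  Invoice 1042 for consulting",
    "            services, net 30",
    "",
    "4/2/2024    Refund issued",
]
rows = [normalize(r) for r in merge_records(sample)]
```

A list of dictionaries like rows can be passed directly to pandas.DataFrame, and DataFrame.to_excel then writes the spreadsheet (pandas uses OpenPyXL for .xlsx files).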
Technologies Used
Python, pdfplumber, pandas, Regular Expressions, OpenPyXL
