Problem
Commodity price reports were available as PDFs, but their layouts were inconsistent and unsuitable for direct analysis. Scraping approaches were unreliable, and recovering the historical record needed a more deterministic ingestion strategy.
Project
A production-focused ETL workflow that gathers daily CBSL commodity reports, parses difficult PDF layouts, and publishes clean structured food-price data for analysis and downstream use.
Timeline
March 2026 - April 2026
Outcome
Recovered 1,200+ days of historical data and shipped an automated dataset pipeline; the published dataset reached 75+ downloads in two weeks.