Finding groundwater is a complex challenge, especially in regions like Sudan where communities rely on groundwater for survival. Decades of geological surveys and field reports hold vital data, but that data remains locked away in unorganized archives. MapAid, a non-profit focused on AI-enhanced mapping for humanitarian aid, partnered with Databricks for Good to unlock this information.
The initiative transformed nearly 700 scanned hydrogeological documents into a searchable database, a crucial step for MapAid's WellMapr app, which guides low-cost well drilling. The project leveraged multimodal AI for document analysis, turning static archives into an actionable search engine.
Visualizing Old Documents
The archive presented significant hurdles: scanned documents, some decades old, lacked embedded text. Pages were skewed, contained mixed languages (English and Arabic), and included handwritten notes. Traditional OCR was insufficient.
The team reframed the problem as one of visual understanding: rather than extracting text first, they fed scanned page images directly into multimodal AI models. This approach, detailed on the Databricks blog, let the models interpret each page's content visually.
Pages were rendered as images and stored in Unity Catalog Volumes. An intelligent sampling strategy, which focused on key sections of longer documents instead of processing every page, reduced processing costs by over 70%.
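The article does not spell out how pages were sampled, so the following is only a minimal sketch of one plausible strategy: keep the opening pages (often the title page and table of contents), the closing pages, and a handful of evenly spaced pages in between. The function name `sample_pages` and its `head`, `tail`, and `middle` parameters are hypothetical and are not taken from the MapAid project.

```python
def sample_pages(page_count: int, head: int = 3, tail: int = 2, middle: int = 5) -> list[int]:
    """Return sorted 0-based page indices to analyze for one document.

    Hypothetical strategy (an assumption, not the project's actual rule):
    short documents are processed in full; longer ones contribute their
    first `head` pages, last `tail` pages, and `middle` evenly spaced
    interior pages.
    """
    if page_count <= 0:
        return []

    budget = head + tail + middle
    if page_count <= budget:
        # Short document: sampling would save nothing, so keep every page.
        return list(range(page_count))

    picked = set(range(head))                               # opening pages
    picked.update(range(page_count - tail, page_count))     # closing pages

    # Split the interior into `middle` equal slices and take the page
    # nearest the center of each slice.
    span = page_count - head - tail
    for i in range(middle):
        offset = (i * span) // middle + span // (2 * middle)
        picked.add(head + offset)

    return sorted(picked)


if __name__ == "__main__":
    print(sample_pages(100))
```

With these default values, a 100-page document contributes 10 pages to the analysis step, while a 4-page document is processed in full. The actual savings depend on the length distribution of the archive and on the parameters chosen.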
Databricks AI Functions were used to analyze each sampled page. For every page, the model identified Dewey Decimal codes, picked out the Sudanese geographies the page referenced, and flagged whether the page was relevant to water resources.
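To make results like these searchable, each model response typically has to be turned into a structured record. The sketch below assumes the model was asked to reply with a JSON object containing three fields; the field names (`dewey_codes`, `geographies`, `water_relevant`), the `PageAnalysis` class, and the `parse_page_analysis` function are all hypothetical. The model call itself is not shown, since the article does not describe the exact function or prompt used.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PageAnalysis:
    """Structured result for one page (hypothetical schema)."""
    dewey_codes: list[str] = field(default_factory=list)
    geographies: list[str] = field(default_factory=list)
    water_relevant: bool = False


def _as_str_list(value) -> list[str]:
    """Coerce a JSON value to a list of non-empty, trimmed strings."""
    if not isinstance(value, list):
        return []
    return [str(v).strip() for v in value if str(v).strip()]


def parse_page_analysis(raw: str) -> PageAnalysis:
    """Parse a model's JSON reply, falling back to an empty record.

    Model output is not guaranteed to be valid JSON, so malformed or
    unexpected replies yield a default PageAnalysis instead of raising.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return PageAnalysis()
    if not isinstance(data, dict):
        return PageAnalysis()

    return PageAnalysis(
        dewey_codes=_as_str_list(data.get("dewey_codes")),
        geographies=_as_str_list(data.get("geographies")),
        # Only an explicit JSON `true` marks the page as relevant.
        water_relevant=data.get("water_relevant") is True,
    )


if __name__ == "__main__":
    # Illustrative reply; the values are made up for the example.
    reply = '{"dewey_codes": ["551.49"], "geographies": ["Darfur"], "water_relevant": true}'
    print(parse_page_analysis(reply))
```

Treating anything other than an explicit `true` as not relevant is a conservative design choice: a malformed reply leaves the page unflagged rather than polluting the search index, and such pages can be retried separately.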