An innovative biotechnology company was looking to streamline data extraction from 1,965 multi-page agricultural PDFs to build a genetic database of plants’ inherent potential. Using this genetic knowledge, the company would develop “personalized seeds” by matching that knowledge to a farm’s unique growing conditions to achieve optimized field performance.
The main challenge the company faced was the sheer volume of data written in narrative form. In total, 13 different fields needed to be extracted from documents in various formats. While most of the information was easily identified, the location of genealogy data varied between PDFs. The company emphasized that high data accuracy and a quick cycle time were crucial to successfully developing different seed variants.
By combining skilled people with programmatic technology, ARDEM created a process that was more effective, faster, and more precise, at a lower cost. An open line of communication meant that questions were cleared up quickly, allowing for smooth completion of the final dataset.
The Head of Digital Technologies remarked that no errors were detected in the final dataset and that it was easily uploaded into the company’s existing database. The plant breeding technology company was extremely satisfied with the result and was eager to call on ARDEM for any similar projects in the future!