The Challenge
Manual data processing from multiple sources was time-consuming, leaving little capacity to expand the company's analytics services and deliver additional value to clients. Because the product was rushed to market, the ETL processes had never been clearly designed, so issues with ingestion time and data quality persisted. In addition, normalizing public data records from multiple states and title companies without data governance rules presented multiple challenges.
How It Was Solved
The company partnered with Sphere to automate as many manual processes as possible within eight weeks, including parsing and normalization. The solution used AWS Glue to enforce data rules and governance, invoking Python routines that improved parsing, with all data (raw and cleaned) stored in a data lake on AWS S3 and queried through Amazon Athena.
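A minimal sketch of the kind of parsing and normalization routine an AWS Glue job might invoke. All field names, the state-alias table, and the dedup key are hypothetical illustrations, not details from the actual project:

```python
# Hypothetical normalization routine for public-data records.
# Field names ("parcel_id", "state", "recorded_date"), the alias
# table, and the dedup key are illustrative assumptions only.
from datetime import datetime

STATE_ALIASES = {"CALIFORNIA": "CA", "TEXAS": "TX"}  # assumed mapping

def normalize_record(raw: dict) -> dict:
    """Normalize one record to a common schema."""
    state = raw.get("state", "").strip().upper()
    state = STATE_ALIASES.get(state, state)
    # Accept either MM/DD/YYYY or ISO dates; emit ISO format.
    date_str = raw.get("recorded_date", "").strip()
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            date_str = datetime.strptime(date_str, fmt).date().isoformat()
            break
        except ValueError:
            continue
    return {
        "parcel_id": raw.get("parcel_id", "").strip().upper(),
        "state": state,
        "recorded_date": date_str,
    }

def dedupe(records: list[dict]) -> list[dict]:
    """Normalize records and drop duplicates by a composite key."""
    seen, out = set(), []
    for rec in map(normalize_record, records):
        key = (rec["parcel_id"], rec["state"], rec["recorded_date"])
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out
```

In a Glue job, routines like these would run over records read from the raw S3 bucket, writing the cleaned output back to S3 where Athena can query it.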
The Results
Manual batch processing of parsing and normalization rules was reduced from weeks to a few hours. In addition, a new data store was designed with 28% fewer records, improving query performance and response times for clients in their Tableau interface. The next phase will build a full data warehouse in AWS Redshift to further improve analytics and performance for clients.