Recommendation System

Production machine learning.

When I worked as a software engineer for a small R&D company, my team was handed numerous projects with tight deadlines. We would build an MVP as quickly as possible, deliver the prototype, and then move on to the next project. Whenever we wanted to unshelve one of these solutions, we had to spend a few hours digging through the code to reacquaint ourselves with the solution architecture.

I believe that technical teams of any sort can reap tremendous benefits by simply mapping out their processes. I like to call these maps technical product roadmaps, which is really just a fancy way of saying a formalized plan. The roadmap should explain, step by step, how a program or analysis works. It should describe your engineering process in English plain enough for a five-year-old to follow.

We adopted this approach to help a large utility restructure its product recommendation system. The utility wanted to use the system to sell energy-efficient appliances to its customers. The data engineering team had built a recommendation system that was nearly impossible to maintain: no one person understood the entire pipeline from end to end. Moreover, the pipeline took upwards of 32 hours to execute.

The Kvaltis team first documented the existing system using data-flow diagrams (specifically, directed acyclic graphs) and technical roadmaps. By mapping out the data processing pipeline, we discovered a number of redundant steps that could be eliminated. We also benchmarked the execution time of each step and identified the batch processing steps that took the most time.
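To make the idea concrete, here is a minimal sketch of a pipeline expressed as a directed acyclic graph, with per-step timing. It is not the client's pipeline: the step names, the placeholder step bodies, and the helpers `run_pipeline`, `slowest_steps`, and `unused_steps` are all hypothetical. The check for unused steps shows only one narrow notion of redundancy (a step whose output the final target never consumes); the redundant steps in the project may well have been identified differently.

```python
import time
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "load_customers": set(),
    "load_purchases": set(),
    "join_tables": {"load_customers", "load_purchases"},
    "export_legacy_report": {"join_tables"},
    "build_features": {"join_tables"},
    "score_products": {"build_features"},
}


def run_pipeline(dag, steps):
    """Run every step in dependency order and record its wall-clock time."""
    order = list(TopologicalSorter(dag).static_order())
    timings = {}
    for name in order:
        start = time.perf_counter()
        steps[name]()
        timings[name] = time.perf_counter() - start
    return order, timings


def slowest_steps(timings, n=3):
    """Return the names of the n slowest steps, slowest first."""
    return sorted(timings, key=timings.get, reverse=True)[:n]


def unused_steps(dag, target):
    """Return steps that the target does not depend on, directly or indirectly."""
    needed, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(dag[node])
    return set(dag) - needed


# Placeholder step bodies; a real step would read, transform, or write data.
STEPS = {name: (lambda: None) for name in PIPELINE}
STEPS["join_tables"] = lambda: time.sleep(0.05)  # stand-in for a slow batch job

order, timings = run_pipeline(PIPELINE, STEPS)
print("execution order:", order)
print("slowest step:", slowest_steps(timings, n=1))
print("not needed for score_products:", unused_steps(PIPELINE, "score_products"))
```

Writing the dependencies down as data, rather than leaving them implicit in scripts, is what allows both the diagram and the benchmark to be generated from the same source.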

Documentation was a large part of this project’s success. Our diagrams facilitated communication with the engineering and product teams and exposed the existing solution architecture. Creating the diagrams demanded that we understand the entire data processing pipeline, and the exercise revealed several knowledge gaps within the engineering team.

Our team restructured the data processing pipeline and created a hierarchical clustering algorithm to group similar customers together. The quality of the recommendations improved, and the execution time dropped from 32 hours to 15 minutes, saving our client upwards of 1,600 hours each year.
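For readers unfamiliar with hierarchical clustering, the sketch below shows the bottom-up (agglomerative) variant in plain Python. It is an illustration, not the algorithm we delivered: the customer features, the choice of average linkage with Euclidean distance, and the function names are assumptions made for the example. In practice the features would need to be scaled to comparable ranges first, and a library implementation would be faster on real data.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative_clusters(points, n_clusters):
    """Start with one cluster per point and repeatedly merge the two closest
    clusters until only n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        # combinations yields a < b, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical customers: (annual usage in MWh, household size).
points = [
    (4.1, 1),
    (4.3, 1),
    (9.8, 4),
    (10.2, 5),
    (4.0, 2),
]

groups = agglomerative_clusters(points, n_clusters=2)
print(groups)
```

Once customers are grouped, recommendations can be drawn from what similar customers in the same group have bought, which is one common way such clusters are put to use.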