The first step in designing a system is to understand the problems and the constraints. Each architecture solves for these in its own way, and there is usually more than one solution.

In Part 1 (Noodle Blog/Medium) of this blog series, we went over the challenges associated with developing Enterprise AI applications. These problems can be mapped into 3 broad categories, which can be thought of as independent axes. Optimizing along one axis still affects the others, and understanding these impacts helps us choose the right optimization goals.

Trade-Offs – 3 Axes

The image below maps these categories onto 3 axes:

  • Production Scalability (X-axis) – Model inference stages are designed thoroughly, sometimes with rewrites of the base pipelines for high-throughput applications
  • Deployment Speed (Y-axis) – Less time to deploy new Machine Learning (ML) pipelines after the model development stage
  • Code Quality (Z-axis) – Modular, deduplicated code that is extensible and maintainable

At Noodle, our AI applications have a limited number of end users, so the X-axis applies only in terms of model inference throughput and is not dominant. Therefore, we will analyze impacts across 2 axes – Y & Z.

Fig. 1: Trade-offs to consider when scaling AI applications

If we optimize for the Y-axis (Deployment Speed), we minimize the steps after Model Development as much as we can and end up treating our ML pipelines as black boxes, which leads to the following problems:

  1. Reusability – There is a heavy code overlap between training and inference parts of a pipeline and across pipelines, leading to duplicate code.
  2. Maintainability – It becomes difficult to identify the workflow within black boxes, review code and understand the impact when external dependencies change.
  3. Traceability – If code is written as large scripts or monolithic functions, then when an error occurs it is not clear what caused the failure unless we read the entire code.
  4. Scaling – Some steps in a pipeline can run in parallel or need to scale to meet a latency requirement. Every developer tends to write their own multi-processing code, sometimes with different libraries, which is tough to manage across a cluster.
  5. Testing – Lack of interfaces makes end-to-end system tests tougher to write.

If we optimize for the Z-axis (Code Quality), we spend time in the Production Readiness stage designing a robust solution, which increases the turnaround time for ML pipelines to reach production. Some of the reasons for this are as follows:

  1. Low Domain Context – The software/ML engineers who are skilled enough to do this lack the domain context.
  2. Introducing New Bugs – Refactoring code may introduce bugs that are difficult to anticipate and trace.
  3. Ownership Dilution – For every bug or feature, a triage must be done to identify which developer to assign it to.

Design Goals

Having understood the challenges of building Enterprise AI applications and the trade-offs we face when solving for them, we can now list the design goals that help us pick the right trade-offs across the 3 axes.

  1. Enable developers with domain context to take their code to production
  2. Balance the 3 axes to offset as many trade-offs as possible
  3. Shield developers from the complexity of orchestration frameworks, distributed cluster configuration and tool evolution
  4. Reduce the time taken between the Model Development and Production Readiness stages – Y-axis
  5. Promote modularity and reuse of code even during experimentation – Z-axis
  6. Support training and inference at scale, with a seamless switch between them – X-axis
  7. Promote Test Driven Development to ensure tech debt doesn’t pile up

We observed that developers are very familiar with writing ML pipeline code in Jupyter Notebooks. When the time came to convert this code into Python classes and Airflow DAGs, however, there was either not enough time or a general reluctance to express the same flow in different syntaxes.

We needed a mechanism to make each incoming ML pipeline transparent and compliant with the overall deployment architecture.

We wanted to encourage developers to write more code as modules, classes and functions so that it can be reused across multiple experiments.

Putting it all together – Atlas: A Recipe driven framework

The Atlas framework provides a declarative, JSON-based syntax for defining ML pipelines, which we call the “Recipe.” A Recipe represents a Directed Acyclic Graph (DAG), the structure most frameworks use as the basis for pipeline execution. It is accompanied by a Config JSON, which can be used to provide features, hyperparameters, model versions etc.
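
To make the idea concrete, here is a minimal sketch of a Recipe read as a DAG. The JSON schema (`steps`, `name`, `depends_on`) and the Config keys are assumptions made for illustration, not the actual Atlas format. The sketch only shows how a declarative step list yields a valid execution order.

```python
import json
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical Recipe: each step names the steps it depends on.
RECIPE_JSON = """
{
  "name": "example_training_pipeline",
  "steps": [
    {"name": "load_data",      "depends_on": []},
    {"name": "build_features", "depends_on": ["load_data"]},
    {"name": "train_model",    "depends_on": ["build_features"]},
    {"name": "evaluate",       "depends_on": ["build_features", "train_model"]}
  ]
}
"""

# Hypothetical Config: values the steps would read at run time.
CONFIG_JSON = """
{
  "features": ["price", "promotion"],
  "hyper_parameters": {"max_depth": 4},
  "model_version": "1"
}
"""


def execution_order(recipe: dict) -> list:
    """Return step names in an order that respects every dependency."""
    graph = {s["name"]: set(s.get("depends_on", [])) for s in recipe["steps"]}
    # Reject dependencies on steps the Recipe never declares.
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"unknown steps referenced: {sorted(unknown)}")
    # static_order() raises CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


recipe = json.loads(RECIPE_JSON)
config = json.loads(CONFIG_JSON)
print(execution_order(recipe))
print(config["hyper_parameters"])
```

Because the graph is plain data, the same Recipe can in principle be validated, visualized or handed to different executors without touching the step code.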

A developer starts by wrapping their code in Template classes, each of which represents a step of the Recipe. This ensures that code is modular and reusable from the get-go, and the benefits are realized even during training. This approach bakes standard Design Patterns, such as Strategy and Builder, and Design Principles, such as DRY, IoC and SOLID, into our ML pipeline development.
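
As a rough illustration of the Template idea, the sketch below wraps two interchangeable imputation steps behind one interface, in the spirit of the Strategy pattern. The class names (`StepTemplate`, `FillWithConstant`, `FillWithMean`) and the `run` signature are hypothetical and do not come from the Atlas SDK.

```python
from abc import ABC, abstractmethod


class StepTemplate(ABC):
    """Hypothetical base class: one subclass per Recipe step."""

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def run(self, values: list) -> list:
        """Transform the step's input and return its output."""


class FillWithConstant(StepTemplate):
    def run(self, values: list) -> list:
        fill = self.config.get("fill_value", 0.0)
        return [fill if v is None else v for v in values]


class FillWithMean(StepTemplate):
    def run(self, values: list) -> list:
        known = [v for v in values if v is not None]
        mean = sum(known) / len(known) if known else 0.0
        return [mean if v is None else v for v in values]


def clean(values: list, step: StepTemplate) -> list:
    """The caller depends only on the StepTemplate interface."""
    return step.run(values)


raw = [1.0, None, 3.0]
print(clean(raw, FillWithConstant({"fill_value": 0.0})))  # [1.0, 0.0, 3.0]
print(clean(raw, FillWithMean({})))                       # [1.0, 2.0, 3.0]
```

The same step class can be used unchanged in a training pipeline and an inference pipeline, which is where the reuse comes from.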

The framework provides a Python-based SDK that developers can use to test a Recipe execution right in the development environment, with no dependency on orchestration frameworks. Once tested, the deployment pipelines ensure that the same code is deployable to a much more robust and scalable environment for training and inference.
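
A local run without an orchestrator can be as simple as walking the graph in-process. The following is a sketch under assumptions, not the SDK itself. `run_locally`, the step function signature `(upstream, config)` and the toy steps are invented for illustration, and every dependency is assumed to be declared as a step.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_locally(recipe: dict, steps: dict, config: dict) -> dict:
    """Execute every step of a Recipe in-process and return all step outputs."""
    graph = {s["name"]: set(s["depends_on"]) for s in recipe["steps"]}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        # Each step sees only the outputs of the steps it depends on.
        upstream = {dep: outputs[dep] for dep in graph[name]}
        outputs[name] = steps[name](upstream, config)
    return outputs


# Toy step functions standing in for real pipeline code.
def load_data(upstream: dict, config: dict) -> list:
    return [1.0, 2.0, 3.0, 4.0]


def scale(upstream: dict, config: dict) -> list:
    return [x * config["scale"] for x in upstream["load_data"]]


def summarize(upstream: dict, config: dict) -> float:
    scaled = upstream["scale"]
    return sum(scaled) / len(scaled)


recipe = {"steps": [
    {"name": "load_data", "depends_on": []},
    {"name": "scale", "depends_on": ["load_data"]},
    {"name": "summarize", "depends_on": ["scale"]},
]}
steps = {"load_data": load_data, "scale": scale, "summarize": summarize}

outputs = run_locally(recipe, steps, config={"scale": 10})
print(outputs["summarize"])  # 25.0
```

Since the runner touches nothing but the Recipe and plain callables, the same steps can be exercised in a unit test or a notebook before any cluster is involved.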

Atlas consists of the following components:

  • Recipe – Declarative JSON syntax to represent an ML pipeline
  • Config – Configuration JSON to capture the features, hyperparameters, model versions etc.
  • Python SDK – Interpret, visualize, validate and test Recipe & Config
  • Infrastructure as Code – Spin up components of the architecture
  • Deployment Pipelines – Tie the above components into a workflow

Let’s take the workflow diagram from Part 1 (Noodle Blog/Medium) and add Atlas’s Recipe and Config components to it. This addresses one of the core problems highlighted earlier: context switching. Developers can now set aside the architectural diversity and no longer need to recreate and test flows in multiple places.

Fig. 2: Atlas Tech Stack

The framework can be used to:

  • Run end-to-end pipelines on developers’ local machines
  • Visualize the execution graph of pipelines, which helps with debugging and testing
  • Submit jobs for training on Kubeflow
  • Deploy the inference pipelines on Airflow
  • Define dependencies for each pipeline and step using different Docker images
  • Reduce dependency on the Production Readiness stage, thereby eliminating code handoffs
  • Create and execute multiple ML pipelines concurrently by changing the classes in the Recipe JSON, which is helpful when building hierarchical models
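
One plausible way to swap pipeline behavior by changing only the classes named in the Recipe JSON is to resolve class paths at run time. The sketch below assumes a hypothetical `class` field holding a dotted import path. The standard-library classes `fractions.Fraction` and `decimal.Decimal` stand in for real step classes so that the example is runnable anywhere.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Resolve a 'package.module.ClassName' string to the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_steps(recipe: dict) -> dict:
    """Map each step name to the class the Recipe asks for."""
    return {step["name"]: load_class(step["class"]) for step in recipe["steps"]}


# Two Recipes that differ only in the class bound to the same step.
recipe_a = {"steps": [{"name": "parse_amount", "class": "fractions.Fraction"}]}
recipe_b = {"steps": [{"name": "parse_amount", "class": "decimal.Decimal"}]}

for recipe in (recipe_a, recipe_b):
    parse = build_steps(recipe)["parse_amount"]
    print(type(parse("0.25")).__name__)
```

The pipeline structure stays fixed while the implementation behind a step changes, so variants of a pipeline can be produced by editing JSON rather than code.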

Let’s look at the workflow of development and deployment using the Atlas Framework.

Stage 1: Development

Fig. 3: Development Stage
  • Developers use Jupyter & Python to harden modeling approaches
  • Once the ML pipeline is developed, a Recipe JSON is created to describe it
  • Any configs needed to drive the Recipe can be checked in as a separate Config JSON file
  • Code is committed to the Git repository
  • Docker images are updated and pushed to the Docker registry

Stage 2: Model Training

Fig. 4: Training Stage
  • Jenkins is used to build the branch. During the build step, the Recipe is converted to a Kubeflow pipeline
  • The Kubeflow pipeline is uploaded into Kubeflow
  • Developers use the Kubeflow UI to track and manage experiments and artifacts
  • The MLflow UI can be used to compare experiments and manage models

Stage 3: Inference

Fig. 5: Inference Stage
  • The best-performing models from Stage 2 are promoted to production
  • The ML model is registered in the Model Registry and tagged
  • The Git branch containing the pipeline code is promoted as a release candidate that references the correct model version number
  • Code is built and the Recipe for inference is used to dynamically generate Airflow DAGs
  • All Recipe and Config files are scanned to generate multiple pipelines, if needed
  • Model performance metrics get logged into Prometheus
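
The scanning step above can be sketched as a small discovery function. The file naming convention (`<name>.recipe.json` next to an optional `<name>.config.json`) and the function `discover_pipelines` are assumptions made for this example. Turning each discovered spec into an Airflow DAG is deliberately left out, so no Airflow API appears here.

```python
import json
import tempfile
from pathlib import Path

RECIPE_SUFFIX = ".recipe.json"  # hypothetical naming convention


def discover_pipelines(root: Path) -> dict:
    """Map pipeline name -> {"recipe": ..., "config": ...} for every Recipe under root."""
    pipelines = {}
    for recipe_path in sorted(root.rglob("*" + RECIPE_SUFFIX)):
        name = recipe_path.name[: -len(RECIPE_SUFFIX)]
        config_path = recipe_path.with_name(name + ".config.json")
        pipelines[name] = {
            "recipe": json.loads(recipe_path.read_text()),
            # A Recipe without a Config file runs with an empty config.
            "config": json.loads(config_path.read_text()) if config_path.exists() else {},
        }
    return pipelines


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "churn.recipe.json").write_text('{"steps": [{"name": "score", "depends_on": []}]}')
    (root / "churn.config.json").write_text('{"model_version": "3"}')
    (root / "forecast.recipe.json").write_text('{"steps": []}')
    found = discover_pipelines(root)

print(sorted(found))
```

Each entry in `found` carries everything a generator would need to emit one pipeline, so adding a pipeline amounts to adding a pair of files.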

Stage 4: Model Re-Training

Fig. 6: Model Retraining Stage
  • Retraining can be triggered by any of the following scenarios:
    • Availability of more data
    • Availability of more labels
    • Performance degradation of deployed models
    • Trends in input features moving outside thresholds
  • Once a trigger condition is met, the model metadata is checked to identify the Recipe version in Git, the pipeline version in Kubeflow and the related config
  • The training flow is initiated with the input parameters relevant to the given trigger
  • The model is evaluated on the test set and, if it passes, it can be promoted to production
Fig. 7: Atlas Architecture Diagram
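
The four retraining triggers listed under Stage 4 can be expressed as a single check. This is a minimal sketch. The field names, the default threshold values and the idea of a per-feature drift statistic are all assumptions for illustration, and how drift is actually measured is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class ModelStatus:
    """Hypothetical snapshot of what monitoring knows about a deployed model."""
    new_rows: int
    new_labels: int
    baseline_score: float
    current_score: float
    feature_drift: dict  # feature name -> drift statistic


@dataclass
class Thresholds:
    """Illustrative limits; real values depend on the application."""
    min_new_rows: int = 10_000
    min_new_labels: int = 1_000
    max_score_drop: float = 0.05
    max_feature_drift: float = 0.2


def retraining_triggers(status: ModelStatus, limits: Thresholds) -> list:
    """Return the name of every trigger condition that currently holds."""
    reasons = []
    if status.new_rows >= limits.min_new_rows:
        reasons.append("more_data")
    if status.new_labels >= limits.min_new_labels:
        reasons.append("more_labels")
    if status.baseline_score - status.current_score > limits.max_score_drop:
        reasons.append("performance_degradation")
    drifted = sorted(
        name for name, value in status.feature_drift.items()
        if value > limits.max_feature_drift
    )
    if drifted:
        reasons.append("feature_drift:" + ",".join(drifted))
    return reasons


status = ModelStatus(
    new_rows=500,
    new_labels=0,
    baseline_score=0.90,
    current_score=0.80,
    feature_drift={"price": 0.35, "promotion": 0.05},
)
print(retraining_triggers(status, Thresholds()))
# ['performance_degradation', 'feature_drift:price']
```

Returning the reasons rather than a bare boolean lets the training flow receive input parameters that match the trigger that fired.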

Summary

Using the above workflow, we have reduced the development-to-deployment turnaround time from months to 2 weeks. It has helped us deploy hundreds of ML pipelines into production, running across thousands of containers. Management becomes critical at this scale, and using Git to check in our code and a Model Registry to save our models has proved very effective. We use Jenkins to tie this workflow together.

MLOps is a rapidly evolving field, and many new open source and proprietary frameworks are becoming available as options. To keep the spectrum of choices wide and prevent major code refactors when switching from one service to the next, it really helps to start out with an abstraction. The gamut of MLOps is very broad and it is very difficult to find a one-size-fits-all framework, so teams need to be prepared for rapid adoption and experimentation. Being able to port experiments and workflow stages to new tools and frameworks, while shielding Data Scientists & ML Engineers from those changes, provides great agility. MLOps covers many other areas, such as Feature Stores, Data Versioning, Model Monitoring and Testing Strategies, and plenty of material on these topics is available online.

In the blogs that follow in this series, we will focus on the following topics:

  • Recipe: Gateway To Better Traceability
  • Dynamic Pipelines – Airflow + Kubeflow
  • Scaling Airflow Using K8s
  • Continuous Delivery: Going From Development To Deployment
