Introducing Cog and Containerizing Machine Learning Models

May 23, 2023 by Janakiram MSV

Deploying and Scaling Containerized Machine Learning Models – Part 1

This four-part series focuses on leveraging the Cog and Acorn frameworks to package, deploy, and scale machine learning models in cloud native environments. Part 1 introduces Cog as the framework for containerizing ML models, while Part 2 focuses on integrating Cog with Acorn to target Kubernetes clusters for deployment. Part 3 discusses leveraging GPUs to accelerate the inference of computer vision models, and finally, Part 4 covers deploying transformer models for natural language processing (NLP).

Machine learning models have two lives – training and inference. Training complex models demands infrastructure powered by high-end CPUs and GPUs, typically in the cloud or an enterprise data center. Once a model is trained, it is packaged and deployed as an API for applications to consume. The inference API is exposed as a REST or gRPC endpoint. With Kubernetes gaining ground, the inference API is increasingly deployed as a workload running within the cluster. As with deploying software binaries, ML engineers and DevOps teams collaborate to package and deploy the model artifacts.

The workflow of packaging and deploying ML artifacts involves assembling the model and the inference code into a container image, packaging that as a Kubernetes deployment, and exposing it as a service. DevOps teams are responsible for gathering all the dependencies needed by the inference code, along with the appropriate frameworks and runtimes such as TensorFlow and CUDA. Any mismatch between the inference code, its dependencies, the framework versions, and the runtime environment leads to a laborious and expensive debugging process.

AC-1.png

Cog is an open source project from Replicate.ai, a startup offering an ML model hosting service. It simplifies and automates the process of containerizing an ML model, along with everything it needs at runtime, into an OCI image.

Like Acorn, Cog takes a declarative approach to building container images for inference. It automates the generation of the Dockerfile along with all the required dependencies, frameworks, and runtime requirements. The container image ships with a built-in FastAPI HTTP server that exposes REST endpoints. Refer to Cog’s documentation for more details.

To appreciate the power of Cog, let’s go through an end-to-end example of training a model and using it for inference. We will then use Cog to package the model and inference code.

Training a simple machine learning model

Since the objective is to understand the workflow rather than to train complex models, we will build a simple Scikit-learn model that predicts salary based on the number of years of experience. It’s a linear regression model trained on a tiny toy dataset.

For this tutorial, you need an x86 machine with Python 3.8 or above installed. You also need Docker to build the container image.

Start by cloning the GitHub repository with the dataset, code to train the model, and the inference API based on Flask.

git clone https://github.com/janakiramm/Salary.git && cd Salary

Create a Python virtual environment, activate it, and install the dependencies.

python3 -m virtualenv salenv
source salenv/bin/activate
pip3 install -r requirements.txt

The CSV file under the data directory acts as the dataset. It has two columns representing years of experience and salary.

cat data/sal.csv

x,y
0,103100
1,104900
2,106800
3,108700
4,110400
5,112300
6,114200
7,116100
8,117800
9,119700
10,121600
11,123300
12,125200
13,127100
14,128900
15,130700
16,132600
17,134400
18,136300
19,138000
20,139900
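
A quick back-of-the-envelope check on this data: the salary grows from 103,100 at zero years to 139,900 at 20 years, a slope of roughly 1,840 per year. A prediction for 25 years of experience should therefore land near 103,100 + 25 × 1,840 ≈ 149,100, a figure worth keeping in mind when we test the inference endpoints later.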

The file train.py under the train directory has the code to train and generate the model.

import numpy as np
import pandas as pd
import os
import joblib
from argparse import ArgumentParser
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

parser = ArgumentParser()
parser.add_argument("-i", "--in", dest="input", help="location of input dataset")
parser.add_argument("-o", "--out", dest="output", help="location of model")
dataset = parser.parse_args().input
model_dir = parser.parse_args().output

sal = pd.read_csv(dataset, header=0, index_col=None)
X = sal[['x']]
y = sal['y']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=10)

lm = LinearRegression()
lm.fit(X_train.values, y_train.values)

print('Intercept :', round(lm.intercept_, 2))
print('Slope :', round(lm.coef_[0], 2))

from sklearn.metrics import mean_squared_error

y_predict = lm.predict(X_test.values)
mse = mean_squared_error(y_predict, y_test)
print('MSE :', round(mse, 2))

if not os.path.exists(model_dir):
    os.makedirs(model_dir)

filename = model_dir + '/model.pkl'
joblib.dump(lm, filename, compress=1)

The code accepts the dataset’s location and dumps the serialized model into the target directory. Let’s train and generate the model by running the script from within the train directory.

python3 train.py -i ../data/sal.csv --out ../model

AC-2.png

The serialized model is saved to the model directory.
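
Before wrapping the model in an API, you can sanity-check the serialized artifact with a few lines of Python. This is a quick illustrative sketch (not part of the repository), assuming it is run from the repository root:

# quick_check.py: illustrative only, not part of the repository
import joblib

# Load the linear regression model serialized by train.py
model = joblib.load("model/model.pkl")

# Predict the salary for 25 years of experience
print(round(model.predict([[25]])[0], 2))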

Performing inference on the trained model

Let’s now write code to load the model and expose it as a REST endpoint. You can find the inference code under the deploy directory.

from flask import Flask, jsonify
from flask import request
import joblib
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("-m", "--model", dest="model", help="location of the pickle file")
filename = parser.parse_args().model

app = Flask(__name__)

@app.route('/')
def index():
    return "Stackoverflow Salary Predictor"

@app.route('/predict', methods=['POST'])
def predict():
    loaded_model = joblib.load(filename)
    x = int(request.json["exp"])
    y = loaded_model.predict([[x]])[0]
    sal = jsonify({'salary': round(y, 2)})
    return sal

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

The above program accepts the location of the model and exposes a REST endpoint to serve predictions.

AC-3-980x267.png

Let’s invoke the API through a cURL request.

curl -X POST -H "Content-type: application/json" -d '{"exp":"25"}' http://localhost:8080/predict

AC-4.png

The above screenshot shows that our inference code is working as expected. For an individual with 25 years of experience, the salary is approximately 149,000.

Focus on the function predict, which is the essence of the inference code. It first loads and deserializes the model artifact, then parses the input payload to extract the variable representing the years of experience. It passes that value to the model to derive the prediction, which is returned as a JSON response to the HTTP request.
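
To see the same request/response cycle from code rather than cURL, here is a minimal client sketch using only the Python standard library. It assumes the Flask server above is running locally on port 8080:

# client.py: illustrative sketch using only the standard library
import json
import urllib.request

payload = json.dumps({"exp": 25}).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8080/predict",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    # The Flask handler returns a JSON body of the form {"salary": <predicted value>}
    print(json.load(resp)["salary"])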

Though this is a trivial example, the workflow remains the same. The next step is to define a Dockerfile and build the image. You can find many examples and best practices for building Python-based container images.

This tutorial will not cover building the container image by hand. Instead, it shows how the Cog framework simplifies the process of packaging the container image for inference.

Using Cog to build the container image for inference

Cog needs three files – cog.yaml, predict.py, and the model artifact. The cog.yaml file declares the version of Python, the core dependencies, the GPU requirements, and the function responsible for invoking the model. That function lives in a separate Python file containing the preprocessing logic and the actual invocation of the model.

AC-X.png

Let’s start by installing the Cog CLI tool.

brew install cog

Check the version of Cog to verify the installation.

AC-5.png

Now it’s time to create cog.yaml. Create a directory called inference to hold the files related to Cog.

If you run cog init, the CLI generates a template that acts as the starting point. Update it to reflect our model’s dependencies:

build:
  python_version: "3.8"
  python_packages:
    - numpy==1.21.6
    - scipy==1.7.3
    - scikit-learn==1.0.2
    - pandas==1.3.5
    - joblib==1.2.0
predict: "predict.py:Predictor"

The build section declares the version of Python. The next section lists all the dependencies that would typically go into a requirements.txt file. Finally, the predict key names the Python file and the class responsible for the prediction.

Let’s create predict.py, which holds the inference logic.

from typing import Any
from cog import BasePredictor, Input, Path
import joblib

class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        self.model = joblib.load("model.pkl")

    # The arguments and types the model takes as input
    def predict(self, exp: int = Input(description="Experience in years")) -> Any:
        """Run a single prediction on the model"""
        sal = self.model.predict([[exp]])
        return round(sal[0], 2)

Pay attention to the predict function. It’s similar to the code we wrote in inference.py. To avoid loading the model for every request, we load and deserialize it only once, inside the setup function.
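
Before involving Docker at all, you can exercise the Predictor class directly from Python. This is a hedged sketch, assuming the cog Python package is installed in your environment and model.pkl sits in the current (inference) directory:

# test_predictor.py: illustrative only; assumes the cog package is installed and model.pkl is in the working directory
from predict import Predictor

predictor = Predictor()
predictor.setup()                 # loads model.pkl once, as Cog does at container start-up
print(predictor.predict(exp=25))  # runs a single prediction, bypassing the HTTP layer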

Before building the image, let’s copy the serialized model into the inference directory.

cp ../model/model.pkl .

We now have the three files Cog needs to build the image successfully:

AC-6.png

We can now perform inference with a single command:

cog predict -i exp=25

AC-8-980x985.png

As the output shows, Cog builds the container image on the fly and then passes the input to the predictor to return the result.

Now that the code is working, we are ready to build the image. We can pass a tag that makes it easier to push the image to a registry later.

export DOCKER_HUB_USERNAME=janakiramm
cog build -t $DOCKER_HUB_USERNAME/salpred

This builds a Docker image with the mentioned tag.

AC-7-1024x233.png

We can run the image with the Docker CLI to test the REST endpoint.

docker run --rm -d --name salpred -p 5000:5000 janakiramm/salpred

AC-9-980x202.png

To extract just the prediction, we can use the jq utility.

curl -X POST -H "Content-type: application/json" -d '{"input": {"exp": "25"}}' --silent http://localhost:5000/predictions | jq .output

AC-10-980x136.png
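
If you prefer to parse the response in Python instead of jq, a small client along these lines works against the running container. This is an illustrative sketch that assumes Cog’s HTTP API, where predictions are POSTed to /predictions and the result is returned under an output field:

# cog_client.py: illustrative sketch; assumes the salpred container is listening on port 5000
import json
import urllib.request

payload = json.dumps({"input": {"exp": "25"}}).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:5000/predictions",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    # Cog wraps the prediction in a response object; the value itself is under "output"
    print(body["output"])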

If you are curious about the Dockerfile generated by Cog, run the command shown below to explore its contents.

AC-11-980x576.png

Cog is a simple yet powerful tool to automate the build process of ML inference code.

In the next part of this series, we explore the integration of Cog and Acorn to deploy the inference service in Kubernetes. Stay tuned.