Laboratory One Research Blog

Productionizing a Machine Learning Model

October 14, 2018


Let’s explore how we can put a machine learning model into production. I’m going to use the 90s pop lyrics generator for this task. Inference with this nice little Keras model isn’t computationally intensive, so a CPU is enough to run it. This makes productionizing a lot simpler and more cost-efficient. First, we should define what we mean by “production”. My goal isn’t to have people use the model, but rather to build a prototype in which the model can be easily demonstrated.

So putting the model into production means that:

  • there should be a user interface where inputs can be given,
  • there should be a user interface where the response is rendered, and
  • the model should be running somewhere accessible on the internet.

Scaling, concurrency, security, and performance are not important to me at this stage.

Initially, I attempted to use CoreML to run the model client-side, that is, on an iOS device without making network calls. This was attractive because it would reduce both cost and system complexity. Data privacy was another consideration: running the model on the device means the data never has to leave it. The alternative is running the model server-side. In that case, I’d need to host the model somewhere, which would require a service that lets the client access the model and would introduce hosting fees.

Unfortunately, the model’s inference stage wasn’t constructed in a way that would play nicely with CoreML. Because of this, I opted to run the model server-side.

System Architecture

Ok, here’s the plan. The model is going to get wrapped up and served by a REST API. The REST API will accept network calls, use the model to make a prediction (generate lyrics), then respond with that prediction. The API will be hosted in the cloud rather than on a local machine for better accessibility and uptime.

The client could be a web application, but let’s go with an iOS application instead, since it’s much easier to show off a mobile application than a web application. The iOS application will make network calls to the API and render the response to the user. Lastly, a Watch application will communicate with the iOS application to provide a secondary way of making predictions.

System Diagram

Exporting the model

In its current state, the model is one big script: creating the dataset, defining the model, training it, and making predictions with it all happen procedurally. This is not ideal. It’s good practice to separate a system into discrete components, which helps reduce errors and increase composability. Before we can use the 90s Pop Lyrics Generator model, we are going to perform a few abstraction tasks.

First, extract the corpus and character set construction into its own script that exports the generated assets. We don’t want to retrain the model every time we want to use it, so export the model once it’s trained. Finally, extract the inference step into its own script. This script should load the model and take two parameters: a sample input and the number of characters to generate.
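Exporting the character set means the inference script can rebuild exactly the same character-to-index mapping the model was trained with. A minimal sketch, assuming the charset is stored one character per row in data/charset.csv (the repository’s actual file layout isn’t shown, and build_charset, save_charset, and load_charset are hypothetical names):

```python
import csv
from typing import Dict, List, Tuple


def build_charset(corpus: str) -> List[str]:
    """Return the sorted list of unique characters in the corpus."""
    return sorted(set(corpus))


def save_charset(charset: List[str], path: str) -> None:
    """Write one character per CSV row; csv quotes newlines, commas, and quotes."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for ch in charset:
            writer.writerow([ch])


def load_charset(path: str) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Read the charset back and build both lookup tables for inference."""
    with open(path, newline="", encoding="utf-8") as f:
        charset = [row[0] for row in csv.reader(f)]
    char_to_idx = {ch: i for i, ch in enumerate(charset)}
    idx_to_char = dict(enumerate(charset))
    return char_to_idx, idx_to_char
```

Sorting the characters keeps the indices stable between runs, which matters because the model’s output layer is tied to those indices.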

Abstracted 90s Pop Lyrics Generator repository

The abstracted 90s Pop Lyrics Generator can be used with the following steps:

1. Obtain the required dataset.
2. Place it at data/dataset.csv.
3. Run the corpus script to generate the corpus.
4. Run the training script to train the model.
5. Run the inference script with "some input" 500 'model/100000ex-128a-50b-20c-2018-09-19 19:41:58.hdf5' 'data/charset.csv' to generate lyrics, where:
- "some input" is the starting string
- 500 is the number of characters to generate
- 'model/100000ex-128a-50b-20c-2018-09-19 19:41:58.hdf5' is the model file
- 'data/charset.csv' is the charset array file
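The command in step 5 passes four positional arguments. Here is a minimal sketch of how an inference script could parse them with Python’s argparse; the parse_args helper and the argument names are hypothetical, and only their order is taken from the command above.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the inference script's four positional arguments (names hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Generate 90s pop lyrics from a trained model."
    )
    parser.add_argument("sample", help="starting string for generation")
    parser.add_argument("length", type=int, help="number of characters to generate")
    parser.add_argument("model_path", help="path to the exported .hdf5 model file")
    parser.add_argument("charset_path", help="path to the charset CSV file")
    args = parser.parse_args(argv)
    # Reject nonsensical lengths early instead of failing inside the model.
    if args.length <= 0:
        parser.error("length must be a positive integer")
    return args
```

Because the model filename contains spaces and colons, it has to stay quoted on the command line, exactly as in step 5.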

Constructing a REST API

To construct the REST API, I used Flask, a microframework for Python, and Swagger, an open source tool for delivering great APIs. Flask was chosen because the model is written in Python. Swagger was chosen because it can generate an API and its documentation from an OpenAPI specification. Keep it simple, right?

Essentially, an application is created that listens on port 5000. In swagger.yml, Swagger specifies how POST calls to /model are handled: the parameters are parsed, and the response comes from calling model.create, a ported version of the model inference. I was unable to persist the loaded model between network calls, so it is reloaded for every request. Solving this would reduce resource usage and should be investigated when scaling in the future.
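One common pattern for keeping a loaded model between requests is to cache it at module level, so it is loaded on the first call and reused afterwards. This is a hedged sketch, not the author’s implementation: _load_model_from_disk is a hypothetical stand-in that, in a real API, would wrap keras.models.load_model. Whether caching alone is enough depends on the Keras/TensorFlow setup, so treat it as a starting point.

```python
from functools import lru_cache
from typing import Any, Dict

# Counts how often the (hypothetical) expensive loader actually runs.
LOAD_CALLS: Dict[str, int] = {}


def _load_model_from_disk(path: str) -> Any:
    """Hypothetical stand-in for keras.models.load_model(path)."""
    LOAD_CALLS[path] = LOAD_CALLS.get(path, 0) + 1
    return {"path": path}  # placeholder object instead of a real model


@lru_cache(maxsize=None)
def get_model(path: str) -> Any:
    """Load the model once per path, then return the cached instance.

    Note: lru_cache does not lock, so two threads that miss the cache at
    the same moment could both call the loader once.
    """
    return _load_model_from_disk(path)
```

Every call to get_model with the same path after the first returns the same object without touching the disk.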

# swagger.yml

swagger: "2.0"
info:
  description: This is the swagger file that goes with our server code
  version: "1.0.0"
  title: Swagger REST Article
consumes:
  - "application/json"
produces:
  - "application/json"

basePath: "/api"

# Paths supported by the server application
paths:
  /model:
    post:
      operationId: model.create
      tags:
        - model
      summary: Create a new set of lyrics with the model
      description: Create a new set of lyrics with the model
      parameters:
        - name: param
          in: body
          description: Starting sample used to create lyrics
          required: True
          schema:
            type: object
            # The property names "length" and "sample" are assumed
            properties:
              length:
                type: integer
                description: Number of characters to generate
              sample:
                type: string
                description: Starting sample used to create lyrics
      responses:
        200:
          description: Successfully made a prediction
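The operationId model.create tells the server to dispatch POST /api/model to a create function in a model module. A hedged sketch of such a handler, assuming a Connexion-style framework that passes the body parameter under its declared name (param) and accepts a (body, status) tuple as the return value. The body keys sample and length, the lyrics response key, and generate_lyrics (a stand-in for the ported inference code) are all hypothetical.

```python
from typing import Any, Dict, Tuple


def generate_lyrics(sample: str, length: int) -> str:
    """Hypothetical stand-in for the model's inference; returns `length` chars."""
    return ("la " * length)[:length]


def create(param: Dict[str, Any]) -> Tuple[Dict[str, Any], int]:
    """Handle POST /api/model: validate the body, then generate lyrics."""
    sample = param.get("sample")
    length = param.get("length")
    if not isinstance(sample, str) or not sample:
        return {"error": "sample must be a non-empty string"}, 400
    # bool is a subclass of int in Python, so rule it out explicitly.
    if not isinstance(length, int) or isinstance(length, bool) or length <= 0:
        return {"error": "length must be a positive integer"}, 400
    return {"lyrics": generate_lyrics(sample, length)}, 200
```

Validating in the handler keeps bad input from ever reaching the model, even though the spec already declares the types.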

The API can be run locally for testing by starting its server script with python.

API Running

Once the API is running, its documentation website can be viewed at localhost:5000/api/ui, and API calls can be tested directly from that site. Swagger is the best.
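Calls can also be made programmatically, which is essentially what the mobile client does. A minimal standard-library sketch, assuming the endpoint is http://localhost:5000/api/model (basePath /api plus the /model path) and that the JSON body uses the keys sample and length; those key names, build_request, and predict are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/api/model"  # assumed local endpoint


def build_request(sample: str, length: int, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the lyrics endpoint."""
    payload = json.dumps({"sample": sample, "length": length}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(sample: str, length: int, url: str = API_URL) -> dict:
    """Send the request to a running API and decode its JSON response."""
    with urllib.request.urlopen(build_request(sample, length, url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running, predict("wannabe", 50) would return whatever JSON the server produces for that sample.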

API Documentation

API Parameters

API Response

90s Pop Lyrics Generator API

Dockerizing the API

Let’s Dockerize our new REST API so that we can deploy it in a Docker container. Why? To make it as portable and reproducible as possible! Start by gathering the Python dependencies in a requirements.txt file.

# requirements.txt


Next, create a Dockerfile, which provides step-by-step instructions for constructing the container. Start from the Ubuntu 16.04 image on Docker Hub. Then install python3, pip3, and the Python dependencies. Finally, expose port 5000 and start the API.

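# Dockerfile

FROM ubuntu:16.04

# Install Python 3 and pip for Python 3
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Upgrade pip; pip 21+ no longer supports Ubuntu 16.04's Python 3.5
RUN pip3 install --upgrade "pip<21"

# Keep the application files in their own directory
WORKDIR /app

# Install dependencies before copying the source so this layer is cached
COPY requirements.txt ./
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy the API source code into the image
COPY . .

# The API listens on port 5000
EXPOSE 5000

# Start the API server script
CMD ["python3", "./"]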

Test the Dockerfile by building it into an image: $ docker build -t ml_api_image .

Make sure the image runs as a container: $ docker run -it --rm --name ml_api_container -p 5000:5000 ml_api_image

Setting up a Server

Next, we need to choose a cloud provider to run the Dockerized REST API. I picked DigitalOcean since they offer compute resources for $5 a month. That’s probably the best pricing I’ll find for this toy project. There is even a one-click option to install Docker, which makes setting up the API even easier.

Before going any further, make sure you perform a basic server setup, and be sure to allow traffic through port 5000 with ufw. Once the server is set up, clone the API’s git repository onto it. Finally, build the Docker image and run a container from it.
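After opening port 5000 with ufw and starting the container, a quick way to confirm the API is reachable from your own machine is a plain TCP connection attempt. This is a minimal sketch; is_port_open is a hypothetical helper, and you would substitute your server’s IP address.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, is_port_open("203.0.113.10", 5000) (a placeholder address) returning False usually points to a firewall rule or a container that isn’t publishing the port.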

Building an iOS Application

For the clients, we are going to start with the iOS application. It’s built with React Native, a framework for building native applications with React. Although it’d be simpler to build with Swift, I wanted to learn Flow, a static type checker for JavaScript, and practice unit testing with Jest, a JavaScript test runner. Additionally, I might want to port this application to the web, so the code reuse React offers is a nice bonus.

Alongside the previously mentioned tools, I used Redux for state management and Redux Saga for handling side effects such as network calls. With this application, the user can specify a starting sample and the number of characters to generate. A network call is made to the REST API, and the generated lyrics are rendered back to the user. Fantastic.

iOS Application

iOS Application Prediction

As a bonus, we were able to hit 90%+ test coverage. Heck yuh.

iOS Application Test Coverage

Building an iOS Watch Application

For the Watch application, we are going to create an extension to the iOS application. It won’t be written with React Native, since React Native doesn’t support watchOS as a target. Instead, it’ll be written in Swift and will communicate with the iOS application via React Native Watch Connectivity, which lets us send messages between the iOS and Watch applications. The Watch application can send parameters, receive a prediction, and render it to the user.

Watch Application

Watch Application Input

Watch Application Predictions

As a bonus, we can use Siri or touch input on the Watch to set the sample input. A very neat party trick.

Watch Application Siri Input

Watch Application Touch Input

Next Steps

We’ve done a lot, but the user experience and design are still lacking! The next steps are to tackle these very important pieces. Stay tuned as I work on them.

Peace. Love. Spice up your life.

Peter Chau

Written by Peter Chau, a Canadian Software Engineer building AIs, APIs, UIs, and robots.