October 14, 2018
Let’s explore how we can put a machine learning model into production. I’m going to use the 90s pop lyrics generator for this task. Inference with this nice little Keras model isn’t too computationally intensive, so a CPU can be used to perform it. This will make productionising a lot simpler and more cost-efficient. We should define what we mean by “production”. My goal isn’t to have people use the model, but rather to build a prototype in which the model can be easily shown.
So, putting the model into production here simply means getting it into a state where it can be demonstrated end to end. Scaling, concurrency, security, and performance are not important to me at this stage.
Initially, I attempted to use CoreML to run the model client-side, that is, on an iOS device, without making network calls. This was attractive as it would reduce both cost and system complexity. Another consideration is data privacy: running the model on the device means that data never has to leave it. The alternative is running the model server-side. In this case, I’d need to host the model somewhere, build a service to let the client access it, and pay hosting fees.
Unfortunately, the model’s inference stage wasn’t constructed in a way that would play nicely with CoreML. Because of this, I opted to run the model server-side.
Ok, here’s the plan. The model is going to get wrapped up and served by a REST API. The REST API will take network calls, make a prediction with the model, then respond with the generated lyrics. This API will be hosted in the cloud rather than on a local machine, for better accessibility and uptime.
The client can be a web application, but let’s go with an iOS application instead. It’s much easier to show off a mobile application than a web application. The iOS application will make network calls to the API and render the response to the user. Lastly, a Watch application will communicate with the iOS application to provide a secondary method of making predictions.
In its current state, the model is one big script. Creation of the dataset, definition of the model, training the model, and making predictions with the model occur procedurally. This is not ideal. It’s always good to separate systems into discrete components. This practice helps in reducing errors and increasing composability. Before we can use the 90s Pop Lyrics Generator model, we are going to perform a few abstraction tasks.
First, extract the corpus and character-set construction. This will become its own script and will export the generated assets. We don’t want to re-train the model each time we want to use it, so export the model after it’s trained. We can also extract the inference step into its own script. This script should load the model and take two parameters: a sample input and the number of characters to be generated.
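To make the inference step concrete: at each step, a character-level model like this one outputs a probability distribution over the character set, and the script samples the next character from it. Here’s a minimal sketch of that sampling step, assuming NumPy (already a dependency); the function name and the temperature knob are mine, not necessarily what the original script uses:

```python
import numpy as np

def sample_next_char(preds, temperature=1.0):
    """Pick the index of the next character from the model's softmax output.

    Lower temperatures make sampling more conservative (closer to argmax);
    higher temperatures make it more adventurous.
    """
    preds = np.asarray(preds, dtype=np.float64)
    preds = np.log(preds + 1e-8) / temperature
    probs = np.exp(preds)
    probs /= probs.sum()  # re-normalise into a valid distribution
    return int(np.random.choice(len(probs), p=probs))
```

The generation loop would call this once per character, feed the sampled character back in as input, and repeat until the requested number of characters is reached.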
The abstracted 90s Pop Lyrics Generator can be used with the following steps:
1. Obtain the required dataset.
2. Place it at data/dataset.csv.
3. Run create_corpus.py to generate the corpus.
4. Run model.py to train the model.
5. Run generate_lyrics.py "some input" 500 'model/100000ex-128a-50b-20c-2018-09-19 19:41:58.hdf5' 'data/charset.csv' to generate lyrics, where:
   - "some input" is the starting string
   - 500 is the number of characters to generate
   - 'model/100000ex-128a-50b-20c-2018-09-19 19:41:58.hdf5' is the model file
   - 'data/charset.csv' is the charset array file
To construct the REST API, I used Flask, a microframework for Python, and Swagger, an open-source tool for delivering great APIs. Flask was chosen because the model is written in Python; Swagger because it generates both the API and its documentation from an OpenAPI specification. Keep it simple, right?
Essentially, server.py creates the application, which listens on port 5000. swagger.yml specifies how POST calls to the API are handled: the server parses the parameters and responds by calling model.py, a ported version of the model’s inference step. I was unable to persist the loaded model between network calls; solving this would reduce resource usage and should be investigated before any future scaling.
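For illustration, here’s roughly what the model.py handler wired up to operationId model.create might look like. generate_lyrics here is a hypothetical stand-in for the real Keras inference code:

```python
# model.py (sketch): connexion routes POST /api/model here via the
# operationId "model.create" declared in swagger.yml.

def generate_lyrics(sample, n_chars):
    # Hypothetical stand-in: the real version loads the trained .hdf5
    # model and charset, then generates n_chars characters from `sample`.
    return sample + " ..."

def create(param):
    """Handle the POST body defined in swagger.yml."""
    sample = param.get("sample", "")
    n_chars = int(param.get("n_chars", 500))
    lyrics = generate_lyrics(sample, n_chars)
    # connexion turns a (body, status) tuple into an HTTP response
    return {"lyrics": lyrics}, 201
```

Because generate_lyrics is called inside the handler, the model load happens on every request, which is exactly the resource-usage problem mentioned above.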
```yaml
# swagger.yml
swagger: "2.0"
info:
  description: This is the swagger file that goes with our server code
  version: "1.0.0"
  title: Swagger REST Article
consumes:
  - "application/json"
produces:
  - "application/json"
basePath: "/api"

# Paths supported by the server application
paths:
  /model:
    post:
      operationId: model.create
      tags:
        - model
      summary: Create a new set of lyrics with the model
      description: Create a new set of lyrics with the model
      parameters:
        - name: param
          in: body
          description: Starting sample used to create
          required: True
          schema:
            type: object
            properties:
              n_chars:
                type: integer
                description: Number of characters to generate
              sample:
                type: string
                description: Starting sample used to create lyrics
      responses:
        201:
          description: Successfully made a prediction
```
The API can be tested with:
```shell
$ python server.py
```
One can view the API documentation website at localhost:5000/api/ui, and test API calls from there. Swagger is the best.
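The endpoint can also be exercised outside the UI. Here’s a sketch using only the Python standard library; the JSON field names match swagger.yml, and a server running on localhost:5000 is assumed:

```python
import json
from urllib import request

# Build the POST body that swagger.yml expects.
payload = json.dumps({"sample": "I want", "n_chars": 500}).encode("utf-8")

req = request.Request(
    "http://localhost:5000/api/model",
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Uncomment with the server running:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```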
Let’s Dockerize our new REST API so that we can deploy it in a Docker container. Why? To make it as portable as possible and to enable high reproducibility! Start by gathering the Python dependencies in a requirements.txt:
```
# requirements.txt
connexion
flask
keras
numpy
tensorflow
```
Next, create a Dockerfile, which provides step-by-step instructions on how the container image should be constructed. Start from the Ubuntu 16.04 base image on Docker Hub. Then install python3, pip3, and the Python dependencies. Finally, expose port 5000 and start the API.
```dockerfile
# Dockerfile
FROM ubuntu:16.04

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip

RUN pip3 install --upgrade \
    pip \
    setuptools

WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

EXPOSE 5000
CMD ["python3", "./server.py"]
```
Test the Dockerfile by building it into an image:
$ docker build -t ml_api_image .
Make sure the image runs as a container:
$ docker run -it --rm --name ml_api_container -p 5000:5000 ml_api_image
We need to choose a cloud to run the Dockerized REST API. I picked Digital Ocean since they offer compute resources for $5, probably the best pricing I’ll find for this toy project. There’s even an option to one-click install Docker, which makes setting up the API even easier.
Before we go any further, make sure you perform a basic server setup. Be sure to allow traffic through port 5000 with ufw. Once the server is set up, clone the API’s git repository onto it. Finally, build the Docker image and run a container from it.
Now for the iOS application. Alongside the previously mentioned tools, I used Redux and Redux Saga for state management. Using this application, the user can specify a starting sample and the number of characters that should be generated. A network call is made to the REST API, and the resulting generated lyrics are rendered back to the user. Fantastic.
Bonus, we were able to hit 90%+ test coverage. Heck yuh.
For the Watch application, we are going to create an extension to the iOS application. It won’t be written with React Native, since that target isn’t available. Instead, it’ll be written in Swift and will communicate with the iOS application via React Native Watch Connectivity, which lets us send messages between the iOS and Watch applications. The Watch application can dispatch parameters, receive a prediction, and render it to the user.
As a bonus, we are able to use Siri or Touch from the Watch to set the sample input. A very neat party trick.
We’ve done a lot but are still lacking in the user experience and design! The next steps are to handle these very important components. Stay tuned as I work on implementing these.
Peace. Love. Spice up your life.
Written by Peter Chau, a Canadian Software Engineer building AIs, APIs, UIs, and robots.