While TensorFlow models are typically defined and trained using R or Python code, it is possible to deploy TensorFlow models in a wide variety of environments without any runtime dependency on R or Python:
TensorFlow Serving is an open-source software library for serving TensorFlow models using a gRPC interface.
CloudML is a managed cloud service that serves TensorFlow models using a REST interface.
RStudio Connect provides support for serving models using the same REST API as CloudML, but on a server within your own organization.
TensorFlow models can also be deployed to mobile and embedded devices including iOS and Android mobile phones and Raspberry Pi computers.
The R interface to TensorFlow includes a variety of tools designed to make exporting and serving TensorFlow models straightforward. The basic process for deploying TensorFlow models from R is as follows:
Train a model using the keras, tfestimators, or tensorflow R packages.
Call the export_savedmodel()
function on your
trained model to write it to disk as a TensorFlow SavedModel.
Use the serve_savedmodel()
function from the tfdeploy
package to run a local test server that supports the same REST API as
CloudML and RStudio Connect.
Deploy your model using TensorFlow Serving, CloudML, or RStudio Connect.
Begin by installing the tfdeploy package from CRAN as follows:
To demonstrate the basics, we’ll walk through an end-to-end example that trains a Keras model with the MNIST dataset, exports the saved model, and then serves the exported model locally for predictions with a REST API. After that we’ll describe in more depth the specific requirements and various options associated with exporting models. Finally, we’ll cover the various deployment options and provide links to additional documentation.
We’ll use a Keras model that recognizes handwritten digits from the MNIST dataset as an example. MNIST consists of 28 x 28 grayscale images of handwritten digits like these:
The dataset also includes labels for each image. For example, the labels for the above images are 5, 0, 4, and 1.
Here’s the complete source code for the model:
library(keras)
# load data
c(c(x_train, y_train), c(x_test, y_test)) %<-% dataset_mnist()
# reshape and rescale
x_train <- array_reshape(x_train, dim = c(nrow(x_train), 784)) / 255
x_test <- array_reshape(x_test, dim = c(nrow(x_test), 784)) / 255
# one-hot encode response
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
# define and compile model
model <- keras_model_sequential()
model %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(784),
name = "image") %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 10, activation = 'softmax',
name = "prediction") %>%
compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
# train model
history <- model %>% fit(
x_train, y_train,
epochs = 35, batch_size = 128,
validation_split = 0.2
)
In R, it is easy to make predictions using the the trained model and
R’s predict
function:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 1 0 0
[2,] 0 0 1 0 0 0 0 0 0 0
[3,] 0 1 0 0 0 0 0 0 0 0
[4,] 1 0 0 0 0 0 0 0 0 0
Each row represents an image, each column represents a digit from 0-9, and the values represent the model’s prediction. For example, the first image is predicted to be a 7.
What if we want to deploy the model in an environment where R isn’t available? The following sections cover exporting and deploying the model with the tfdeploy package.
After training, the next step is to export the model as a TensorFlow
SavedModel using the export_savedmodel()
function:
This will create a “savedmodel” directory that contains a saved
version of your MNIST model. You can view the graph of your model using
TensorBoard with the view_savedmodel()
function:
To test the exported model locally, use the
serve_savedmodel()
function.
The REST API for the model is served under localhost with port 8989.
Because we specified the browse = TRUE
parameter, a webpage
that describes the REST interface to the model is also displayed. The
REST interface is based on the CloudML
predict request API.
The model can be used for prediction by making HTTP POST requests. The body of the request should contain instances of data to generate predictions for. The HTTP response will provide the model’s predictions. The data in the request body should be pre-processed and formatted in the same way as the original training data (e.g. feature scaling and normalization, pixel transformations for images, etc.).
For MNIST, the request body could be a JSON file containing one or more pre-processed images:
new_image.json
{
"instances": [
{
"image_input": [0.12,0,0.79,...,0,0]
}
]
}
The HTTP POST request would be:
curl -X POST -H "Content-Type: application/json" -d @new_image.json http://localhost:8089/serving_default/predict
Similar to R’s predict function, the response includes an array
representing the digits 0-9. The image in new_image.json
is
predicted to be a 7 (since that’s the column which has a 1
,
whereas the other columns have values approximating zero).
{
"predictions": [
{
"prediction": [
1.3306e-24,
4.9968e-26,
1.8917e-23,
1.7047e-21,
0,
8.963e-33,
0,
1,
2.3306e-32,
2.0314e-22
]
}
]
}
Once you are satisifed with local testing, the next step is to deploy the model so others can use it. There are a number of available options for this including TensorFlow Serving, CloudML, and RStudio Connect. For example, to deploy the saved model to CloudML we could use the cloudml package:
The same HTTP POST request we used to test the model locally can be used to generate predictions on CloudML, provided the proper access to the CloudML API.
Now that we’ve deployed a simple end-to-end example, we’ll describe the process of Model Export and Model Deployment in more detail.
TensorFlow SavedModel defines a language-neutral format to save machine-learned models that is recoverable and hermetic. It enables higher-level systems and tools to produce, consume and transform TensorFlow models.
The export_savedmodel()
function creates a SavedModel
from a model trained using the keras, tfestimators, or tensorflow R
packages. There are subtle differences in how this works in practice
depending on the package you are using.
The Keras Example above includes complete example code for creating and using SavedModel instances from Keras so we won’t repeat all of those details here.
To export a TensorFlow SavedModel from a Keras model, simply call the
export_savedmodel()
function on any Keras model:
Keras learning phase set to 0 for export (restart R session before doing additional training)
Note the message that is printed: exporting a Keras model requires
setting the Keras “learning phase” to 0. In practice, this means that
after calling export_savedmodel
you can not
continue to train models in the same R session.
It is important to assign reasonable names to the the first and last layers. For example, in the model code above we named the first layer “image” and the last layer “prediction”.
model %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(784),
name = "image") %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 10, activation = 'softmax',
name = "prediction")
The layer names are reflected in the structure of REST requests and responses to and from the deployed model.
Exporting a TensorFlow SavedModel from a TF Estimators model works
exactly the same way as exporting a keras model, simply call
export_savedmodel()
on the estimator. Here is a complete
example:
library(tfestimators)
mtcars_input_fn <- function(data, num_epochs = 1) {
input_fn(data,
features = c("disp", "cyl"),
response = "mpg",
batch_size = 32,
num_epochs = num_epochs)
}
cols <- feature_columns(column_numeric("disp"), column_numeric("cyl"))
model <- linear_regressor(feature_columns = cols)
indices <- sample(1:nrow(mtcars), size = 0.80 * nrow(mtcars))
train <- mtcars[indices, ]
test <- mtcars[-indices, ]
model %>% train(mtcars_input_fn(train, num_epochs = 10))
export_savedmodel(model, "savedmodel")
Generating predictions is done in the same way as with exported Keras
models. First, use serve_savedmodel()
to host the model
locally. Once running, an HTTP POST request can be made:
curl -X POST "http://127.0.0.1:8089/predict/predict/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d "{ \"instances\": [ { \"disp\": [ 160 ], \"cyl\": [ 4 ] } ]}"
Each instance of new data should be formatted as a json array, and each element in the array should be a named array corresponding to the feature columns. This structure is similar to a named list in R.
The response is the predicted MPG:
{
"predictions": [
{
"predictions": [
8.4974
]
}
]
}
The tensorflow package
provides a lower-level interface to the TensorFlow API. You can also use
the export_savedmodel()
function to export models created
with this API, however you need to provide some additional parmaeters
indicating which tensors represent the inputs and outputs for your
model.
For example, here’s an MNIST model using the core TensorFlow API
along with the requisite call to export_savedmodel()
:
library(tensorflow)
sess <- tf$Session()
datasets <- tf$contrib$learn$datasets
mnist <- datasets$mnist$read_data_sets("MNIST-data", one_hot = TRUE)
# Note that we define x as the input tensor
# and y as the output tensor that will contain
# the scores. These are referenced in export_savedmodel
x <- tf$placeholder(tf$float32, shape(NULL, 784L))
W <- tf$Variable(tf$zeros(shape(784L, 10L)))
b <- tf$Variable(tf$zeros(shape(10L)))
y <- tf$nn$softmax(tf$matmul(x, W) + b)
y_ <- tf$placeholder(tf$float32, shape(NULL, 10L))
cross_entropy <- tf$reduce_mean(
-tf$reduce_sum(y_ * tf$log(y), reduction_indices=1L)
)
optimizer <- tf$train$GradientDescentOptimizer(0.5)
train_step <- optimizer$minimize(cross_entropy)
init <- tf$global_variables_initializer()
sess$run(init)
for (i in 1:1000) {
batches <- mnist$train$next_batch(100L)
batch_xs <- batches[[1]]
batch_ys <- batches[[2]]
sess$run(train_step,
feed_dict = dict(x = batch_xs, y_ = batch_ys))
}
export_savedmodel(
sess,
"savedmodel",
inputs = list(image_input = x),
outputs = list(scores = y))
Once the model is exported, the same process of using
serve_savedmodel
can be used, and the same HTTP requests
demonstrated in the Keras example can be used against the tensorflow
model.
There are a variety of ways to deploy a TensorFlow SavedModel, each
of which are described below. Of the 3 methods described, 2 of them
(CloudML and RStudio Connect) share the same REST interface that we have
been using with serve_savedmodel
to test locally. The REST
interface is described in detail here: https://cloud.google.com/ml-engine/docs/v1/predict-request.
You can deploy TensorFlow SavedModels to Google’s CloudML service using functions from the cloudml package. For example:
Once deployed to CloudML, predictions can be made using the same REST interace we previously used locally. The HTTP POST requests will be similar to the sample requests, but CloudML additiobnally requires proper authorization.
See the Deploying Models article on the CloudML package website for additional details.
RStudio Connect is a publishing platform for applications, reports, and APIs created with R. Connect runs on-premise or in your own cloud infrastructure, giving you full control of the deployment environment. Connect can also integrate with your own security services and user management tools.
An upcoming version of RStudio Connect will include support for hosting TensorFlow SavedModels, using the same REST interface as is supported by the local server and CloudML.
Exported models will be published to Connect using the
rsconnect
package, for example:
library(rsconnect)
deployTFModel('savedmodel', account = <username>, server = <internal_connect_server>)
If you would like to preview the feature, or get more information, contact [email protected].
TensorFlow Serving is an open-source library and server implementation that allows you to serve TensorFlow SavedModels using a gRPC interface as opposed to the REST interface offered by the previous deployment tools.
Once you have exported a TensorFlow model using
export_savedmodel()
it’s straightforward to deploy it using
TensorFlow Serving. See the documentation at https://www.tensorflow.org/serving for additional
details.