Saving Machine Learning Models

After you have found, trained, and fine-tuned your model, it's time to save it to a file so you can load it later and make predictions.

In this post we will use pickle to serialize and save our models.

Pickle

Pickle is a module from Python's standard library that lets you serialize your ML models and save them to a file. Later you can load and deserialize them to make predictions.
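To get a feel for what serializing and deserializing means, here is a minimal sketch of a round trip in memory. The dictionary `toy_model` is a hypothetical stand-in for a trained model, not something from the housing example; pickle handles it the same way it handles any other picklable Python object.

```python
import pickle

# Hypothetical stand-in for a trained model: any picklable object works.
toy_model = {"weights": [0.5, -1.2], "bias": 3.0}

# Serialize the object into a bytes string...
payload = pickle.dumps(toy_model)

# ...and deserialize the bytes back into a new, equal object.
restored = pickle.loads(payload)

print(type(payload).__name__)
print(restored == toy_model)
```

`dumps` and `loads` work with bytes in memory, while `dump` and `load` do the same thing with an open file.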

Let's quickly build a linear regression model for housing prices so that we can try saving and loading it.

import pandas as pd
from sklearn.linear_model import LinearRegression

# Load dataset
data = pd.read_csv('../data/housing.csv')

# Split data into features and labels
features = data.drop(['median_house_value'], axis=1)
# Get rid of incomplete and non-numerical features so we don't have to deal
# with data preparation in this post
features = features.drop(['ocean_proximity', 'total_bedrooms'], axis=1)

# Only the column we want to predict
labels = data['median_house_value']

# Create an object - a specific model we can actually train
model = LinearRegression()

# Train the model
model.fit(features, labels)

Now that we have our model, saving it is very simple:

from pickle import dump

# Open the file in binary write mode; the with block closes it for us
with open('housing_model', 'wb') as file:
    dump(model, file)

When you need the model later, loading it is just as easy:

from pickle import load

# Open the file in binary read mode and deserialize the model
with open('housing_model', 'rb') as file:
    saved_model = load(file)

predictions = saved_model.predict(features)
print('A house with these parameters:\n', features.loc[0])
print('Will cost this much:\n', predictions[0])
print('(actual cost):\n', labels[0])

This prints:

A house with these parameters:
 longitude            -122.2300
latitude               37.8800
housing_median_age     41.0000
total_rooms           880.0000
population            322.0000
households            126.0000
median_income           8.3252
Name: 0, dtype: float64
Will cost this much:
 403422.26733257994
(actual cost):
 452600.0

That's it! Don't forget to document the versions of Python, pickle, and scikit-learn you used, so that you later load the model in a compatible environment. You may also want to output and save the parameters your model has learned, in case you want to make predictions using your own implementation.
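Both tips can be sketched with the standard library alone. The coefficient and intercept values below are hypothetical placeholders so that the snippet runs on its own; with the fitted model from this post you would take them from `model.coef_` and `model.intercept_`. The file name and the `predict` helper are also my own assumptions.

```python
import json
import os
import pickle
import platform
import tempfile

# Hypothetical stand-in values so the sketch runs on its own.
# With the fitted scikit-learn model from this post you would use
# model.coef_.tolist() and float(model.intercept_) instead.
coefficients = [2.0, -1.0]
intercept = 10.0

# Keep the environment details next to the learned parameters.
model_info = {
    "python_version": platform.python_version(),
    "pickle_protocol": pickle.DEFAULT_PROTOCOL,
    "coefficients": coefficients,
    "intercept": intercept,
}

# Written to the temp directory here; pick any path you like.
info_path = os.path.join(tempfile.gettempdir(), "housing_model_info.json")
with open(info_path, "w") as f:
    json.dump(model_info, f, indent=2)


def predict(row, params):
    """Linear regression by hand: intercept + sum(coefficient * value)."""
    weighted = sum(c * x for c, x in zip(params["coefficients"], row))
    return params["intercept"] + weighted


with open(info_path) as f:
    params = json.load(f)

prediction = predict([3.0, 4.0], params)
print(prediction)  # 10.0 + 2.0 * 3.0 - 1.0 * 4.0 = 12.0
```

JSON is plain text, so the saved parameters stay readable even in an environment where the pickled file can no longer be loaded.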
