Saving Machine Learning Models
After you have found, trained, and fine-tuned your model, it's time to save it to a file so you can load it later and make predictions.
In this post we will use pickle to serialize and save our models.
Pickle
Pickle allows you to serialize your ML models and save them to a file. Later you can load and deserialize them and use them to make predictions.
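Before involving a model, the serialize/deserialize round trip can be seen on a plain Python object. This is a minimal sketch; the `settings` dictionary and its values are made up for illustration.

```python
import pickle

# A stand-in for any picklable Python object (hypothetical values)
settings = {"name": "demo", "weights": [0.5, 1.5, -2.0]}

# Serialize the object to a bytes object in memory
blob = pickle.dumps(settings)

# Deserialize the bytes back into a new, equal object
restored = pickle.loads(blob)

print(type(blob))            # <class 'bytes'>
print(restored == settings)  # True
print(restored is settings)  # False
```

The restored object is an equal copy, not the same object. Keep in mind that unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.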
Let's quickly build a linear regression model for housing prices so that we can try saving and loading it.
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# Load dataset
data = pd.read_csv('../data/housing.csv')
# Split data into features and labels
features = data.drop(['median_house_value'], axis=1)
# Get rid of incomplete and non-numerical features so we don't have to deal
# with data preparation in this post
features = features.drop(['ocean_proximity', 'total_bedrooms'], axis=1)
# Only the column we want to predict
labels = data['median_house_value']
# Create an object - a specific model we can actually train
model = LinearRegression()
# Train the model
model.fit(features, labels)
Now that we have our model, saving it is very simple:
from pickle import dump
# The with block closes the file even if dump() raises an error
with open('housing_model', 'wb') as f:
    dump(model, f)
When you need the model later, loading it is just as easy:
from pickle import load
# The with block closes the file once the model has been loaded
with open('housing_model', 'rb') as f:
    saved_model = load(f)
predictions = saved_model.predict(features)
print('A house with these parameters:\n', features.loc[0])
print('Will cost this much:\n', predictions[0])
print('(actual cost):\n', labels[0])
A house with these parameters:
longitude -122.2300
latitude 37.8800
housing_median_age 41.0000
total_rooms 880.0000
population 322.0000
households 126.0000
median_income 8.3252
Name: 0, dtype: float64
Will cost this much:
403422.26733257994
(actual cost):
452600.0
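A quick way to confirm that saving and loading preserved the model is to compare the predictions of the original and the loaded copy. The sketch below is only an illustration: it uses synthetic data and a temporary file instead of the housing dataset, and the names `X`, `y`, `path`, and `loaded` are assumptions, not part of the example above.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption: not the housing dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0

model = LinearRegression().fit(X, y)

# Save the model to a temporary file
path = os.path.join(tempfile.mkdtemp(), "demo_model")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back as a separate object
with open(path, "rb") as f:
    loaded = pickle.load(f)

# The loaded model should reproduce the original predictions
same = np.allclose(model.predict(X), loaded.predict(X))
print(same)
```

If `same` is ever false, the file was most likely written by a different model than the one you are comparing against.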
That's it! Don't forget to record the versions of Python and scikit-learn, along with the pickle protocol, that you used to save the model, so that you can load it later in a compatible environment. You may also want to save the parameters your model has learned, in case you later want to make predictions using your own implementation.
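One possible way to keep both pieces of information is a small JSON record stored next to the pickle file. This is a sketch under assumptions: the field names in `record` are hypothetical, the data is synthetic rather than the housing dataset, and the manual prediction only applies to a plain linear regression.

```python
import json
import pickle
import platform

import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption: not the housing dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0
model = LinearRegression().fit(X, y)

# Hypothetical record: environment versions plus learned parameters
record = {
    "python_version": platform.python_version(),
    "sklearn_version": sklearn.__version__,
    "pickle_protocol": pickle.DEFAULT_PROTOCOL,
    "coef": model.coef_.tolist(),
    "intercept": float(model.intercept_),
}
text = json.dumps(record, indent=2)

# Re-implement prediction from the saved parameters alone:
# prediction = features . coef + intercept
params = json.loads(text)
manual = X @ np.array(params["coef"]) + params["intercept"]
matches = np.allclose(manual, model.predict(X))
print(matches)
```

Writing `text` to a file such as `housing_model.json` (a hypothetical name) gives you a human-readable fallback if the pickle file can no longer be loaded.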