Download the notebook here
!
Interactive online version:
Optimal Replacement of GMC Bus Engines: An empirical model of Harold Zurcher¶
This notebook replicates the descriptives in Tabla IIa and IIb from > Rust, J. (1987). Optimal Replacement of GMC Bus Engines: An empirical model of Harold Zurcher. Econometrica, Vol. 55, No.5, 999-1033.
The data is taken from the NFXP software provided by Rust which is available to download here.
Preparations¶
Before executing this file the raw data needs to be processed.
[7]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter
sys.path.insert(0, "../data/")
from data_processing import data_reading # noqa: E402
from data_processing import get_data_storage # noqa: E402
[4]:
data_reading()
Odometer at Engine Replacement¶
Table IIa in Rust’s paper describes the milage on which a engine replacement occured. As there are buses, which had a second replacement during the time of the observation, the record of the second replacement will be reduced by the milage of the first, to get the real life time milage of an engine.
[5]:
data_path = get_data_storage()
dict_df = dict()
for filename in os.listdir(data_path + "/pkl/group_data/"):
if filename.endswith(".pkl"):
dict_df[filename[0:7]] = pd.read_pickle(
data_path + "/pkl/group_data/" + filename
)
[6]:
df = pd.DataFrame()
for j, i in enumerate(sorted(dict_df.keys())):
df2 = dict_df[i][["Odo_1"]][dict_df[i]["Odo_1"] > 0]
df2 = df2.rename(columns={"Odo_1": i})
df3 = dict_df[i][["Odo_2"]].sub(dict_df[i]["Odo_1"], axis=0)[
dict_df[i]["Odo_2"] > 0
]
df3 = df3.rename(columns={"Odo_2": i})
df3 = df3.set_index(df3.index.astype(str) + "_2")
df4 = pd.concat([df2, df3])
if j == 0:
df = df4.describe()
else:
df = pd.concat([df, df4.describe()], axis=1)
df = df.transpose()
df.fillna(0).astype(int)
[6]:
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
group_1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
group_2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
group_3 | 27 | 199733 | 37459 | 124800 | 174550 | 204800 | 230650 | 273400 |
group_4 | 33 | 257336 | 65477 | 121300 | 215000 | 264100 | 292400 | 387300 |
group_5 | 11 | 245290 | 60257 | 118000 | 229600 | 250600 | 282250 | 322500 |
group_6 | 7 | 150785 | 61006 | 82400 | 107450 | 125500 | 197750 | 237200 |
group_7 | 27 | 208962 | 48980 | 121000 | 178650 | 207700 | 237200 | 331800 |
group_8 | 19 | 186700 | 43956 | 132000 | 162900 | 182100 | 188400 | 297500 |
[7]:
i = "group_4"
df_group_4_rep_1 = dict_df[i][["Odo_1"]][dict_df[i]["Odo_1"] > 0]
df_group_4_rep_1 = df_group_4_rep_1.rename(columns={"Odo_1": i})
df_group_4_rep_2 = dict_df[i][["Odo_2"]].sub(dict_df[i]["Odo_1"], axis=0)[
dict_df[i]["Odo_2"] > 0
]
df_group_4_rep_2 = df_group_4_rep_2.rename(columns={"Odo_2": i})
df_group_4_rep_2 = df_group_4_rep_2.set_index(df3.index.astype(str) + "_2")
df_group_4 = pd.concat([df_group_4_rep_1, df_group_4_rep_2])
[8]:
bins = 10
hist_data = np.histogram(df_group_4["group_4"].to_numpy(), bins=bins)
hist_data
[8]:
(array([2, 3, 1, 4, 3, 8, 6, 3, 1, 2]),
array([121300., 147900., 174500., 201100., 227700., 254300., 280900.,
307500., 334100., 360700., 387300.]))
[9]:
hist_normed = hist_data[0] / sum(hist_data[0])
x = np.linspace(np.min(hist_data[1]), np.max(hist_data[1]), bins)
hist_filter = savgol_filter(hist_normed, 9, 3)
fig, ax = plt.subplots(1, 1)
ax.plot(x, hist_filter)
[9]:
[<matplotlib.lines.Line2D at 0x7f28ad2eb310>]
Never failing buses¶
Table IIb explores buses, which never had an engine replacement. Therefore this data is left-censored, as the econometrican never observes the time of replacement. The table shows the variation in the odometer record at the end of the observation period.
[10]:
df = pd.DataFrame()
for i in sorted(dict_df.keys()):
df2 = dict_df[i][[dict_df[i].columns.values[-1]]][dict_df[i]["Odo_1"] == 0]
df2 = df2.rename(columns={df2.columns.values[0]: i})
if j == 0:
df = df2.describe()
else:
df = pd.concat([df, df2.describe()], axis=1)
df = df.transpose()
df = df.drop(df.columns[[4, 5, 6]], axis=1)
df[["max", "min", "mean", "std", "count"]].fillna(0).astype(int)
[10]:
max | min | mean | std | count | |
---|---|---|---|---|---|
group_1 | 120151 | 65643 | 100116 | 12929 | 15 |
group_2 | 161748 | 142009 | 151182 | 8529 | 4 |
group_3 | 280802 | 199626 | 250766 | 21324 | 21 |
group_4 | 352450 | 310910 | 337221 | 17802 | 5 |
group_5 | 326843 | 326843 | 326843 | 0 | 1 |
group_6 | 299040 | 232395 | 265263 | 33331 | 3 |
group_7 | 0 | 0 | 0 | 0 | 0 |
group_8 | 0 | 0 | 0 | 0 | 0 |