IMPROVING THE ACCURACY OF THE COST ESTIMATION OF PUBLIC TRANSPORTATION PROJECT: A DATA DRIVEN SELECTION OF THE ESTIMATING METHODOLOGY
Abstract
The preliminary engineer’s estimate for public highway projects has long been a deciding
factor in whether State Transportation Agencies (STAs) can proceed with projects that are
essential to the public’s well-being. Most transportation and infrastructure projects are funded
from a limited pool provided by federal, state, and local government programs, and the
preliminary engineer’s estimate acts as the benchmark for spending from that pool. Therefore,
for funds to be properly allocated to projects governed by STAs, it is of paramount importance
that the preliminary engineer’s estimate account for market conditions and reflect contractor
bids. Cost estimating in transportation and infrastructure
projects is a dynamic process that evolves across the major phases of highway construction
projects: planning, project development, final design, right-of-way acquisition, construction,
and operation and maintenance. This research primarily explored the estimate prepared during
the final design phase of highway construction projects, which this thesis refers to as the
“engineer’s estimate”. The Federal Highway
Administration (FHWA) measures the effectiveness and accuracy of the engineer’s estimate in
terms of the percentage deviation of the low bid from the engineer’s estimate, and it recommends
that at least 50 percent of low bids fall within ±10 percent of that estimate. Despite
commendable efforts by STAs, high deviations of estimates from low bids
remain a persistent problem for public agencies. The Wisconsin Department of
Transportation (WisDOT) requested the support of the Construction and Materials Support
Center (CMSC) at the University of Wisconsin-Madison in running an estimating peer
exchange with fellow STAs to determine the underlying causes of the high deviation of the final
design engineer’s estimate from low bids. One important factor identified as influencing the
effectiveness and accuracy of the estimate is the method of cost estimation: historical bid-based
estimating, cost-based estimating, or combination estimating. The highway and infrastructure
industry has no precise analysis or conclusion on the impact of the different methods of cost
estimation on the estimates developed during the initial stages of a project, and it lacks a
universally accepted methodology for choosing the method of cost estimation. Thus, STAs need
to evaluate the effect of the different methods of cost estimation on estimate accuracy, as
defined by the FHWA, to identify the most suitable approach for each project type.
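The FHWA accuracy measure described above can be sketched in a few lines. This is a minimal illustration, assuming the deviation is computed relative to the engineer’s estimate as (low bid - estimate) / estimate; the function names and the sample lettings are hypothetical, not WisDOT or FHWA data.

```python
def pct_deviation(low_bid: float, estimate: float) -> float:
    """Percentage deviation of the low bid from the engineer's estimate.

    Assumes the deviation is measured relative to the estimate.
    """
    return (low_bid - estimate) / estimate * 100.0


def share_within_band(pairs, band: float = 10.0) -> float:
    """Fraction of (low_bid, estimate) pairs whose deviation lies within +/- band percent."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (low_bid, estimate) pair")
    hits = sum(1 for bid, est in pairs if abs(pct_deviation(bid, est)) <= band)
    return hits / len(pairs)


def meets_fhwa_target(pairs, band: float = 10.0, target: float = 0.5) -> bool:
    """True if at least `target` of the low bids fall within +/- band percent of the estimate."""
    return share_within_band(pairs, band) >= target


# Hypothetical lettings: (low bid, engineer's estimate) in dollars
lettings = [
    (950_000, 1_000_000),    # about -5 %
    (1_150_000, 1_000_000),  # about +15 %
    (480_000, 500_000),      # about -4 %
    (2_300_000, 2_000_000),  # about +15 %
]
print(share_within_band(lettings))  # 0.5
print(meets_fhwa_target(lettings))  # True
```

Two of the four hypothetical low bids fall inside the ±10 percent band, so this set sits exactly at the 50 percent threshold.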
This research uses expert opinion from the estimating peer exchange and data-driven
algorithms to help STAs predict the better-suited method of cost estimation for estimates
created during the early stages of a project, so that funds from public agencies can be better
allocated. Data were collected from the eleven STAs participating in the estimating peer
exchange and from five other STAs through a survey. The data relate to best practices for
scoping, cost estimation, and risk assessment during the early stages of a project, as well as to
the accuracy of each state’s engineer’s estimates from 2018 to 2020.
Additionally, qualitative and quantitative analyses were performed on the STAs’ average yearly
data to evaluate how estimate precision and accuracy vary with the method of cost estimation.
Alongside the number of bidders, geographic location, shortage of estimators, and economic
volatility, the method of cost estimation was identified as a major factor affecting the accuracy
of the engineer’s estimate, as defined by the FHWA. Historical bid-based estimating was found
to be the most common method, followed by combination estimating and then cost-based
estimating. On average, 47%, 48%, and 53% of low bids, respectively, fell within ±10% of the
engineer’s estimate under these methods. While cost-based estimating yields the highest
accuracy, it requires extensive training of estimating personnel.
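The per-method comparison above amounts to grouping yearly accuracy figures by estimating method, counting how often each method appears, and averaging the share of low bids within ±10 percent. A minimal standard-library sketch follows; the records are illustrative placeholders, not the survey data, and the STA labels and method strings are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Illustrative placeholder records, NOT the survey data:
# (STA label, year, estimating method, share of low bids within +/-10 % of the estimate)
records = [
    ("STA-A", 2018, "historical", 0.40),
    ("STA-A", 2019, "historical", 0.50),
    ("STA-B", 2018, "historical", 0.45),
    ("STA-C", 2019, "combination", 0.48),
    ("STA-C", 2020, "combination", 0.48),
    ("STA-D", 2020, "cost-based", 0.53),
]


def summarize_by_method(rows):
    """Return {method: (number of records, mean share within the band)}."""
    groups = defaultdict(list)
    for _sta, _year, method, share in rows:
        groups[method].append(share)
    return {method: (len(shares), mean(shares)) for method, shares in groups.items()}


summary = summarize_by_method(records)
# List methods from most to least frequently used
for method, (count, avg) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{method:12s} n={count}  mean share={avg:.2f}")
```

Averaging yearly shares in this way weights every STA-year equally regardless of how many projects it let, which is one reason a yearly-average dataset can be too coarse to settle which method is better.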
The yearly average dataset was insufficient to conclude which method of cost estimation
is better suited to the highway and infrastructure sector. Consequently, machine learning
prediction algorithms were employed to predict the optimal method of cost estimation based on
project-related and economic variables. Raw data were collected from six STAs: the
Montana Department of Transportation (MDT), Nebraska Department of Transportation
(NDOT), North Dakota Department of Transportation (NDDOT), Tennessee Department of
Transportation (TDOT), Washington State Department of Transportation (WSDOT), and
Wisconsin Department of Transportation (WisDOT). The data obtained included only
projects estimated using historical bid-based estimating or combination estimating, in which
5-10% of line items were estimated using cost-based estimating. The dataset spanned 11
unified project types and was used to train the following machine learning algorithms to
predict the most suitable method of cost estimation: multiple linear regression (ML),
logistic regression (LOGIT), classification and regression trees (CART), and random forests
(RF). The gathered data were separated into two groups: one for training the models and the
other for testing them. All models were developed on the same dataset, and their performance
was evaluated using the area under the receiver operating characteristic curve (AUC).
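The train/test split and AUC evaluation described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions: `make_classification` supplies synthetic stand-in features rather than the project and economic variables, the label encoding (1 = combination estimating, 0 = historical bid-based estimating) is hypothetical, and the 75/25 split ratio is an assumption because the abstract does not give one.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; label 1 = combination, 0 = historical bid-based (hypothetical encoding)
X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=42)

# Hold out 25 % of the observations for testing; stratify to keep class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# AUC is computed from predicted probabilities of the positive class, not hard labels
scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print(f"test AUC = {auc:.3f}")
```

Any of the four model types could be swapped in for `LogisticRegression`; for a linear regression baseline, its continuous `predict` output would serve as the score passed to `roc_auc_score`.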
ML was used as a standard statistical baseline to evaluate the need for more complex
machine learning models. It could not capture nonlinear relationships, which proved to be the
main reason for its low performance. Economic variables were found to be the most influential
on the optimal method of cost estimation, primarily the prime loan rate, with a coefficient of
-11.8611. The project types followed at a distance, with coefficients ranging from 0.1787 to
0.6571.
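The limitation noted above can be illustrated with a small synthetic experiment. The sketch below, which assumes scikit-learn and NumPy and uses made-up features named `rate` and `oil`, fits a multiple linear regression to a 0/1 method label whose probability depends on `rate` in a U-shaped way. A purely linear fit ranks the observations poorly, while adding a hand-built squared term recovers much of the signal. It illustrates the general point only and does not reproduce the thesis’s model or coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
rate = rng.uniform(3.0, 8.0, n)    # hypothetical rate-like feature (percent)
oil = rng.uniform(40.0, 100.0, n)  # hypothetical price-like feature, unrelated to the label here

# Synthetic label with a U-shaped dependence on `rate`: P(y = 1) is lowest near 5.5
logit = 2.0 * (rate - 5.5) ** 2 - 2.0
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Multiple linear regression on the 0/1 label (a linear probability model)
X_lin = np.column_stack([rate, oil])
lin = LinearRegression().fit(X_lin, y)
auc_lin = roc_auc_score(y, lin.predict(X_lin))

# Same regression with a squared term added by hand to expose the curvature
X_quad = np.column_stack([rate, rate**2, oil])
quad = LinearRegression().fit(X_quad, y)
auc_quad = roc_auc_score(y, quad.predict(X_quad))

print("linear coefficients:", lin.coef_)
print(f"AUC linear = {auc_lin:.2f}, AUC with squared term = {auc_quad:.2f}")
```

Hand-crafting such terms requires knowing the shape of the nonlinearity in advance, which is what the tree-based models avoid.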
LOGIT was found to be substantially better than ML in several respects, including being less
constrained by the assumption of a linear relationship between the features and the target, and
it achieved significantly higher performance. Three LOGIT models were developed: a base
model, an l1-regularized model, and an l2-regularized model. All three models obtained a
classification accuracy of 89%, but the l2-regularized model dampened the effect of correlated
features and shrank the coefficients, so it was deemed more fitting for predicting the method of
cost estimation with a low risk of overfitting. The prime loan rate was again found to be of
highest importance, with a coefficient of -84.9338, followed by the project types, with
coefficients ranging from 1.102 to 4.907.
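The three LOGIT variants can be sketched with scikit-learn’s `LogisticRegression`. This is a minimal sketch under assumptions: the data are synthetic with deliberately correlated (redundant) features, the regularization strength `C=0.5` is arbitrary rather than a value from the thesis, and the base model is approximated by a very large `C` so that the code does not depend on version-specific ways of disabling the penalty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with redundant features, loosely mimicking correlated predictors
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, n_redundant=3, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

models = {
    # A very large C makes the penalty negligible, approximating an unregularized base model
    "base": LogisticRegression(C=1e6, max_iter=5000),
    # l1 (lasso-type) penalty; liblinear supports it and can drive coefficients to exactly zero
    "l1": LogisticRegression(penalty="l1", C=0.5, solver="liblinear"),
    # l2 (ridge-type) penalty; shrinks all coefficients toward zero
    "l2": LogisticRegression(penalty="l2", C=0.5, max_iter=5000),
}

results = {}
for name, clf in models.items():
    # Standardize first so the penalty treats all features on the same scale
    pipe = make_pipeline(StandardScaler(), clf).fit(X_train, y_train)
    coef = pipe[-1].coef_.ravel()
    results[name] = {
        "accuracy": accuracy_score(y_test, pipe.predict(X_test)),
        "coef_norm": float(np.linalg.norm(coef)),
        "zero_coefs": int(np.sum(coef == 0.0)),
    }

for name, r in results.items():
    print(name, r)
```

Similar accuracies with a smaller coefficient norm under l2 mirror the pattern reported above: the penalty trades a little fit for more stable coefficients when predictors are correlated.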
A CART model was then developed because of its flexible, nonparametric modeling
properties: it makes no strict assumptions about the form of the relationship between the
features and the target. It captured nonlinear relationships between the features and the
target variable better than both the ML and LOGIT models. Hyperparameter tuning yielded a
maximum model accuracy of 0.99 with a maximum depth of 9, a minimum of 4 samples per leaf,
and a minimum of 8 samples per split. CART models are notoriously susceptible to overfitting,
and even with hyperparameter tuning, the CART model was deemed not optimal for predicting
the method of estimation. The maintenance or minor upgrades project type ranked highest,
with a coefficient of 2.0997, followed by crude oil prices, with a coefficient of 0.7586.
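The hyperparameter tuning described for CART can be sketched as a cross-validated grid search with scikit-learn, assuming the thesis’s hyperparameters correspond to `DecisionTreeClassifier`’s `max_depth`, `min_samples_leaf`, and `min_samples_split`. The data are synthetic and the grid values are assumptions; comparing the tuned tree’s training and test accuracy is one simple way to check for the overfitting mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7
)

# Grid over the three hyperparameters named above; it includes the reported
# combination (max_depth=9, min_samples_leaf=4, min_samples_split=8)
param_grid = {
    "max_depth": [3, 5, 7, 9],
    "min_samples_leaf": [1, 2, 4, 8],
    "min_samples_split": [2, 4, 8],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=7), param_grid, scoring="roc_auc", cv=5
)
search.fit(X_train, y_train)

best = search.best_estimator_  # refit on the full training set with the best parameters
train_acc = best.score(X_train, y_train)
test_acc = best.score(X_test, y_test)

print("best parameters:", search.best_params_)
# A large train/test gap is a common symptom of an overfit tree
print(f"train accuracy = {train_acc:.2f}, test accuracy = {test_acc:.2f}")
```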
Like CART, RF does not rely on a linear relationship between the features and the target
variable. RF combines multiple CART trees, with an additional hyperparameter for the number
of trees, to counter the overfitting of a single CART tree. Hyperparameter tuning resulted in a
maximum depth of 2, a minimum of 8 samples per leaf, a minimum of 9 samples per split, and an
optimal number of trees of 219. The model obtained a classification accuracy of 90%, the
highest accuracy across all the machine learning algorithms.
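The random forest configuration reported above can be expressed in scikit-learn terms as follows. This is a sketch, assuming the thesis’s hyperparameters map onto `RandomForestClassifier`’s `n_estimators`, `max_depth`, `min_samples_leaf`, and `min_samples_split`; the data and feature names are synthetic placeholders, so the printed importances say nothing about the thesis’s features. The out-of-bag score provides an internal check on overfitting alongside the held-out test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=11)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=11
)

# Hyperparameter values reported above, mapped onto scikit-learn parameter names (an assumption)
rf = RandomForestClassifier(
    n_estimators=219,
    max_depth=2,
    min_samples_leaf=8,
    min_samples_split=9,
    oob_score=True,  # accuracy on the bootstrap samples each tree did not see
    random_state=11,
)
rf.fit(X_train, y_train)

test_acc = rf.score(X_test, y_test)
print(f"OOB accuracy = {rf.oob_score_:.2f}, test accuracy = {test_acc:.2f}")

# Impurity-based importances, averaged over the trees and normalized to sum to 1
importances = rf.feature_importances_
for idx in np.argsort(importances)[::-1]:
    print(f"{feature_names[idx]}: {importances[idx]:.3f}")
```

Shallow trees (depth 2) averaged over many bootstrap samples keep each tree simple, which is how the ensemble trades single-tree flexibility for lower variance.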
The RF model was deemed the most suitable for predicting the optimal method of cost
estimation. STAs can use the data-driven model to allocate teams of estimating professionals
with varying degrees of estimating experience. Estimators with a deeper understanding of
project cost can be assigned to projects that require cost-based estimating, which lightens the
burden of training estimating personnel in cost-based estimating. Hence, the funds available
to STAs can be allocated more efficiently, to the benefit of the public and the economy.
Moreover, in all the models, economic factors, primarily the prime loan rate and crude oil
prices, consistently exceeded project-related factors in influence. Among the project-related
features, project type was the leading influence: safety and traffic control, maintenance and
minor upgrades, environmental mitigation, roadway redesign, road or culvert replacement,
earthwork, and resurfacing projects favored combination estimating, while bridge construction
and bridge replacement projects consistently favored historical bid-based estimating.
Subject
engineer's estimate
DOT
machine learning
data
transportation
highway
Permanent Link
http://digital.library.wisc.edu/1793/83525
Type
Thesis

