1. Introduction
Real-time traffic state estimation has received increasing attention following the introduction of advanced technologies such as connected vehicle (CV) technologies. CVs aim to improve road safety by potentially reducing human errors, to mitigate traffic congestion by offering alternative routes, and to reduce on-road emissions and fuel consumption [1]. Conducting research with limited probe vehicle data (e.g., from CVs) remains a challenge, especially when no additional data sources are available. Hence, past research has used probe data in conjunction with existing detection systems to enhance traffic models, despite the limitation that fixed detection techniques (e.g., loop detectors) always introduce some noise into their data [2,3,4].
A probe vehicle is defined as a vehicle that provides real-time information, such as its instantaneous position and speed. Several benefits of using probe vehicle data have been recognized: the data are of high quality compared with existing data sources (e.g., cameras and loop detectors), and they can be collected at any location in the network, offering a clear picture of traffic behavior at any time. Therefore, transportation agencies are making efforts to facilitate the use of probe vehicle data.
Few studies have used only probe vehicle data (e.g., from Global Positioning Systems [GPSs]) to estimate traffic states that include traditional (non-probe) vehicles, such as travel time, density, speed, and volume [5]. Real-time estimation of traffic density is important for better traffic operations management in urban areas. This paper aims to estimate the total number of vehicles on signalized link approaches using only probe vehicle data. The estimates can be provided to traffic signal controllers to optimally allocate green time to each signal phase [6,7], leading to better intersection performance measures, such as reduced intersection delays and fewer vehicle crashes [8,9]. One concern with using probe vehicles is measuring their level of market penetration (LMP), defined as the ratio of the total number of probe vehicles to the total number of vehicles. Accurate LMP estimates improve the accuracy of the vehicle count estimates [5]. Therefore, in this paper, a machine-learning technique is developed to provide reliable LMP estimates.
2. Related Work
Different statistical tools have been used to estimate the total number of vehicles on arterial roads and freeways, such as the Kalman filter (KF) [10], Bayesian statistics [11], and particle filter [12] approaches. The literature shows the benefits of using the KF technique to address different aspects of the traffic estimation problem: the KF has been used to estimate travel time [13,14], traffic speed [15,16], and traffic density [5,17]. Different detection techniques have been employed to estimate the number of vehicles, such as loop detectors, camera systems, and probe data. With two loop detectors, one at the entrance and the other at the exit of the link, the total numbers of arrivals and departures are measured, and the number of vehicles is then obtained by applying the flow continuity equation [18]. In [17], a robust KF model using at least three loop detectors on the tested link was employed to estimate the number of vehicles on the link. That study derived the KF state equation from the flow continuity equation and the measurement equation from the relationship between detector time occupancy and space occupancy; however, the cost of implementing such an algorithm in the field is high given the number of sensors needed. Another study employed the KF to estimate the number of vehicles on multi-section freeways, deriving the state equation from the flow continuity equation and the measurement equation from the hydrodynamic relationship between traffic speed and density [19]. Loop detectors were used in addition to speed sensors in the middle of the tested section, so the proposed algorithm is likewise hard to deploy in the field because of its high implementation cost. Video recordings, another detection technique, were used to estimate the traffic density on signalized links [20]. In that study, the authors used the space-mean speed rather than the traffic flow in the state equation because of the high errors associated with sensor failures. Their rationale was that the space-mean speed is an average quantity, whereas the traffic flow is a cumulative quantity. They also demonstrated the importance of knowing the system noise characteristics to improve the performance of the KF model. Consequently, the authors of this paper applied an adaptive Kalman filter (AKF) to enable real-time estimates of the statistical parameters of the system noise rather than using predefined values for the entire simulation (as assumed in the traditional KF model).
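The two-detector approach described above reduces to a running sum of arrivals minus departures. The following Python sketch is illustrative only; the function name `continuity_count` and the per-interval counts are hypothetical. It also shows why such estimates are fragile: a single vehicle missed by the exit detector biases every subsequent count, because the flow continuity equation accumulates detector errors rather than correcting them.

```python
def continuity_count(initial_count, arrivals, departures):
    """Vehicle count on a link via the flow continuity equation:
    N(t) = N(t - dt) + arrivals(t - dt, t) - departures(t - dt, t).
    `arrivals` and `departures` are per-interval counts from hypothetical
    entrance and exit detectors."""
    counts = []
    n = initial_count
    for a, d in zip(arrivals, departures):
        n = n + a - d          # conservation of vehicles over one interval
        counts.append(n)
    return counts


true_arrivals = [3, 2, 4, 1, 0]
true_departures = [1, 2, 3, 2, 1]
# The exit detector misses one vehicle in the second interval.
detected_departures = [1, 1, 3, 2, 1]

print(continuity_count(0, true_arrivals, true_departures))      # [2, 2, 3, 2, 1]
print(continuity_count(0, true_arrivals, detected_departures))  # [2, 3, 4, 3, 2]
```

The one-vehicle error never disappears from the second series, which is the kind of drift a filtering approach such as the KF is meant to control.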
As illustrated in the literature, stationary sensors, such as loop detectors and camera systems, suffer from poor detection accuracy and have high installation and maintenance costs. Advanced detection techniques, such as GPS data, have proven to be more accurate without the need to install additional hardware. Consequently, recent studies have developed several traffic estimation models that use data fusion (a combination of two different data sources) to estimate the number of vehicles, with the aim of achieving better accuracy than a single data source provides. In many of these data fusion studies, the KF technique was employed to estimate traffic density. One study achieved accurate traffic density estimates using traffic flow values measured by a video detection system and travel times obtained from vehicles equipped with GPS devices [2]. The AKF model proposed in this paper differs from that approach in two significant ways: it uses only probe vehicle data, with a variable rather than a fixed updating time interval (the interval was 1 min in [2]); and it uses the AKF to obtain real-time estimates of the statistical parameters of the state and measurement noise.
The literature shows that the KF model can address estimation problems in different traffic applications. However, it is hard to implement in real-world applications because the statistical characteristics of the system noise (mean and variance) are difficult to estimate. Consequently, researchers have developed the AKF to solve this issue and make field implementation possible. Chu et al. proposed an AKF model to estimate freeway travel time using both loop detectors and probe data [21], adopting the noise statistics estimation method proposed in [22]. This method is known for its simplicity in handling errors and its fast processing time. Hence, this study uses the same procedure as Chu et al. to estimate the statistical parameters. The main difference between the proposed approach and Chu et al.'s approach is that our model uses only probe vehicle data.
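To illustrate how an AKF can learn a noise statistic online, the following Python sketch implements a scalar filter that recursively re-estimates the mean of the state noise with a Sage-Husa-style weighting. This is an assumption made for illustration: the estimator of [22] used by Chu et al. may differ, and, for brevity, the sketch keeps the state and measurement noise variances fixed rather than adapting them. The function name `adaptive_kf`, the forgetting factor `b`, and the test data are hypothetical.

```python
def adaptive_kf(z, u, x0, p0, q0, big_q, r, b=0.9):
    """Scalar Kalman filter with a Sage-Husa-style running estimate of the
    state-noise mean q (illustrative sketch; the variances big_q and r are fixed).

    State:       x_k = x_{k-1} + u_k + w_k, where w_k has unknown mean q
    Measurement: z_k = x_k + v_k,           where v_k has zero mean
    The forgetting factor b must lie in (0, 1).
    """
    x, p, q = x0, p0, q0
    estimates = []
    for k, (z_k, u_k) in enumerate(zip(z, u)):
        d = (1 - b) / (1 - b ** (k + 1))        # weight starts at 1 and decays to 1 - b
        x_pred = x + u_k + q                     # predict with the current noise-mean estimate
        p_pred = p + big_q
        gain = p_pred / (p_pred + r)
        x_new = x_pred + gain * (z_k - x_pred)   # correct with the measurement
        p = (1 - gain) * p_pred
        q = (1 - d) * q + d * (x_new - x - u_k)  # re-estimate the state-noise mean
        x = x_new
        estimates.append(x)
    return estimates, q


# Hypothetical data: the true state grows by u + 2 per step, but the filter
# starts by assuming a zero state-noise mean (q0 = 0).
steps = 200
u = [1.0] * steps
truth = [3.0 * (k + 1) for k in range(steps)]
est, q_hat = adaptive_kf(truth, u, x0=0.0, p0=1.0, q0=0.0, big_q=1.0, r=1.0)
print(round(q_hat, 3), round(est[-1], 3))   # approaches 2.0 and 600.0
```

Because the initial noise-mean estimate (`q0`) enters the first predictions directly, the sketch also hints at why the initial conditions examined in Section 5.4 matter.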
In a recent study, a KF model was proposed to estimate the number of vehicles on signalized link approaches using only probe vehicle data [5]. The KF state equation was based on the traffic flow continuity equation; a single probe vehicle LMP value ($\rho$) for the entire link was used to scale up the probe measurements to the total flow in the second term of the flow continuity equation, as presented in Equation (1). It was found that using two LMP values (at the entrance and the exit of the link) produces more accurate vehicle count estimates, especially at low LMPs, as described later in Section 4.3.

$$N(t) = N(t-\Delta t) + \frac{1}{\rho}\left[q^{p}_{\text{in}}(t-\Delta t,\, t) - q^{p}_{\text{out}}(t-\Delta t,\, t)\right] \qquad (1)$$

In Equation (1), $N(t)$ is the number of vehicles traversing the link at time $t$, $\Delta t$ is the variable duration of the updating time interval, $N(t-\Delta t)$ is the number of vehicles traversing the link in the previous interval, $q^{p}_{\text{in}}(t-\Delta t, t)$ and $q^{p}_{\text{out}}(t-\Delta t, t)$ are the probe flows entering and exiting the link between $t-\Delta t$ and $t$, respectively, and $\rho$ is the LMP of probe vehicles.
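The state prediction in the flow continuity equation with a single LMP can be sketched as a one-line helper. The following Python snippet is hypothetical, not the authors' code; it also makes the sensitivity to a low LMP visible, since each probe vehicle counted in or out changes the prediction by $1/\rho$ vehicles (four vehicles at $\rho = 0.25$).

```python
def predict_count(n_prev, probe_in, probe_out, lmp):
    """N(t) = N(t - dt) + (probe_in - probe_out) / lmp, with one LMP for the whole link."""
    if not 0 < lmp <= 1:
        raise ValueError("LMP must be in (0, 1]")
    return n_prev + (probe_in - probe_out) / lmp


# 8 vehicles on the link; 3 probes enter and 1 probe exits; LMP of 25%.
print(predict_count(8, 3, 1, 0.25))  # 16.0
# If the exiting probe goes unrecorded, the prediction shifts by 1/0.25 = 4 vehicles.
print(predict_count(8, 3, 0, 0.25))  # 20.0
```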
Machine learning has proven able to provide accurate estimates of different traffic characteristics [23,24,25,26,27,28]. Traffic speed and density have been estimated using an artificial neural network (ANN) model built from video and Bluetooth data [23]: the traffic flow data were manually extracted from the video records, while the speed data were constructed from the collected Bluetooth travel times. A neural network (NN) model can address such problems if a sufficient quantity of training data is available. Another study applied several machine learning techniques, such as k-means clustering, k-nearest neighbor classification, and locally weighted regression, to estimate traffic speed [24] using archived speed, count, and density data, and found that machine learning models can improve the accuracy of speed estimation. Khan et al. [25] used artificial intelligence to classify the level of service on a freeway segment based on traffic density values. They used loop detector and CV data to develop support vector machine and k-nearest neighbor classifiers, and the support vector machine achieved higher accuracy than the k-nearest neighbor classifier. An NN model was used to estimate hourly traffic volumes between sensors on the Maryland highway network [27], using both probe vehicle data and automatic traffic recording station data. That study also compared linear regression, k-nearest neighbor, support vector machine with a linear kernel, random forest, and NN models, concluding that the NN model performed best; the proposed approach produced 24% more accurate estimates than the current volume profiles.
In this study, an AKF technique is applied to estimate real-time vehicle counts on signalized link approaches using only probe vehicle data. Following the recommendation of Aljamal et al.'s study [5], two LMP values are then used, one at the entrance and one at the exit of the tested link. To this end, an NN model is developed to provide real-time estimates of the LMP values that improve the accuracy of the AKF model, and the AKF and NN models are combined into a new AKFNN approach. The study extends the state of the art in vehicle count estimation through four major contributions:
The study tests the proposed AKF model using only probe vehicle data. The approach was evaluated considering different probe vehicle LMPs ranging from 10% to 90% at increments of 10%.
The study develops an NN model to estimate the LMP of probe vehicles at the exit of the link to reflect the total vehicle departures.
The study tests the developed AKFNN approach by using a fusion of probe and single-loop detector data. A comparison between the traditional KF, AKF, and AKFNN models is presented.
The study examines the impact of the initial conditions on the AKF estimation model. Three initial condition parameters are tested: the initial vehicle count estimate, the initial mean estimate of the state noise errors, and the a priori initial covariance of the state system.
This paper is organized as follows. Section 3 describes the development of the simulation data. Section 4 describes the estimation models and the problem formulation for the KF, AKF, and AKFNN models. Section 5 discusses the results of the proposed models, and Section 6 provides the conclusions of the study and recommended future work.
3. Development of Simulation Data
This paper relies on the INTEGRATION traffic simulation model [29] to validate and test the accuracy of the proposed models. The INTEGRATION software has been extensively validated and shown to replicate empirical observations [30,31,32,33,34,35]. Specifically, INTEGRATION was used to create synthetic data for conditions not observed in the field to quantify the sensitivity of the proposed method to the link length and traffic demand level. The tested link is located in downtown Blacksburg, Virginia, connects two signalized intersections, and is approximately 102 m long, as measured with ArcGIS software. The link characteristics were calibrated to local conditions using typical values: a free-flow speed of 40 km/h, a speed-at-capacity of 32 km/h, a jam density of 160 veh/km/lane, and a base saturation flow rate of 2100 veh/h/lane, which resulted in a roadway capacity of 700 veh/h given the cycle length and green times of the traffic signal. The traffic signal has a 75 s cycle length and four phases with displayed green times of 5, 25, 5, and 28 s; the tested link is assigned the 25 s displayed green time. These values are consistent with the signal timings coded in the field.
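The stated capacity is consistent with the usual relation between saturation flow and the green share of the cycle. As a quick check, assuming a single lane and treating the 25 s displayed green as the effective green (both assumptions, since lost time and lane count are not stated here):

$$c = s\,\frac{g}{C} = 2100~\text{veh/h/lane} \times \frac{25~\text{s}}{75~\text{s}} = 700~\text{veh/h}$$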
The INTEGRATION simulation model was used to generate probe vehicle data because real probe data are not easily accessible. For each LMP, a total of 50 scenarios were generated with different random seeds, as in [25]. Forty-nine scenarios were used to train and validate the proposed NN model, and scenario 50 was used as the testing data set. The INTEGRATION model generates a “time-space” file that records, every second, the instantaneous position, speed, and spacing of each probe vehicle along its trip. In addition, a loop detector is placed at the entrance of the tested link to create a detector output file that reports measures such as speed, traffic volume, and occupancy at the detection location.
5. Results
This section evaluates the performance of the proposed models. Section 5.1 evaluates the AKF model and compares it with the KF model. Section 5.2 presents the performance of the NN model used to estimate the LMP of probe vehicles at the exit of the link ($\rho_{\text{out}}$). Section 5.3 compares the AKF model with the AKFNN approach, and Section 5.4 investigates the sensitivity of the AKF estimation model to the initial conditions. The accuracy of the proposed models was evaluated using the root mean square error (RMSE), given in Equation (24), which has been frequently used in the literature to measure the difference between model estimates and actual values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{N}_i - N_i\right)^2} \qquad (24)$$

where $\hat{N}_i$ represents the estimated vehicle count values, $N_i$ represents the actual vehicle count values, and $n$ is the total number of estimations. All simulation scenarios start with the following initial conditions: an initial vehicle count estimate of zero ($\hat{N}_0 = 0$), which equals the actual vehicle count; and initial estimates of the state-noise mean and the prior error covariance of the state system of $\mu_0 = 2$ veh and $P_0 = 75$ veh² if the LMP is less than or equal to 60%, and $\mu_0 = 9$ veh and $P_0 = 120$ veh² if the LMP is greater than 60%. The proposed models were evaluated using probe vehicle LMPs of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%. For each scenario, a Monte Carlo simulation was conducted to create 300 random samples of probe vehicles from the full data set.
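The RMSE in Equation (24) and the Monte Carlo sampling of probe vehicles can be sketched in a few lines of Python. The helper names are hypothetical, and drawing a fixed-size subset of round(LMP × N) vehicles without replacement is an assumption; the exact sampling procedure used with INTEGRATION is not described here.

```python
import math
import random


def rmse(estimated, actual):
    """Root mean square error between estimated and actual vehicle counts."""
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual))


def sample_probes(vehicle_ids, lmp, n_samples, seed=0):
    """Draw n_samples random probe subsets, each with round(lmp * N) distinct vehicles
    (hypothetical helper: fixed-size sampling without replacement is assumed)."""
    rng = random.Random(seed)
    k = round(lmp * len(vehicle_ids))
    return [rng.sample(vehicle_ids, k) for _ in range(n_samples)]


print(rmse([2, 3, 5], [2, 4, 3]))  # sqrt((0 + 1 + 4) / 3), about 1.291
samples = sample_probes(list(range(1000)), lmp=0.1, n_samples=300)
print(len(samples), len(samples[0]))  # 300 100
```

An alternative design would flag each vehicle independently as a probe with probability equal to the LMP, which makes the realized penetration rate itself vary across samples.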
5.1. Comparison of the KF and the AKF Models
This section evaluates the proposed AKF model, which produces real-time estimates of the error statistical parameters for the state and the measurement, and compares it with the KF model developed in [5], as shown in Table 2. The results show that the AKF outperforms the KF model in most scenarios, except for those with high LMPs (i.e., 80% and 90%). They demonstrate the need for real-time estimates of the mean and variance of the state and measurement errors at low and medium LMPs. At these LMPs, the fixed LMP value ($\rho$) used in the model has a high error, which in turn produces a high error in the vehicle count estimate. The AKF improved the vehicle count estimation accuracy of the traditional KF by up to 29%. In contrast, at high LMPs, the error in the $\rho$ value, and hence in the vehicle count estimates, is low, so predefined statistical values (mean and variance of the errors) may be used for the state and measurement. In conclusion, a simple KF can be used at high LMPs without the need to update the statistical noise parameters at every estimation step.
5.2. Developed NN Model
The NN model was employed to predict the LMP at the exit of the link ($\rho_{\text{out}}$), which is used to scale the given number of probe vehicle departures up to the total number of vehicle departures. The data set was divided into 70% for training, 15% for validation, and 15% for testing. The validation data set is used to measure network generalization and to avoid overfitting [38]. The performance of the developed NN, in terms of the mean square error (MSE) and the correlation coefficient R, is shown in Table 3; the R value is close to 1. The R value measures the correlation between the model outputs and the desired outputs, and a value close to 1 means that the model outputs are very close to the desired outputs. Figure 3 shows the error histogram for the training, validation, and testing data and their deviations from the zero-error bar. Most of the errors lie around the zero-error bar, indicating that the developed NN model appropriately addresses the research goal (i.e., estimating $\rho_{\text{out}}$). Figure 4 presents the predicted and actual $\rho_{\text{out}}$ values at different LMPs.
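The 70/15/15 data split and the two reported metrics (MSE and R) can be sketched without any machine-learning library. The following Python helpers are hypothetical and only illustrate the evaluation; the paper does not state which software was used to train the NN.

```python
import random


def split_70_15_15(n, seed=0):
    """Shuffle indices 0..n-1 and split them 70/15/15 into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * 70 // 100          # integer arithmetic avoids float rounding surprises
    n_val = n * 15 // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def mse(pred, target):
    """Mean square error between model outputs and desired outputs."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def pearson_r(pred, target):
    """Correlation coefficient R between model outputs and desired outputs."""
    n = len(target)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sp = sum((p - mp) ** 2 for p in pred) ** 0.5
    st = sum((t - mt) ** 2 for t in target) ** 0.5
    return cov / (sp * st)


train, val, test = split_70_15_15(1000)
print(len(train), len(val), len(test))  # 700 150 150
```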
5.3. Comparison of the AKF and the AKFNN Models
This section demonstrates the impact of using two $\rho$ values rather than one predefined $\rho$ value. The predefined value is the average $\rho$ value for the entire tested link and remains constant throughout the simulation for each LMP scenario. For instance, when the 10% LMP scenario is tested, $\rho$ is set to 0.1 in both the state and the measurement equations. In this study, the authors propose the use of two $\rho$ values, one at the entrance and one at the exit of the link, to scale the given numbers of probe arrivals and departures up to the total numbers of arrivals and departures, respectively. $\rho_{\text{in}}$ is measured directly using the loop detector installed at the entrance of the link, and the developed NN model is used to predict the $\rho_{\text{out}}$ values (Section 5.2). The $\rho_{\text{in}}$ and $\rho_{\text{out}}$ values are then used in the AKF equations. Recall that the AKF model relies only on probe vehicle data, while the AKFNN model uses a fusion of probe vehicle and single-loop detector data.
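A minimal sketch of the two-LMP variant of the flow continuity prediction follows. It assumes, as an illustration, that $\rho_{\text{in}}$ is computed as the ratio of probe arrivals to the total count reported by the entrance loop detector, consistent with the definition of the LMP, and it uses an assumed $\rho_{\text{out}}$ value in place of the NN prediction. Function names are hypothetical.

```python
def lmp_from_detector(probe_arrivals, detector_count):
    """Entrance LMP as the share of detected vehicles that are probes (assumed definition)."""
    if detector_count <= 0:
        raise ValueError("detector count must be positive")
    return probe_arrivals / detector_count


def predict_count_two_lmp(n_prev, probe_in, probe_out, rho_in, rho_out):
    """Flow continuity with separate scaling of probe arrivals and departures."""
    return n_prev + probe_in / rho_in - probe_out / rho_out


rho_in = lmp_from_detector(probe_arrivals=2, detector_count=8)   # 0.25
# rho_out would come from the NN model; 0.5 is an assumed value for illustration.
print(predict_count_two_lmp(8, 2, 1, rho_in, 0.5))  # 8 + 8 - 2 = 14.0
```

With a single link-wide LMP, arrivals and departures would be scaled by the same factor even when probes are unevenly represented at the two ends of the link; separate factors remove that constraint.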
Table 4 presents the RMSE values of the AKF and AKFNN models. The results demonstrate the benefits of the AKFNN approach over the AKF approach: the estimation accuracy is improved by up to 26%. This finding confirms the recommendation of Aljamal et al.'s previous study [5] to use two $\rho$ values rather than one. Moreover, the proposed AKFNN approach is robust and produces reasonable errors even at low LMPs, such as an LMP of 10% (Table 4). Figure 5 presents the vehicle count estimates for different LMPs using the proposed AKFNN approach.
5.4. Impact of the Initial Conditions on the AKF Model
Both the traditional and the adaptive KF models are sensitive to the initial condition parameters, such as the initial posterior state estimate ($\hat{N}_0$), the initial mean of the state noise ($\mu_0$), and the initial prior error covariance estimate ($P_0$). These parameters are typically tuned by trial and error to find the initial values that yield the best KF estimates. However, trial-and-error tuning is neither realistic nor easy to achieve in real applications. Hence, this section investigates the impact of the initial conditions on the accuracy of the vehicle count estimates.
5.4.1. Impact of Initial Estimate of the Vehicle Count ($\hat{N}_0$)
For the initial estimate of the vehicle count ($\hat{N}_0$), values ranging from 0 to 10 in increments of 1 were evaluated. Recall that all simulation scenarios otherwise start with an initial estimate of zero ($\hat{N}_0 = 0$), which equals the actual vehicle count. Figure 6a presents the RMSE values for different $\hat{N}_0$ values for the 10% LMP scenario. As shown in the figure, values of 8 and 10 produce the lowest RMSE, and starting with $\hat{N}_0 = 8$ instead of 0 improves the estimation accuracy by 10%. As a result, starting the AKF model with the best initial estimate (e.g., $\hat{N}_0 = 8$ veh) would reduce the errors and therefore improve the estimation accuracy.
5.4.2. Impact of Initial Mean Estimate of the State System ($\mu_0$)
Another critical initial parameter in the AKF model is $\mu_0$, which represents the mean value of the noise in the state equation. This paper tests 16 different $\mu_0$ values (i.e., 0, 1, 2, ..., 15). Figure 6b presents the vehicle count estimation RMSE values for different $\mu_0$ values. The RMSE obtained when the simulation starts with $\mu_0 = 0$ is higher than that obtained with $\mu_0 = 11$.
5.4.3. Impact of Initial Prior Covariance Estimate of the State System ($P_0$)
The last parameter tested in this study is the initial prior estimate of the error covariance, $P_0$. The error covariance describes the accuracy of the state estimate: a low covariance value indicates that the state estimate is accurate and close to the actual value. As stated in the literature, the initial parameters should always be tuned to achieve accurate estimates. The following $P_0$ values were tested: 5, 10, 15, 20, 25, 50, 75, 100, 120, 150, 200, and 250 veh². Figure 6c presents the RMSE values for the different $P_0$ values; a $P_0$ value of 150 veh² produces the lowest RMSE.
The research presented in this study evaluates the proposed approaches as they would be applied in real-world applications. Therefore, the trial-and-error technique was avoided, since it is not a valid solution in the field. However, previous research has typically tuned the initial parameters to determine the best initial conditions when testing estimation approaches [2,3,17]. To allow a comparison under the same practice, assume that the proposed AKFNN approach always starts with the best initial value of
, which would produce fewer errors.
Table 5 presents the RMSE when considering the trial-and-error technique (Tuned AKFNN). The AKFNN and the Tuned AKFNN approaches used the same values of
and
, but they used different
values.
is assumed to be zero, while
has two values based on the tested scenario: a value of 2 veh when low LMP scenarios are tested (LMP <= 60%), and a value of 9 veh with high LMP scenarios (LMP > 60%). From the table, tuning the
value significantly improves the estimation accuracy for all scenarios (by up to 27%). For instance, at 10% LMP, the estimation error dropped from 3.7 to 3.3 vehicles. Similarly, for the 20% LMP scenario, the error dropped from 3.6 to 2.8 vehicles.
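Assuming the reported percentages are relative reductions in RMSE (the percentage is not defined explicitly in this section), the two examples above can be checked as follows; the helper name is hypothetical. They give improvements of about 11% and 22%, so the 27% maximum must come from another LMP scenario in Table 5.

```python
def improvement_pct(rmse_before, rmse_after):
    """Relative RMSE reduction, in percent."""
    return 100.0 * (rmse_before - rmse_after) / rmse_before


print(round(improvement_pct(3.7, 3.3), 1))  # 10.8  (10% LMP)
print(round(improvement_pct(3.6, 2.8), 1))  # 22.2  (20% LMP)
```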
In conclusion, the AKF model was shown to be very sensitive to the initial conditions ($\hat{N}_0$, $\mu_0$, and $P_0$). Hence, starting the simulation with well-chosen initial conditions can significantly improve the estimation accuracy, as shown in Table 5. Finally, Table 6 summarizes the performance of the models discussed in the paper.
6. Conclusions
The research proposed a novel AKF model for estimating the number of vehicles on signalized approaches using only probe vehicle data. An AKF model was developed to provide real-time estimates of the statistical properties (mean and variance) for the state and measurement errors. The state equation is derived from the traffic flow continuity equation, while the measurement equation is constructed using the traffic hydrodynamic equation. Results show that the proposed AKF model outperforms the traditional KF model (improves the estimation accuracy by up to 29%), demonstrating the need to use real-time values of the statistical noise parameters in the KF model.
Two estimation models were presented, namely (a) the AKF and (b) the AKFNN. The AKF model uses only probe vehicle data and assumes a fixed LMP value obtained from historical data, while the AKFNN uses a fusion of probe and single-loop detector data with real-time estimates of the LMP values ($\rho_{\text{in}}$ and $\rho_{\text{out}}$). In this paper, a robust NN model was developed to provide accurate real-time estimates of the $\rho_{\text{out}}$ values, using one feature observed from the single-loop detector and four features observed from probe vehicles.
The AKF and NN models were combined to develop the novel AKFNN approach. The results demonstrate that the AKFNN approach significantly improves the vehicle count estimation accuracy because the $\rho_{\text{in}}$ and $\rho_{\text{out}}$ values are estimated more accurately. The comparison of the AKF and AKFNN models showed that the AKFNN model outperforms the AKF model, enhancing the estimation accuracy by up to 26%.
Finally, the study investigated the impact of the initial conditions ($\hat{N}_0$, $\mu_0$, and $P_0$) on the AKF performance. The results show that the AKF model is very sensitive to the initial conditions. For instance, starting the simulation with an $\hat{N}_0$ value of 8 instead of 0 improves the estimation accuracy by 10%. In addition, starting the simulation with a $\mu_0$ value of 11 instead of 2 enhances the estimation accuracy by up to 10%. For $P_0$, an improvement of 7% could be achieved if the simulation starts with an initial value of 150 veh² instead of 75 veh². The study also tested the accuracy of the AKFNN estimation when one of the initial-condition parameters is tuned (Tuned AKFNN approach), showing that further improvement can be achieved: the Tuned AKFNN improves the accuracy by up to 27%.
In conclusion, both models (AKF and AKFNN) produce high estimation accuracy when compared with the state-of-the-art KF model. Proposed future work entails testing traffic signal performance using the estimates of the total number of vehicles as inputs to the traffic signal controller.