A Decision-Making Strategy for Car Following Based on Naturalist Driving Data via Deep Reinforcement Learning
Abstract
1. Introduction
2. Related Work
3. Analysis of Car-Following Behavior-Based Naturalist Driving Data
3.1. Source of Naturalist Driving Data
3.2. Statistical Feature Analysis
3.3. Correlation Analysis
4. Deep Reinforcement Learning for Autonomous Car-Following Decision-Making
4.1. State Space
4.2. Action Space
4.3. Reward Function
4.4. Termination Conditions
- (i) Collision: the following vehicle (FV) does not brake effectively, resulting in a traffic accident.
- (ii) Early stop: the FV's collision avoidance is overly conservative, so the vehicle comes to a stop prematurely.
- (iii) Vehicle stuck: the FV speed stays below 0.1 m/s for 10 steps.
- (iv) No reward increase: the reward does not increase within 100 steps of an episode.
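The four termination conditions can be sketched as a small per-episode monitor. This is a minimal illustration, not the authors' implementation: the class name, the `gap`/`lv_speed` inputs, the reading of "early stop" as the FV halting while the lead vehicle is still moving, and the reading of "no reward increase" as the cumulative episode reward failing to reach a new maximum are all assumptions.

```python
from collections import deque

STUCK_SPEED = 0.1      # m/s, threshold from condition (iii)
STUCK_STEPS = 10       # steps, window from condition (iii)
PATIENCE_STEPS = 100   # steps, window from condition (iv)


class TerminationMonitor:
    """Tracks the four episode-ending conditions for one episode (hypothetical helper)."""

    def __init__(self):
        self.recent_speeds = deque(maxlen=STUCK_STEPS)
        self.episode_return = 0.0
        self.best_return = float("-inf")
        self.steps_since_gain = 0

    def step(self, gap, fv_speed, lv_speed, reward):
        """Return the name of the triggered condition, or None to continue."""
        self.recent_speeds.append(fv_speed)
        self.episode_return += reward
        if self.episode_return > self.best_return:
            self.best_return = self.episode_return
            self.steps_since_gain = 0
        else:
            self.steps_since_gain += 1

        # (i) Collision: the space headway has closed completely.
        if gap <= 0.0:
            return "collision"
        # (ii) Early stop (assumed reading): FV halts while the LV keeps moving.
        if fv_speed <= 0.0 and lv_speed > STUCK_SPEED:
            return "early_stop"
        # (iii) Stuck: every speed in the last 10 steps is below 0.1 m/s.
        if (len(self.recent_speeds) == STUCK_STEPS
                and max(self.recent_speeds) < STUCK_SPEED):
            return "stuck"
        # (iv) No reward increase (assumed reading): no new best return in 100 steps.
        if self.steps_since_gain >= PATIENCE_STEPS:
            return "no_reward_increase"
        return None
```

Checking the conditions in this order makes a collision take precedence over the softer stopping criteria when several hold at the same step; the source does not state a priority, so that ordering is also a design choice of this sketch.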
4.5. CF Decision-Making Algorithm
5. Simulation Results and Discussion
5.1. Simulation Setup
5.2. Simulation Results
- (a) We use the DDPG algorithm combined with NDD to obtain an autonomous car-following decision-making strategy. The penalty reward takes the form of mechanical energy, so it is a function of speed and space headway rather than a constant. In addition, the 3σ boundary of the speed-acceleration fitting curve of the training dataset provides varying constraints on the FV action (recorded as our proposal).
- (b) Many DRL applications bound the action output by the agent with fixed empirical constraints. Here, the fixed empirical constraint (FEC) on the FV action range is determined from NDD, i.e., [a_min, a_max] = [−4 m/s², 2 m/s²]; all other parameters are the same as in our proposal (recorded as FEC).
- (c) Some DRL studies use constants as penalty rewards for collision or lane departure during agent training. Accordingly, constant values are used as the rewards for FV collision and early stop, i.e., Equation (8) is changed to r_k(t) = −100 and Equation (9) is changed to r_p(t) = −50. The FV action also uses the fixed empirical constraint (recorded as FEC w/CP).
- (d) This DRL strategy uses the same varying constraint as our proposal, but constants as the rewards for collision and early stop (recorded as VC w/CP).
- (e) A rule-based control strategy is built on the characteristics of car-following behavior, with safety, efficiency, and comfort as multiobjective constraints. An MPC car-following model based on constant THW is constructed, with the model parameters determined from the distribution of the characteristic parameters of car-following behavior. The reward function is designed in the same way as in our proposal (recorded as MPC-based).
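The difference between the fixed and varying action constraints, and the constant-penalty variant, can be illustrated with a short sketch. The fixed bounds and the −100/−50 penalties come from the descriptions above; the mapping of −100 to collision and −50 to early stop follows the order in which they are listed and is an assumption. The fitted curve `example_fit_mean` and `EXAMPLE_SIGMA` are placeholders invented for illustration, since the actual speed-acceleration fit is not reproduced here.

```python
A_MIN, A_MAX = -4.0, 2.0  # m/s^2, fixed empirical constraint (FEC)


def clip(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def fec_action(raw_action):
    """Fixed empirical constraint: the same bounds at every speed."""
    return clip(raw_action, A_MIN, A_MAX)


def vc_action(raw_action, fv_speed, fit_mean, fit_sigma):
    """Varying constraint: bounds follow fit_mean(v) +/- 3 * sigma."""
    center = fit_mean(fv_speed)
    return clip(raw_action, center - 3.0 * fit_sigma, center + 3.0 * fit_sigma)


def constant_penalty(collided, stopped_early):
    """Constant-penalty (CP) variant: -100 for collision, -50 for early stop (assumed mapping)."""
    if collided:
        return -100.0
    if stopped_early:
        return -50.0
    return 0.0


# Placeholder fit, NOT taken from the paper: mean acceleration falls linearly with speed.
def example_fit_mean(fv_speed):
    return 0.5 - 0.02 * fv_speed


EXAMPLE_SIGMA = 0.5  # m/s^2, placeholder spread around the fitted curve
```

With these placeholders, `vc_action(3.5, 10.0, example_fit_mean, EXAMPLE_SIGMA)` limits the request to the upper 3σ bound at 10 m/s, whereas `fec_action(3.5)` always returns the global maximum of 2 m/s².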
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Correlation | \|ρ\| > 0.4 | \|ρ\| > 0.7 |
---|---|---|
vLV-vFV | 80% | 39% |
vrel-vFV | 49% | 11% |
drel-vFV | 67% | 32% |
THW-vFV | 55% | 22% |
TTCi-vFV | 50% | 11% |
Parameter | Value |
---|---|
Learning rate of actor network | 0.0001 |
Learning rate of critic network | 0.00001 |
Discount factor of reward | 0.9 |
Soft update rate | 0.001 |
Capacity of replay buffer | 20,000 |
Size of minibatch | 256 |
Decay rate | 0.9995 |
Initial variance in the exploration space | 3 |
Weight parameters: α, β, δ, ε | 1 × 10⁻⁵, 0.028, 10, 5 |
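Two of the hyperparameters above are easier to read next to the update rules they usually control in DDPG. The sketch below assumes that the soft rate of 0.001 is the coefficient τ of DDPG's soft target-network update, and that the decay rate of 0.9995 multiplies the exploration variance once per update; the schedule granularity (per step or per episode) is not stated in the table, and the function names are hypothetical.

```python
TAU = 0.001              # soft rate from the table, read here as DDPG's tau
DECAY = 0.9995           # decay rate from the table
INITIAL_VARIANCE = 3.0   # initial variance in the exploration space


def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]


def exploration_variance(n_updates, initial=INITIAL_VARIANCE, decay=DECAY):
    """Exploration variance after n multiplicative decay steps (assumed schedule)."""
    return initial * decay ** n_updates
```

A small τ makes the target networks trail the online networks slowly, which is the usual way DDPG stabilizes its bootstrapped critic targets; the decaying variance shifts the agent from exploration toward exploitation as training proceeds.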
Strategy | Collision | Early Stop | Completion of CF Event During Collision |
---|---|---|---|
Our proposal | 395 | 141 | 47 |
FEC | 315 | 174 | 58 |
FEC w/CP | 178 | 57 | 14 |
VC w/CP | 205 | 46 | 29 |
Strategy | THW ≤ 1.2 s | THW ≤ 1.5 s | THW ≤ 2 s |
---|---|---|---|
Human | 27.1% | 47.1% | 72.6% |
MPC-based | 19.8% | 54.7% | 87.7% |
FEC | 22.6% | 96.4% | 98.4% |
FEC w/CP | 0.7% | 1.1% | 98% |
VC w/CP | 4.4% | 10.6% | 20.7% |
Our proposal | 43% | 96.4% | 98.4% |
Strategy | \|Jerk\| ≤ 1.5 m/s³ | \|Jerk\| ≤ 2 m/s³ | \|Jerk\| ≤ 5 m/s³ |
---|---|---|---|
Human | 56.5% | 65.8% | 94.5% |
MPC-based | 85% | 90% | 99.1% |
FEC | 59.1% | 59.4% | 60.1% |
FEC w/CP | 97.9% | 98.3% | 99.1% |
VC w/CP | 98.6% | 99% | 99.6% |
Our proposal | 92% | 96.2% | 99.2% |
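The THW and jerk tables report the share of samples that fall at or below each threshold. A minimal sketch of how such shares could be computed from a logged trajectory follows; the function names, the finite-difference jerk estimate, and the handling of a stopped FV (THW treated as infinite) are assumptions rather than the authors' procedure.

```python
def time_headway(gap, fv_speed, eps=1e-6):
    """THW = space headway / FV speed; infinite when the FV is (nearly) stopped."""
    return gap / fv_speed if fv_speed > eps else float("inf")


def jerk_series(accelerations, dt):
    """Finite-difference jerk from accelerations sampled every dt seconds."""
    return [(a1 - a0) / dt
            for a0, a1 in zip(accelerations, accelerations[1:])]


def share_within(values, threshold, use_abs=False):
    """Fraction of samples whose value (or magnitude) is <= threshold."""
    if not values:
        return 0.0
    hits = sum(1 for v in values
               if (abs(v) if use_abs else v) <= threshold)
    return hits / len(values)
```

For example, `share_within(thw_values, 1.5)` would give one THW entry for a strategy, and `share_within(jerk_series(acc, dt), 2.0, use_abs=True)` the corresponding jerk entry, where `thw_values`, `acc`, and `dt` stand for data from a simulated or recorded car-following event.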
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, W.; Zhang, Y.; Shi, X.; Qiu, F. A Decision-Making Strategy for Car Following Based on Naturalist Driving Data via Deep Reinforcement Learning. Sensors 2022, 22, 8055. https://doi.org/10.3390/s22208055