Development of a Data-Driven Optimal Controller Based on Adaptive Dynamic Programming

Through vehicle-to-vehicle (V2V) communication, both human-driven and autonomous vehicles can actively exchange data, such as velocities and bumper-to-bumper distances. By employing the shared data, control laws with improved performance can be designed for connected and autonomous vehicles (CAVs). This paper proposes an adaptive optimal control design method for a mixed platoon consisting of multiple preceding human-driven vehicles and one CAV at the tail, taking into account human-vehicle interaction and heterogeneous driver behavior. It is shown that, by using reinforcement learning and adaptive dynamic programming (ADP) techniques, a near-optimal controller can be learned from real-time data for the CAV with V2V communication, without precise knowledge of the car-following parameters of any driver in the platoon. The proposed method allows the CAV controller to adapt to different platoon dynamics induced by unknown and heterogeneous driver-dependent parameters. To improve safety during the learning process, the off-policy learning algorithm can leverage both historical data and data collected in real time, which considerably reduces the learning time. The effectiveness and efficiency of the proposed method are demonstrated by rigorous proofs and microscopic traffic simulations.
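As a rough illustration of the data-driven idea (not the paper's exact algorithm), the sketch below runs an off-policy, Q-learning-style policy iteration for a linear-quadratic problem: the learner never uses the system matrices, only recorded (state, input, next-state) samples generated under an exploratory behavior policy, mirroring how the CAV controller is learned from V2V data without knowing the drivers' car-following parameters. The 2-state model, cost weights, and noise level here are hypothetical placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Unknown" longitudinal dynamics: used only to generate data, never by the learner.
# Hypothetical state = [spacing error, velocity error] of the tail CAV.
A = np.array([[0.95, 0.10],
              [0.00, 0.90]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # penalize spacing / velocity errors
R = np.array([[1.0]])  # penalize control effort
n, m = 2, 1

K = np.zeros((m, n))   # initial stabilizing policy (open loop is stable here)

def collect(K, steps=400):
    """Off-policy data: behavior input = K x + exploration noise."""
    x = rng.standard_normal(n)
    data = []
    for _ in range(steps):
        u = K @ x + 0.5 * rng.standard_normal(m)   # exploration noise
        x_next = A @ x + B @ u
        data.append((x.copy(), u.copy(), x_next.copy()))
        x = x_next
    return data

for it in range(10):
    data = collect(K)
    # Policy evaluation from data: fit Q_K(x, u) = z' H z, z = [x; u], using the
    # Bellman identity  z'Hz = x'Qx + u'Ru + z_next' H z_next,  z_next = [x'; K x'].
    Phi, b = [], []
    for x, u, xn in data:
        z = np.concatenate([x, u])
        zn = np.concatenate([xn, K @ xn])          # evaluate the TARGET policy K
        Phi.append(np.kron(z, z) - np.kron(zn, zn))
        b.append(x @ Q @ x + u @ R @ u)
    vecH, *_ = np.linalg.lstsq(np.array(Phi), np.array(b), rcond=None)
    H = vecH.reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                            # quadratic form is symmetric
    # Policy improvement: u = -H_uu^{-1} H_ux x
    K = -np.linalg.solve(H[n:, n:], H[n:, :n])

print("learned feedback gain K:", K)
```

With persistently exciting data, the least-squares policy evaluation is exact for this deterministic model, so the loop reproduces model-based policy iteration and the gain converges to the LQR optimum; leveraging previously logged trajectories in `collect` is what makes the scheme off-policy.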