Integrating Occupancy Grids with Spatial-Temporal Reinforcement Learning for Enhanced Control

This study addresses the challenge of interpreting and navigating complex, dynamic driving environments, using occupancy grids as the primary spatial input representation. The research team proposes a novel approach that combines the strengths of reinforcement learning (RL) and transformer-based architectures, applying the transformer encoder across both spatial and temporal dimensions.

In autonomous systems and robotics, machine learning-based controller design can be broadly categorized into two types: supervised learning methods and reinforcement learning methods. Supervised learning methods such as imitation learning require large amounts of collected data with corresponding expert behaviors; reinforcement learning instead generates its own data and learns control strategies through repeated interaction with the environment. For image-based inputs, traditional RL approaches in robotics rely heavily on convolutional neural networks (CNNs) for spatial understanding. The success of transformer models in natural language processing (NLP), and their subsequent adaptation to other domains, suggests significant potential for handling complex spatial data in autonomous systems.

From a temporal perspective, conventional reinforcement learning, particularly when based on Markov models, assumes that the future state depends only on the current state and action. While this formulation considers sequences of states and actions, it does not explicitly model the entire history or treat the control problem as an extended time series; it focuses on the immediate transition without accounting for the full sequence of past behavior.

To address these limitations, this work focuses on integrating advanced machine learning techniques into autonomous systems. The team will develop a transformer-based model that uses occupancy grids, a standard tool in robotics for environment mapping and navigation, combined with a spatial attention mechanism. This combination provides a structured representation of spatial information, improving the model's ability to understand and navigate its environment. The team will also explore temporal embedding techniques so the model can interpret dynamic scenarios and track and respond to dynamic sequences.

The research aims to advance autonomous navigation by applying both spatial and temporal transformers within a reinforcement learning framework. The first key innovation is the spatial transformer, which enables more effective extraction of spatial information; this enhanced spatial understanding is essential for navigating complex, dynamic environments and allows the model to make more accurate and reliable decisions based on a nuanced understanding of its surroundings. The second innovation is the temporal transformer, which allows the model to make informed decisions based on a series of states and actions over time, predicting future scenarios and making strategic decisions that account for the trajectory of environmental changes.
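A minimal sketch of how such a spatial-temporal pipeline could be wired is shown below, assuming a PyTorch implementation in which each occupancy grid is split into patch tokens, a spatial transformer encoder attends within a frame, and a temporal transformer encoder attends across a short history of frames before an action head. The class name, layer sizes, and action-head design here are illustrative assumptions, not the project's actual architecture.

    # Illustrative sketch only: a spatial-temporal transformer over occupancy grids.
    import torch
    import torch.nn as nn

    class SpatialTemporalOccupancyEncoder(nn.Module):
        """Encodes a history of occupancy grids with a spatial transformer per
        frame, then a temporal transformer over the resulting frame embeddings."""

        def __init__(self, grid_size=64, patch=8, d_model=128, n_heads=4,
                     n_spatial_layers=2, n_temporal_layers=2, n_actions=5):
            super().__init__()
            n_patches = (grid_size // patch) ** 2
            self.patch = patch
            # Each flattened patch of the occupancy grid becomes one token.
            self.patch_embed = nn.Linear(patch * patch, d_model)
            self.spatial_pos = nn.Parameter(torch.zeros(1, n_patches, d_model))
            spatial_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.spatial_encoder = nn.TransformerEncoder(spatial_layer, n_spatial_layers)
            # Temporal positional embedding over the frame history (up to 16 frames here).
            self.temporal_pos = nn.Parameter(torch.zeros(1, 16, d_model))
            temporal_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.temporal_encoder = nn.TransformerEncoder(temporal_layer, n_temporal_layers)
            self.policy_head = nn.Linear(d_model, n_actions)

        def forward(self, grids):
            # grids: (batch, time, H, W) occupancy values in [0, 1].
            b, t, h, w = grids.shape
            p = self.patch
            # Split each grid into non-overlapping p x p patches -> (b*t, n_patches, p*p).
            x = grids.reshape(b * t, h // p, p, w // p, p)
            x = x.permute(0, 1, 3, 2, 4).reshape(b * t, -1, p * p)
            x = self.patch_embed(x) + self.spatial_pos
            x = self.spatial_encoder(x)            # spatial attention within a frame
            frame_emb = x.mean(dim=1).reshape(b, t, -1)
            frame_emb = frame_emb + self.temporal_pos[:, :t]
            z = self.temporal_encoder(frame_emb)   # temporal attention across frames
            return self.policy_head(z[:, -1])      # action logits from latest frame

    # Usage: an 8-frame history of 64x64 occupancy grids for a batch of 2.
    model = SpatialTemporalOccupancyEncoder()
    logits = model(torch.rand(2, 8, 64, 64))
    print(logits.shape)  # torch.Size([2, 5])

In this sketch the final frame's embedding drives the action logits; an actor-critic variant would add a value head alongside the policy head.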
The team will conduct a comprehensive series of experiments to rigorously evaluate the transformer-based model in a variety of simulated environments, benchmarking it against traditional machine learning models commonly used in reinforcement learning. A primary objective is to demonstrate the model's improved navigation efficiency and decision-making accuracy relative to conventional RL models, as illustrated by the evaluation sketch below. Beyond these quantitative gains, the experiments are intended to highlight qualitative advances in traffic safety management.
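A hedged sketch of the kind of benchmarking loop such experiments might use appears below, assuming a gymnasium-style simulator interface; the environment, the info["success"] signal, and the policy callables are placeholders rather than the project's actual evaluation protocol.

    # Illustrative evaluation harness for comparing two policies
    # (e.g., a CNN baseline and the spatial-temporal transformer)
    # on the same gymnasium-style simulator and seeds.
    import numpy as np

    def evaluate(env, policy, episodes=20, seed=0):
        """Returns mean episode return and mean success rate; the success flag
        is assumed to be reported by the simulator via info['success']."""
        returns, successes = [], []
        for ep in range(episodes):
            obs, info = env.reset(seed=seed + ep)
            done, total = False, 0.0
            while not done:
                action = policy(obs)
                obs, reward, terminated, truncated, info = env.step(action)
                total += reward
                done = terminated or truncated
            returns.append(total)
            successes.append(float(info.get("success", 0.0)))
        return float(np.mean(returns)), float(np.mean(successes))

Running evaluate with both the baseline and the transformer policy on identical seeds yields paired return and success-rate estimates for a like-for-like comparison.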

Language

  • English

Project

  • Status: Active
  • Funding: $158,814
  • Contract Numbers:

    69A3552344811

  • Sponsor Organizations:

    Office of the Assistant Secretary for Research and Technology

    University Transportation Centers Program
    Department of Transportation
    Washington, DC  United States  20590
  • Managing Organizations:

    Carnegie Mellon University

    Pittsburgh, PA  United States 

    Safety21 University Transportation Center

    Carnegie Mellon University
    Pittsburgh, PA  United States  15213
  • Project Managers:

    Stearns, Amy

  • Performing Organizations:

    Ohio State University

    Columbus, OH  United States 
  • Principal Investigators:

    Redmill, Keith

  • Start Date: 20240701
  • Expected Completion Date: 20250630
  • Actual Completion Date: 0
  • USDOT Program: University Transportation Centers

Filing Info

  • Accession Number: 01933398
  • Record Type: Research project
  • Source Agency: Safety21 University Transportation Center
  • Contract Numbers: 69A3552344811
  • Files: UTC, RIP
  • Created Date: Oct 13 2024 8:28AM