Leveraging Retrieval Augmented Generation (RAG) to Analyze Crash Reports Narratives

Crash reports are a vital source of information for understanding road crashes, devising prevention strategies, and informing policy. However, the coded fields on these reports often lack the detailed characteristics needed for comprehensive analysis of pedestrian and bicyclist crashes. The structured data in a crash report may omit nuanced details about the circumstances of a crash that appear only in the narrative section. Information such as a pedestrian’s unhoused status, a detailed account of the vehicle’s movement before it struck a pedestrian, a witness’s description of a speeding vehicle’s pre-crash behavior, and the conditions of a hit-and-run crash may be embedded in the narrative but remain unrecorded in the structured fields of the report form. Extracting this implicit data poses a significant challenge for traditional analysis methods. Retrieval Augmented Generation (RAG) employs an embedding model to scan extensive text for segments that are similar to a query (here, the presence of a vulnerability factor or demographic context). Once the relevant segments are identified, a Large Language Model (LLM) analyzes the query together with the retrieved context. In this application, the LLM confirms whether the factor is present and extracts the pertinent information. This study will explore the ability of RAG to identify crash characteristics found only in crash report narratives, using crash reports from California.
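
The retrieve-then-analyze flow described above can be illustrated with a minimal Python sketch. It is not the project's implementation. Everything in it is an assumption made for illustration: a bag-of-words count vector stands in for the embedding model, the narrative and query are invented examples, and the `llm` argument is a hypothetical placeholder for whatever language model the study uses.

```python
import math
import re
from collections import Counter

# Invented example narrative and query; not taken from any real crash report.
NARRATIVE = (
    "Vehicle 1 was traveling northbound on Main Street. "
    "The pedestrian was crossing outside the crosswalk. "
    "A witness stated the vehicle was speeding before the crash. "
    "The driver fled the scene without stopping."
)
QUERY = "witness stated vehicle speeding"


def embed(text):
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_segments(narrative):
    """Split a narrative into sentence-level segments."""
    parts = re.split(r"(?<=[.!?])\s+", narrative)
    return [p.strip() for p in parts if p.strip()]


def retrieve(query, narrative, top_k=2):
    """Return up to top_k segments most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in split_segments(narrative)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]


def build_prompt(query, segments):
    """Combine the query and the retrieved context into one LLM prompt."""
    context = "\n".join(f"- {s}" for s in segments)
    return (
        "Using only the context below, state whether the factor is present "
        "and quote the supporting text.\n"
        f"Factor: {query}\n"
        f"Context:\n{context}"
    )


def analyze(query, narrative, llm):
    """Retrieve relevant segments, then pass query plus context to `llm`.

    `llm` is a hypothetical callable (prompt string in, answer string out).
    """
    return llm(build_prompt(query, retrieve(query, narrative)))


if __name__ == "__main__":
    # Placeholder "LLM" that only echoes the prompt, so the sketch runs offline.
    print(analyze(QUERY, NARRATIVE, llm=lambda prompt: prompt))
```

In practice, a learned embedding model would replace `embed`, so that a query about speeding could also match wording such as "traveling at a high rate of speed", and a real model call would replace the placeholder passed as `llm`.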

Language

  • English

Project

  • Status: Active
  • Funding: $115,020
  • Contract Numbers:

    69A3552348336

  • Sponsor Organizations:

    Office of the Assistant Secretary for Research and Technology

    University Transportation Centers Program
    Department of Transportation
    Washington, DC  United States  20590
  • Managing Organizations:

    Office of the Assistant Secretary for Research and Technology

    University Transportation Centers Program
    Department of Transportation
    Washington, DC  United States  20590
  • Project Managers:

    Stearns, Amy

  • Performing Organizations:

    University of California, Berkeley

    Safe Transportation Research and Education Center
    Berkeley, CA  United States  94720
  • Principal Investigators:

    Griswold, Julia

  • Start Date: 20240601
  • Expected Completion Date: 20250531
  • Actual Completion Date: 0
  • USDOT Program: University Transportation Centers

Filing Info

  • Accession Number: 01923812
  • Record Type: Research project
  • Source Agency: Center for Pedestrian and Bicyclist Safety
  • Contract Numbers: 69A3552348336
  • Files: UTC, RIP
  • Created Date: Jul 8 2024 2:54PM