Leveraging Retrieval Augmented Generation (RAG) to Analyze Crash Reports Narratives
Crash reports are a vital source of information for understanding road crashes, devising prevention strategies, and informing policy. However, the coded fields on these reports often lack the detail needed for comprehensive analysis of pedestrian and bicyclist crashes. The structured data may omit nuanced details about the circumstances of a crash that appear only in the narrative section. Information such as a pedestrian’s unhoused status, a detailed account of the vehicle’s movement before it struck a pedestrian, a witness’s description of a speeding vehicle’s pre-crash behavior, and the conditions of a hit-and-run crash may be embedded in the narrative but remain unrecorded in the structured fields of the report form. Extracting this implicit information poses a significant challenge for traditional analysis methods. Retrieval Augmented Generation (RAG) employs an embedding model to scan extensive text for similarities between a query (here, the presence of a vulnerability factor or demographic context) and segments of the text. Once the relevant segments are identified, the query and the retrieved context are passed to a Large Language Model (LLM), which in this application confirms whether the factor is present and extracts the pertinent information. This study will explore the ability of RAG to identify crash characteristics found only in crash report narratives, using crash reports from California.
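The retrieve-then-validate flow described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the narrative segments are invented examples, and `build_prompt` only assembles the text that would be sent to an LLM (no model is called). All function and variable names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, segments, top_k=2):
    """Return the top_k narrative segments most similar to the query."""
    q = embed(query)
    ranked = sorted(segments, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_segments):
    """Assemble the query and retrieved context for a downstream LLM.

    Assumption: the LLM call itself is out of scope here; a real pipeline
    would send this prompt to a model and parse its answer.
    """
    context = "\n".join(f"- {s}" for s in context_segments)
    return (
        "Using only the narrative excerpts below, answer the question and "
        "quote the supporting text. Reply 'not stated' if it is absent.\n"
        f"Excerpts:\n{context}\n"
        f"Question: {query}"
    )


# Invented narrative segments, for illustration only.
segments = [
    "Vehicle 1 was traveling northbound on Main Street at the posted speed limit.",
    "A witness stated the driver fled the scene without stopping after striking the pedestrian.",
    "The pedestrian was crossing outside the marked crosswalk near a homeless encampment.",
]
query = "Did the driver flee the scene after striking the pedestrian?"

prompt = build_prompt(query, retrieve(query, segments, top_k=1))
print(prompt)
```

In practice the count vectors would be replaced by dense embeddings from a trained model, which can match paraphrases (for example "unhoused" and "homeless") that word overlap alone would miss.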
Language
- English
Project
- Status: Active
- Funding: $115,020
- Contract Numbers: 69A3552348336
- Sponsor Organizations:
  Office of the Assistant Secretary for Research and Technology
  University Transportation Centers Program
  Department of Transportation
  Washington, DC United States 20590
- Managing Organizations:
  Office of the Assistant Secretary for Research and Technology
  University Transportation Centers Program
  Department of Transportation
  Washington, DC United States 20590
- Project Managers: Stearns, Amy
- Performing Organizations:
  University of California, Berkeley
  Safe Transportation Research and Education Center
  Berkeley, CA United States 94720
- Principal Investigators: Griswold, Julia
- Start Date: 2024-06-01
- Expected Completion Date: 2025-05-31
- Actual Completion Date: 0
- USDOT Program: University Transportation Centers
Subject/Index Terms
- TRT Terms: Crash characteristics; Crash reports; Data analysis; Information processing; Pedestrian vehicle crashes
- Geographic Terms: California
- Subject Areas: Data and Information Technology; Highways; Pedestrians and Bicyclists; Safety and Human Factors
Filing Info
- Accession Number: 01923812
- Record Type: Research project
- Source Agency: Center for Pedestrian and Bicyclist Safety
- Contract Numbers: 69A3552348336
- Files: UTC, RIP
- Created Date: Jul 8 2024 2:54PM