Mining Police Crash Report Narratives: A Natural Language Processing Approach to Identify Bus-Stop Related Crashes
Transit riders are a particularly vulnerable population: they often walk to and from bus stops, wait in areas where multiple transportation modes interact, and cross the road at least once during a round trip. Prior studies have identified a significant relationship between transit elements (e.g., stops, corridors, and ridership levels) and pedestrian crash locations. Yet national databases record few such crashes. The Fatality Analysis Reporting System (FARS) reported 196 transit bus stop-related pedestrian crashes (2014–2022), and the Crash Report Sampling System (CRSS) reported 93 (2016–2022). These small counts appear inconsistent with rising pedestrian crash trends in the U.S. This suggests potential underreporting, which may stem from inconsistent definitions, the lack of standardized fields for transit bus stop-related crashes, or variation in how crashes are coded. To address this gap, artificial intelligence methods such as natural language processing (NLP), specifically named entity recognition (NER), can extract transit bus stop-related details from police crash report narratives. NER will be applied to Minnesota and Tennessee crash report datasets to identify such crashes. The model will be trained, validated, and tested for generalizability, with performance evaluated using metrics such as precision and recall. Results will be cross-analyzed with national databases (FARS, CRSS) to test the hypothesis that transit bus stop-related crashes are underreported. Misclassified cases will also be analyzed to identify patterns. Although NER has been widely used to improve crash data quality, it has not yet been applied specifically to identify transit bus stop-related crashes. This approach could streamline data collection, reduce manual review time, and improve the accuracy of pedestrian crash data. By addressing a critical gap in crash reporting, this work will improve the ability to study the risks transit riders face and inform safety improvements at bus stops.
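The evaluation step described above can be illustrated with a minimal sketch. That step involves flagging reports as bus stop-related and scoring the flags with precision and recall. The sketch is not the project's NER model. As an assumption for illustration, a simple keyword matcher stands in for a trained tagger. The patterns (`BUS_STOP_PATTERNS`), function names, toy narratives, and labels are all hypothetical. Precision is the share of flagged reports that are truly bus stop-related. Recall is the share of truly bus stop-related reports that get flagged.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical keyword patterns (illustration only). The project proposes a
# trained NER model; this rule-based matcher merely stands in for it here.
BUS_STOP_PATTERNS = [
    r"\bbus\s+stop\b",
    r"\bbus\s+shelter\b",
    r"\btransit\s+stop\b",
    r"\bbus\s+bench\b",
]
_BUS_STOP_RE = re.compile("|".join(BUS_STOP_PATTERNS), re.IGNORECASE)


def tag_bus_stop_spans(narrative: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, text) for each bus-stop mention found in a narrative."""
    return [(m.start(), m.end(), m.group(0)) for m in _BUS_STOP_RE.finditer(narrative)]


def is_bus_stop_related(narrative: str) -> bool:
    """Flag a report as bus stop-related if at least one span is tagged."""
    return len(tag_bus_stop_spans(narrative)) > 0


def precision_recall(predicted: Iterable[bool], actual: Iterable[bool]) -> Tuple[float, float]:
    """Report-level precision and recall against manually reviewed labels."""
    predicted, actual = list(predicted), list(actual)
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))      # correctly flagged
    fp = sum(p and not a for p, a in zip(predicted, actual))  # flagged in error
    fn = sum(a and not p for p, a in zip(predicted, actual))  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented toy narratives, not drawn from any crash dataset.
    narratives = [
        "PED WAS CROSSING TO THE BUS STOP WHEN V1 STRUCK PED.",
        "V1 REAR-ENDED V2 AT SIGNAL.",
        "PED STEPPED OFF CURB NEAR BUS SHELTER AND WAS STRUCK.",
        "PED EXITED TRANSIT VEHICLE AND CROSSED MIDBLOCK.",
    ]
    labels = [True, False, True, True]  # hypothetical manual-review labels
    preds = [is_bus_stop_related(n) for n in narratives]
    print(preds)                            # [True, False, True, False]
    print(precision_recall(preds, labels))  # (1.0, 0.6666666666666666)
```

In this toy run, the fourth narrative mentions a transit vehicle but no stop, so the fixed keywords miss it. That miss lowers recall without affecting precision. It is also the kind of misclassified case the project proposes to review. This illustrates why a learned NER model may generalize better than a hand-written pattern list.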
Language
- English
Project
- Status: Active
- Funding: $84,983.00
- Contract Numbers: 69A3552348336
- Sponsor Organizations:
  Office of the Assistant Secretary for Research and Technology
  University Transportation Centers Program
  Department of Transportation
  Washington, DC United States 20590
- Managing Organizations:
  Office of the Assistant Secretary for Research and Technology
  Department of Transportation
  1200 New Jersey Avenue, SE
  Washington, DC United States 20590
- Project Managers: Stearns, Amy
- Performing Organizations:
  University of Tennessee, Knoxville
  Center for Transportation Research (CTR)
  Knoxville, TN United States 37996
- Principal Investigators: Brakewood, Candace
- Start Date: 2025-12-01
- Expected Completion Date: 2026-11-30
- Actual Completion Date: 0
- USDOT Program: University Transportation Centers
Subject/Index Terms
- TRT Terms: Artificial intelligence; Bus stops; Crash data; Crash reports; Data analysis; Pedestrian vehicle crashes; Transit riders
- Subject Areas: Data and Information Technology; Highways; Pedestrians and Bicyclists; Safety and Human Factors
Filing Info
- Accession Number: 01971436
- Record Type: Research project
- Source Agency: Center for Pedestrian and Bicyclist Safety
- Files: UTC, RIP
- Created Date: Nov 17 2025 3:03PM