Mining Police Crash Report Narratives: A Natural Language Processing Approach to Identify Bus-Stop Related Crashes

Transit riders are a particularly vulnerable population: they often walk to and from bus stops, wait in areas where multiple transportation modes interact, and cross the road at least once during a round trip. Prior studies have identified a significant relationship between transit elements (i.e., stops, corridors, and ridership levels) and pedestrian crash locations. Yet national databases record few such crashes: the Fatality Analysis Reporting System (FARS) reported 196 transit bus stop-related pedestrian crashes (2014–2022), and the Crash Report Sampling System (CRSS) reported 93 (2016–2022). These small counts appear to contradict rising pedestrian crash trends in the U.S., suggesting potential underreporting due to inconsistent definitions, the lack of standardized fields for transit bus stop-related crashes, or variation in how crashes are coded. To address this gap, artificial intelligence methods such as natural language processing (NLP), specifically named entity recognition (NER), can be used to extract transit bus stop-related details from police crash report narratives. This project will apply NER to Minnesota and Tennessee datasets to identify such crashes. The model will be trained, validated, and tested for generalizability using metrics such as precision and recall. Results will be cross-checked against the national databases (FARS, CRSS) to test the hypothesis that transit bus stop-related crashes are underreported, and misclassified cases will be analyzed to identify patterns. While NER has been widely used to improve crash data quality, it has not been applied specifically to identifying transit bus stop-related crashes. This approach could streamline data collection, reduce manual review time, and enhance the accuracy of pedestrian crash data. By addressing a critical gap in crash reporting, this work will improve the ability to study the risks faced by transit riders and inform safety improvements at bus stops.
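The evaluation step described above can be illustrated with a minimal sketch. It is not the project's model: a simple keyword pattern stands in for a trained NER component, and the pattern, the function names (`flag_narrative`, `precision_recall`), and the example narratives are all hypothetical, invented for illustration only. The sketch shows how precision and recall would be computed once each narrative has a predicted flag and a hand-assigned label.

```python
import re

# Hypothetical keyword pattern standing in for a trained NER model.
# The project's actual model and entity labels are not given in this record.
BUS_STOP_PATTERN = re.compile(r"\b(?:bus|transit)\s+(?:stop|shelter)\b", re.IGNORECASE)


def flag_narrative(text):
    """Return True if the narrative contains a bus-stop mention."""
    return BUS_STOP_PATTERN.search(text) is not None


def precision_recall(predicted, actual):
    """Compute precision and recall for parallel lists of boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # flagged, but not bus-stop related
    fn = sum(1 for p, a in pairs if a and not p)    # bus-stop related, but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented narratives with hand-assigned labels (not real crash data).
samples = [
    ("Pedestrian exited the bus and was struck while crossing near the bus stop.", True),
    ("Vehicle 1 rear-ended Vehicle 2 at the signal.", False),
    ("Pedestrian was waiting at the transit shelter when Vehicle 1 left the roadway.", True),
    ("Driver said he was late for the bus stop pickup; crash occurred in a parking lot.", False),
    ("Pedestrian ran to catch the bus and was struck mid-block.", True),
]

predicted = [flag_narrative(text) for text, _ in samples]
actual = [label for _, label in samples]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The fourth narrative mentions a bus stop without the crash being related to one, and the fifth is related without using the phrase. Misclassifications of this kind are what the planned error analysis would examine, and they illustrate why plain keyword matching is only a stand-in here.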

Language

  • English

Project

  • Status: Active
  • Funding: $84,983.00
  • Contract Numbers:

    69A3552348336

  • Sponsor Organizations:

    Office of the Assistant Secretary for Research and Technology

    University Transportation Centers Program
    Department of Transportation
    Washington, DC  United States  20590
  • Managing Organizations:

    Office of the Assistant Secretary for Research and Technology

    Department of Transportation
    1200 New Jersey Avenue, SE
    Washington, DC  United States  20590
  • Project Managers:

    Stearns, Amy

  • Performing Organizations:

    University of Tennessee, Knoxville

    Center for Transportation Research (CTR)
    Knoxville, TN  United States  37996
  • Principal Investigators:

    Brakewood, Candace

  • Start Date: 2025-12-01
  • Expected Completion Date: 2026-11-30
  • USDOT Program: University Transportation Centers

Filing Info

  • Accession Number: 01971436
  • Record Type: Research project
  • Source Agency: Center for Pedestrian and Bicyclist Safety
  • Contract Numbers: 69A3552348336
  • Files: UTC, RIP
  • Created Date: Nov 17 2025 3:03PM