Analyzing Social Media for Improving Transportation Safety

The goal of this project is to develop an online intelligent system that automatically monitors and collects timely and comprehensive information from social media (e.g., blogs, online forums, and Twitter) about the current status of the transportation network and traffic flow to support advanced safety enhancement. The proposed approach is composed of five major components:

(1) Public Safety Data Extraction. The project plans to build a classifier (e.g., an SVM) to automatically identify transportation-safety-related posts on local social media platforms covering the area of interest. However, it is expensive to train a classifier directly on social media, because the messages are short, high in volume, and full of non-standard abbreviations. It is much cheaper to collect labels for news articles (e.g., from the National Transportation Safety Board), so transfer learning techniques can be applied to build the classifier without directly labeling social media posts.

(2) Heterogeneous Safety Data Modeling. Social media is heterogeneous by nature and has a variety of both entity types (e.g., user, post, hashtag, term, link, mention, location, and time) and relationships (e.g., originator, reply, friendship, and followership). To model this complex data structure, the project plans to build a heterogeneous network model for the safety data.

(3) Transportation Safety Topic Discovery. Transportation safety could include many different topics, such as road blockage or damage due to heavy snow or flooding, missing people swept away by a flood, malfunctioning traffic lights, traffic incidents, and drunk driving, to name but a few. In addition, topics may relate to different geographic locations and time periods. The project proposes to design a customized spatiotemporal topic model specifically for transportation safety applications.

(4) Bias Estimation Using Traditional Traffic Sensor Data. Social media could potentially be a biased sample, and it is important to estimate this bias by cross-validating against traditional transportation census data, such as loop detector and camera data, incident reports, and transportation surveys.

(5) User Interface and High Level Applications. These will include a regional sentiment index, safety alarms, and safety recommendations.
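
As an illustration of the heterogeneous network model described in component (2), the following is a minimal Python sketch, not the project's actual implementation. It uses only the entity and relationship types named in the abstract. The class name `HeteroNetwork`, its methods, the helper `build_demo`, the edge directions (for example, user to post for "originator"), and the sample identifiers are all assumptions made for illustration. Linking posts to hashtags, locations, or times would need further relationship types that the abstract does not name.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Entity and relationship types listed in the abstract.
NODE_TYPES = {"user", "post", "hashtag", "term", "link",
              "mention", "location", "time"}
EDGE_TYPES = {"originator", "reply", "friendship", "followership"}


@dataclass
class HeteroNetwork:
    """Hypothetical typed graph: every node and edge carries a type label."""
    node_type: dict = field(default_factory=dict)  # node id -> node type
    # (source id, edge type) -> set of destination ids
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, ntype):
        if ntype not in NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[node_id] = ntype

    def add_edge(self, src, etype, dst):
        if etype not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {etype}")
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(src, etype)].add(dst)

    def neighbors(self, src, etype):
        """Return the sorted ids reachable from src over one edge type."""
        return sorted(self.edges.get((src, etype), set()))


def build_demo():
    """Build a tiny example network with made-up identifiers."""
    net = HeteroNetwork()
    net.add_node("u1", "user")
    net.add_node("u2", "user")
    net.add_node("p1", "post")
    net.add_node("p2", "post")
    net.add_edge("u1", "originator", "p1")    # assumed direction: user -> post
    net.add_edge("u2", "originator", "p2")
    net.add_edge("p2", "reply", "p1")         # assumed direction: reply -> original
    net.add_edge("u2", "followership", "u1")  # assumed direction: follower -> followed
    return net
```

Keeping the edge type in the lookup key lets a query follow one relationship at a time, e.g. `build_demo().neighbors("u2", "followership")` lists the users that `u2` follows in this toy data.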


  • English


  • Status: Active
  • Funding: $149,354.00
  • Contract Numbers:


  • Sponsor Organizations:

    Research and Innovative Technology Administration

    University Transportation Centers Program
    1200 New Jersey Avenue
    Washington, DC  United States  20590

    Technologies for Safe and Efficient Transportation University Transportation Center

    Carnegie Mellon University
    Pittsburgh, PA  United States  15213
  • Performing Organizations:

    Carnegie Mellon University

    Pittsburgh, PA  United States 
  • Start Date: 20130101
  • Expected Completion Date: 0
  • Actual Completion Date: 20131231
  • Source Data: RiP Project 35548

Filing Info

  • Accession Number: 01498009
  • Record Type: Research project
  • Source Agency: Technologies for Safe and Efficient Transportation University Transportation Center
  • Contract Numbers: DTRT12GUTG11
  • Files: UTC, RiP
  • Created Date: Nov 16 2013 1:00AM