How AI Can Support Data Quality Improvements

The maturation of artificial intelligence (AI) and machine learning (ML) technologies, combined with the rapid emergence of generative AI (GenAI) tools, has generated strong interest among transportation agencies eager to use these technologies to enhance operations, strengthen analytical capabilities, and support more informed decision-making. Because these tools depend on high-quality input data, both to train large language models (LLMs) and to support retrieval-augmented generation (RAG) techniques that deliver accurate responses to staff queries, it is critical that organizations prioritize the assessment and remediation of low-quality data and of data that does not comply with defined business rules.

This research will examine the feasibility of using AI to automate the creation of data business rules through the analysis of manuals, processes, policies, training materials, and related documents. AI tools would be trained to profile datasets, identify inconsistencies and quality issues, and generate data quality rules compatible with existing tools. The structure of these rules, including tool-specific, importable structured markup formats, would be derived from ingesting application documentation, pre-existing validation logic, and natural language prompts enhanced by sample data. When provided with data for quality review, the AI system would automatically correct errors based on business rules, supporting documentation, and inferred relationships across datasets, without requiring the manual development and execution of individual validation rules.

The research will document the feasibility of each capability tested, the success rates of different approaches, implementation requirements and complexity, what proved effective or ineffective, and the challenges encountered. It will also provide recommendations for future research.

Expected benefits include a clearer understanding of the current capabilities of AI to improve data quality and to reduce the effort needed to maintain it. The research will also help define appropriate and inappropriate uses of AI in the context of data quality. Additionally, it will streamline the process of performing data quality assessments, resulting in improved data quality, and will offer a shared understanding of current potential, performance limitations, and future research needs.
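For readers unfamiliar with the terms, the following is a minimal sketch of what "profiling a dataset" and "checking data against business rules" can look like when done by hand, which is the manual effort the proposed research would try to reduce. It is not part of the project and does not use AI. The sample records, field names, and rules are all hypothetical and were invented for illustration.

```python
from collections import Counter

# Hypothetical sample records; field names and values are invented for illustration.
RECORDS = [
    {"route_id": "I-95", "aadt": 84000, "surface": "asphalt"},
    {"route_id": "US-1", "aadt": -50, "surface": "asphalt"},
    {"route_id": "", "aadt": 1200, "surface": "Asphalt "},
    {"route_id": "SR-7", "aadt": None, "surface": "concrete"},
]

# Hypothetical business rules: rule name -> (field, predicate that returns True when valid).
RULES = {
    "route_id_present": ("route_id", lambda v: isinstance(v, str) and v.strip() != ""),
    "aadt_non_negative": ("aadt", lambda v: isinstance(v, (int, float)) and v >= 0),
    "surface_in_domain": ("surface", lambda v: v in {"asphalt", "concrete", "gravel"}),
}


def profile(records, field):
    """Summarize one field: count of missing values and frequency of each distinct value."""
    values = [r.get(field) for r in records]
    missing = sum(1 for v in values if v is None or v == "")
    counts = Counter(v for v in values if v is not None and v != "")
    return {"missing": missing, "distinct": dict(counts)}


def find_violations(records, rules):
    """Return a list of (row_index, rule_name) pairs, one per failed rule check."""
    violations = []
    for i, rec in enumerate(records):
        for name, (field, is_valid) in rules.items():
            if not is_valid(rec.get(field)):
                violations.append((i, name))
    return violations


if __name__ == "__main__":
    print(profile(RECORDS, "surface"))
    for row, rule in find_violations(RECORDS, RULES):
        print(f"row {row}: failed {rule}")
```

In this sketch each rule is written and maintained by a person. The research described above asks whether rules of this kind could instead be derived from documentation and sample data, and whether flagged values could be corrected automatically.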

Language

  • English

Project

  • Status: Proposed
  • Funding: $450,000.00
  • Contract Numbers:

    Project 23-50

  • Sponsor Organizations:

    National Cooperative Highway Research Program

    Transportation Research Board
    500 Fifth Street, NW
    Washington, DC  United States  20001

    American Association of State Highway and Transportation Officials (AASHTO)

    444 North Capitol Street, NW
    Washington, DC  United States  20001

    Federal Highway Administration

    1200 New Jersey Avenue, SE
    Washington, DC  United States  20590
  • Project Managers:

    Mohan, Sid

  • Start Date: 2025-05-26
  • Expected Completion Date: 0
  • Actual Completion Date: 0

Filing Info

  • Accession Number: 01956432
  • Record Type: Research project
  • Source Agency: Transportation Research Board
  • Contract Numbers: Project 23-50
  • Files: TRB, RIP
  • Created Date: May 28 2025 9:42AM