How AI Can Support Data Quality Improvements
The maturation of artificial intelligence (AI) and machine learning (ML) technologies, combined with the rapid emergence of generative AI (GenAI) tools, has generated strong interest among transportation agencies. These agencies are eager to use these transformative technologies to enhance operations, strengthen analytical capabilities, and support better-informed decision-making. These tools depend on high-quality data, both to train large language models (LLMs) and to feed the retrieval-augmented generation (RAG) techniques that produce accurate responses to staff queries. Organizations must therefore prioritize assessing and remediating data that is of low quality or that does not comply with defined business rules.

This research will examine the feasibility of using AI to automate the creation of data business rules by analyzing manuals, processes, policies, training materials, and related documents. AI tools would be trained to profile datasets, identify inconsistencies and quality issues, and generate data quality rules compatible with existing tools. The structure of these rules, including tool-specific, importable structured markup formats, would be derived by ingesting application documentation, pre-existing validation logic, and natural language prompts enhanced with sample data. When given data for quality review, the AI system would automatically correct errors based on business rules, supporting documentation, and inferred relationships across datasets. It would do so without requiring staff to manually develop and execute individual validation rules.

The research will document the feasibility of each capability tested, the success rates of different approaches, implementation requirements and complexity, what proved effective or ineffective, and the challenges encountered. It will also provide recommendations for future research.
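The proposed capabilities include profiling a dataset, expressing business rules in a machine-readable form, and checking records against those rules. They can be illustrated with a minimal Python sketch. It does not model the AI step that would derive rules from manuals and policies; it assumes such rules have already been produced. The sample roadway fields (`route_id`, `speed_limit_mph`, `surface_type`), the thresholds, the function names, and the JSON rule schema are all hypothetical. They do not correspond to any specific agency dataset or to the import format of any particular data quality tool.

```python
import json
from typing import Any

# Hypothetical sample records from a roadway inventory table (illustrative only).
RECORDS: list[dict[str, Any]] = [
    {"route_id": "SR-1", "speed_limit_mph": 55, "surface_type": "asphalt"},
    {"route_id": "SR-2", "speed_limit_mph": 150, "surface_type": "concrete"},
    {"route_id": None, "speed_limit_mph": 35, "surface_type": "gravel"},
    {"route_id": "SR-4", "speed_limit_mph": 45, "surface_type": "dirt"},
]

# Hypothetical rule schema: each rule is plain data, so it can be serialized
# (here as JSON) and exchanged with a validation tool. Not any real tool's format.
RULES: list[dict[str, Any]] = [
    {"id": "R1", "column": "route_id", "check": "not_null"},
    {"id": "R2", "column": "speed_limit_mph", "check": "range", "min": 5, "max": 85},
    {"id": "R3", "column": "surface_type", "check": "allowed_values",
     "values": ["asphalt", "concrete", "gravel"]},
]


def profile(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each column: null count, distinct non-null values, numeric min/max."""
    summary = {}
    columns = {col for row in records for col in row}
    for col in sorted(columns):
        values = [row.get(col) for row in records]
        non_null = [v for v in values if v is not None]
        numeric = [v for v in non_null
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        summary[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
        }
    return summary


def passes(rule: dict[str, Any], value: Any) -> bool:
    """Return True if a single value satisfies a single rule."""
    check = rule["check"]
    if check == "not_null":
        return value is not None
    if value is None:
        return True  # missing values are left to not_null rules
    if check == "range":
        return rule["min"] <= value <= rule["max"]
    if check == "allowed_values":
        return value in rule["values"]
    raise ValueError(f"unknown check type: {check}")


def validate(records: list[dict[str, Any]],
             rules: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (row_index, rule_id) pairs for every rule violation."""
    return [
        (i, rule["id"])
        for i, row in enumerate(records)
        for rule in rules
        if not passes(rule, row.get(rule["column"]))
    ]


if __name__ == "__main__":
    print(json.dumps(profile(RECORDS), indent=2))
    print(json.dumps(RULES, indent=2))  # rules as an exchangeable document
    print(validate(RECORDS, RULES))
```

Running it prints a column profile, the rules serialized as JSON, and the violations found (rows are zero-indexed). These are the out-of-range speed limit in row 1, the missing route ID in row 2, and the unexpected surface type in row 3. Keeping rules as plain data rather than hard-coded checks is one way generated rules could be reviewed by staff and exchanged with other tools. The automated correction step the project also proposes is not shown.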
Expected benefits include a clearer understanding of AI's current ability to improve data quality and to reduce the effort needed to maintain it. The research will also help define appropriate and inappropriate uses of AI for data quality work. In addition, it is expected to streamline data quality assessments, leading to improved data quality. It should also provide a shared understanding of AI's current potential, performance limitations, and future research needs.
Supplemental Notes:
- Contract to a Performing Organization has not yet been awarded.
Language
- English
Project
- Status: Proposed
- Funding: $450,000.00
Contract Numbers:
Project 23-50
Sponsor Organizations:
National Cooperative Highway Research Program
Transportation Research Board
500 Fifth Street, NW
Washington, DC 20001, United States

American Association of State Highway and Transportation Officials (AASHTO)
444 North Capitol Street, NW
Washington, DC 20001, United States

Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590, United States
Project Managers:
Mohan, Sid
- Start Date: May 26, 2025
- Expected Completion Date: Not specified
- Actual Completion Date: Not specified
Subject/Index Terms
- TRT Terms: Artificial intelligence; Data analysis; Data quality; Feasibility analysis; Machine learning
- Subject Areas: Data and Information Technology; Planning and Forecasting; Transportation (General)
Filing Info
- Accession Number: 01956432
- Record Type: Research project
- Source Agency: Transportation Research Board
- Contract Numbers: Project 23-50
- Files: TRB, RIP
- Created Date: May 28 2025 9:42AM