Streaming PCA with Many Missing Entries
The project team considers the streaming, memory-constrained principal component analysis (PCA) problem with missing entries, in which the available storage is linear in the dimensionality of the problem and each vector has so many missing entries that matrix completion is not possible. For this problem, the team proposes a method based on a block power update approach introduced in their previous work. The team shows, on synthetic as well as benchmark data sets, that this approach outperforms existing approaches for streaming PCA by a significant margin in several interesting problem settings. The team also considers the popular spiked covariance model with randomly missing entries and obtains the first known global convergence guarantees for this problem. Specifically, the team shows that the method converges to the true "spike" using a number of samples that is linear in the dimension of the data. The method's memory requirement is also linear in the ambient dimension, so both memory and sample complexity scale optimally with dimension. Streaming PCA is especially relevant when the resolution of sensors outpaces the ability to store the massive amounts of data they produce. This is precisely the setting the project team faces as more intelligence and more high-precision, high-resolution sensors are added to the fleet of private and commercial vehicles.
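The block power update described above can be illustrated with a minimal sketch for the rank-one (single "spike") case. This is not the project team's implementation. It assumes each entry is observed independently with a known probability delta, that missing entries are stored as zeros, and that the covariance estimate is debiased by inverse-probability weighting (off-diagonal terms scaled by 1/delta^2, diagonal terms by 1/delta). The names streaming_rank1_pca, spiked_stream, and normalize are hypothetical. Only a few length-dim vectors are held in memory at any time, which matches the linear storage budget in the abstract.

```python
import math
import random


def normalize(v):
    """Return v scaled to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def spiked_stream(u, sigma, delta, rng):
    """Yield samples x = z*u + sigma*noise, with each entry kept with
    probability delta and replaced by 0.0 otherwise (assumed missing model)."""
    while True:
        z = rng.gauss(0.0, 1.0)
        yield [
            (z * ui + sigma * rng.gauss(0.0, 1.0)) if rng.random() < delta else 0.0
            for ui in u
        ]


def streaming_rank1_pca(stream, dim, delta, block_size, num_blocks, rng):
    """Estimate the top principal direction with O(dim) memory.

    Each block accumulates an estimate of (covariance @ q) one sample at a
    time, without ever forming a dim x dim matrix, then renormalizes.
    """
    q = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    c_off = 1.0 / (delta * delta)   # weight that debiases off-diagonal terms
    c_diag = c_off - 1.0 / delta    # correction so diagonal terms are unbiased
    for _ in range(num_blocks):
        s = [0.0] * dim
        for _ in range(block_size):
            y = next(stream)        # zero-filled partially observed sample
            dot = sum(a * b for a, b in zip(y, q))
            for i in range(dim):
                # i-th entry of the debiased (y y^T) q estimate
                s[i] += (c_off * dot * y[i] - c_diag * y[i] * y[i] * q[i]) / block_size
        q = normalize(s)
    return q
```

To try the sketch, build a unit vector u, create a generator with spiked_stream(u, sigma, delta, rng), and pass it to streaming_rank1_pca; the absolute inner product between the returned vector and u indicates how well the spike was recovered. The block size and number of blocks are tuning choices made here for illustration only.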
Language
- English
Project
- Status: Completed
- Funding: $33000
- Sponsor Organizations:
Office of the Assistant Secretary for Research and Technology
University Transportation Centers Program
Department of Transportation
Washington, DC United States 20590
- Project Managers:
Bhat, Chandra
- Performing Organizations:
Data-Supported Transportation Operations and Planning Center
University of Texas at Austin
Austin, TX United States 78701
- Principal Investigators:
Caramanis, Constantine
- Start Date: 20130930
- Expected Completion Date: 20151231
- Actual Completion Date: 20151231
- Source Data: 110
Subject/Index Terms
- TRT Terms: Covariance; Intelligence; Multivariate analysis; Sensors; Vehicle fleets
- Subject Areas: Data and Information Technology; Passenger Transportation; Planning and Forecasting
Filing Info
- Accession Number: 01580146
- Record Type: Research project
- Source Agency: Data-Supported Transportation Operations and Planning Center
- Files: UTC, RIP
- Created Date: Oct 30 2015 10:21AM