Streaming PCA with Many Missing Entries

The project team considers the streaming, memory-constrained principal component analysis (PCA) problem with missing entries, in which the available storage is linear in the dimensionality of the problem and each vector has so many missing entries that matrix completion is not possible. For this problem, the team proposes a method based on a block power update approach introduced in its previous work. On synthetic as well as benchmark data sets, the method outperforms existing streaming PCA approaches by a significant margin in several problem settings of interest. The team also considers the popular spiked covariance model with randomly missing entries and obtains the first known global convergence guarantees for this problem: the method converges to the true "spike" using a number of samples that is linear in the dimension of the data. The method's memory requirement is also linear in the ambient dimension, so both memory and sample complexity scale optimally with dimension. Streaming PCA is especially relevant when the resolution of sensors outpaces the ability to store massive amounts of data. This is precisely the setting that arises as more intelligent, high-precision, high-resolution sensors are deployed on fleets of private and commercial vehicles.
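This record does not spell out the update rule, so the following is only a minimal sketch of a generic block power update for streaming PCA with missing entries, not the team's actual method. It assumes that each coordinate is observed independently with a known probability p and that unobserved coordinates are zero-filled. Under that assumption, the running sum is corrected so that each sample contributes an unbiased estimate of (x x^T) q while only O(d) numbers are stored. The function names, parameters, and the synthetic spiked-covariance generator are all hypothetical.

```python
import numpy as np


def streaming_block_power_pca(sample_stream, d, p, block_size, rng):
    """Estimate the top principal direction from zero-filled partial samples.

    Assumption (not from the source): each coordinate is observed
    independently with known probability p. Only O(d) memory is used:
    the current estimate q and one running sum s.
    """
    q = rng.standard_normal(d)
    q /= np.linalg.norm(q)
    s = np.zeros(d)
    count = 0
    for z in sample_stream:
        # Unbiased estimate of (x x^T) q from the zero-filled sample z:
        # off-diagonal terms of z z^T are scaled by 1/p^2, and the
        # diagonal, which would otherwise be over-weighted, is corrected.
        s += z * (z @ q) / p**2 - (1.0 / p**2 - 1.0 / p) * (z * z * q)
        count += 1
        if count == block_size:
            norm = np.linalg.norm(s)
            if norm > 0:
                q = s / norm  # power update once per block
            s = np.zeros(d)
            count = 0
    return q


def spiked_stream(u, n, lam, sigma, p, rng):
    """Yield n samples from a hypothetical spiked covariance model,
    x = sqrt(lam) * g * u + sigma * w, with entries kept with probability p."""
    d = u.size
    for _ in range(n):
        x = np.sqrt(lam) * rng.standard_normal() * u + sigma * rng.standard_normal(d)
        mask = rng.random(d) < p
        yield np.where(mask, x, 0.0)
```

To try it, build a unit vector u, pass spiked_stream(u, ...) to streaming_block_power_pca, and compare the returned direction with u through the absolute inner product. The block size trades off noise per update against the number of updates available from a fixed stream.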

Language

  • English

Project

  • Status: Completed
  • Funding: $33,000
  • Sponsor Organizations:

    Office of the Assistant Secretary for Research and Technology

    University Transportation Centers Program
    Department of Transportation
    Washington, DC  United States  20590
  • Project Managers:

    Bhat, Chandra

  • Performing Organizations:

    Data-Supported Transportation Operations and Planning Center

    University of Texas at Austin
    Austin, TX  United States  78701
  • Principal Investigators:

    Caramanis, Constantine

  • Start Date: 20130930
  • Expected Completion Date: 20151231
  • Actual Completion Date: 20151231
  • Source Data: 110

Filing Info

  • Accession Number: 01580146
  • Record Type: Research project
  • Source Agency: Data-Supported Transportation Operations and Planning Center
  • Files: UTC, RiP
  • Created Date: Oct 29 2015 11:56AM