Spectral analysis of sparse data: Translating physics into algorithms
Lenka Zdeborova  1  
1 : Institut de Physique Théorique (ex SPhT)  (IPHT)  -  Website
CEA, CNRS : URA2306
Institut de Physique Théorique Orme des Merisiers batiment 774 Point courrier 136 CEA/DSM/IPhT, CEA/Saclay F-91191 Gif-sur-Yvette Cedex -  France

A number of problems in data science can be solved using algorithms based on the spectrum of a matrix associated with the data. Principal component analysis is a well-known example of such a spectral method. There are many examples of applications that require the matrix associated with the data to be sparse. However, the traditionally considered sparse matrices associated to the data develop spurious large eigenvalues associated with localised eigenvectors that harm the algorithmic performance. These so-called Lifshitz tails are analogous to impurity states in disordered systems. Inspired by the theory of spin glasses, we introduce the non-backtracking operator that is able to mitigate this problem and has no spurious eigenvalues. We discuss properties of this operator, as well as its applications to several algorithmic problems such as clustering of networks, percolation, matrix completion or inference from pairwise comparisons.


Online user: 1