
10/8/2021: Sparse computation as a compact representation of datasets, Dorit Hochbaum, University of California, Berkeley

Speaker/presenter: Dorit S. Hochbaum

Presentation description:

Sparse computation is a technique developed to efficiently compute pairwise similarities in large-scale datasets.  The motivation is that machine learning techniques based on pairwise similarities perform very well compared to other machine learning techniques [1], yet the computational effort of computing all pairwise similarities grows quadratically with the size of the dataset.  Such techniques include SVM, KSNC, SNC, and KNN. In particular, the technique tested, HNC (and its variants SNC and KSNC), scales very well for images.  The idea of sparse computation is to identify in advance, prior to the calculation of the similarities, which pairwise similarities are “relevant”, by projecting the data onto a low-dimensional space and subdividing that space into a grid, a representation analogous to the pixels of an image.  The pairwise distances, or similarities, are then computed only for pairs of points that fall in the same block or in adjacent blocks.  The technique has been shown in [2] to sparsify the similarity matrix dramatically while maintaining the accuracy of the respective ML techniques.  Additional improvements in identifying the pairs that reside in adjacent blocks [3], [4] allow sparse computation to scale up to massive datasets.  This research bodes well for the quality of the low-dimensional grid representation of a dataset.  The projection is done efficiently with “approximate PCA”, an adaptation of the randomized SVD algorithm of Drineas et al. [5].
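To make the block structure concrete, the following is a minimal Python sketch of the sparse-computation idea, not the authors' implementation: project the data to a few dimensions with a basic randomized SVD (standing in for the “approximate PCA” step), bin the projected points into a grid, and compute similarities only for pairs in the same or adjacent blocks. The Gaussian similarity, the function and parameter names (n_dims, grid_resolution, sigma), and the projection details are illustrative assumptions.

```python
# Illustrative sketch of sparse computation (assumed details, not the
# reference implementation from [2]-[4]).
import numpy as np
from itertools import product

def sparse_similarities(X, n_dims=3, grid_resolution=10, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)

    # Low-dimensional projection via a basic randomized SVD
    # (stand-in for the "approximate PCA" step).
    Xc = X - X.mean(axis=0)
    Omega = rng.standard_normal((X.shape[1], n_dims + 5))
    Q, _ = np.linalg.qr(Xc @ Omega)
    _, _, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    P = Xc @ Vt[:n_dims].T          # points in the low-dimensional space

    # Assign each projected point to a grid block.
    lo, hi = P.min(axis=0), P.max(axis=0)
    cells = np.floor((P - lo) / (hi - lo + 1e-12) * grid_resolution).astype(int)
    cells = np.clip(cells, 0, grid_resolution - 1)

    # Bucket point indices by block.
    buckets = {}
    for i, c in enumerate(map(tuple, cells)):
        buckets.setdefault(c, []).append(i)

    # Candidate pairs: points in the same block or in adjacent blocks only.
    offsets = list(product(*[(-1, 0, 1)] * n_dims))
    pairs = set()
    for c, idx in buckets.items():
        for off in offsets:
            nb = tuple(ci + oi for ci, oi in zip(c, off))
            for i in idx:
                for j in buckets.get(nb, []):
                    if i < j:
                        pairs.add((i, j))

    # Compute Gaussian similarities (in the original space) only for the
    # retained, sparse set of pairs.
    return {(i, j): np.exp(-np.linalg.norm(X[i] - X[j])**2 / (2 * sigma**2))
            for i, j in pairs}

# Example: sparse similarity computation for a small random dataset.
X = np.random.default_rng(1).standard_normal((500, 20))
sims = sparse_similarities(X)
print(f"retained {len(sims)} of {500 * 499 // 2} possible pairs")
```

The grid resolution controls the trade-off: a finer grid retains fewer candidate pairs and yields a sparser similarity matrix, at the risk of discarding some relevant similarities.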

References: 

[1] Philip Baumann, Dorit S. Hochbaum and Yan T. Yang. A comparative study of the leading machine learning techniques and two new optimization algorithms. European Journal of Operational Research, Volume 272, Issue 3, 1 February 2019, Pages 1041-1057. https://hochbaum.ieor.berkeley.edu/html/pub/HNC-comparative-EJOR2019.pdf

[2] D.S. Hochbaum and P. Baumann. Sparse Computation for Large-Scale Data Mining. IEEE Transactions on Big Data, Vol 2, Issue 2, 151-174, 2016. https://hochbaum.ieor.berkeley.edu/html/pub/HB-Sparse-BigData2016.pdf

[3] P. Baumann, D. S. Hochbaum and Q. Spaen. High-Performance Geometric Algorithms for Sparse Computation in Big Data Analytics. 2017 IEEE International Conference on Big Data, Boston, MA, pp. 546-555. IEEE, 2017. https://hochbaum.ieor.berkeley.edu/html/pub/Sparse-geometric-BD17.pdf

[4] P. Baumann, D. S. Hochbaum and Q. Spaen. Sparse-Reduced Computation: Enabling Mining of Massively-Large Data Sets. In M. De Marsico, G. Sanniti di Baja, and A. Fred, eds., Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods, 224-231, Rome, 2016. https://hochbaum.ieor.berkeley.edu/html/pub/ICPRAM-BHS-2016.pdf

[5] Drineas, P., Kannan, R., and Mahoney, M.W. (2006). Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix. SIAM J. Computing, 36:158–183.

[6] Brisaboa, N. R., Luaces, M. R., Pedreira, O., Places, Á. S., & Seco, D. (2009, June). Indexing dense nested metric spaces for efficient similarity search. In International Andrei Ershov Memorial Conference on Perspectives of System Informatics (pp. 98-109). Springer, Berlin, Heidelberg.

[7] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).