Tensor Decompositions for Large-Scale Data Mining: Methods for Uncovering Latent Patterns in Multidimensional Big Data
Keywords:
Tensor decompositions, Large-scale data mining, Multidimensional data, Latent pattern discovery, Tucker decomposition
Abstract
With the proliferation of big data across many domains, there is a growing need for analytical methods that can uncover latent patterns and extract useful knowledge from massive, multidimensional datasets. Tensor decompositions offer a powerful approach to large-scale data mining: they represent a higher-order data array with a multilinear model built from a small set of factor matrices (and, in the Tucker case, a core tensor). This reduces dimensionality while preserving the essential structure and relationships within the data. In this paper, we provide a comprehensive overview of tensor decompositions for data mining, covering the mathematical foundations, algorithms, applications, and software implementations. We focus on the two most widely used techniques: the CANDECOMP/PARAFAC (CP) and Tucker decompositions. Through detailed numerical examples on real-world datasets, we demonstrate how tensor decompositions can be used for latent pattern discovery in areas such as social network analysis, neuroimaging, recommender systems, and text mining. We also discuss the computational and scalability challenges of applying tensor methods to massive datasets. Overall, tensor decompositions are versatile tools for uncovering hidden signals in big data, with the potential to yield actionable insights across many domains.