Consider a symmetric positive semi-definite matrix $A \in \mathbb{R}^{m \times m}$. For a sketching matrix $\Omega_1 \in \mathbb{R}^{m \times l}$, with $l \ll m$, the randomised Nyström approximation is defined as

$$\tilde{A}_{Nyst} = (A \Omega_1)(\Omega_1^T A \Omega_1)^{+} (\Omega_1^T A),$$

where $(\cdot)^{+}$ denotes the Moore-Penrose pseudo-inverse. From this definition you can observe that all three factors can be obtained from the single product $C = A \Omega_1$: the left factor is $C$, the middle one is $\Omega_1^T C$, and the right one is $C^T$. So, it is important to note that the randomised Nyström method requires just one pass over the original data $A$.
Two aspects have to be taken into consideration:
- How do you compute the pseudo-inverse of $B = \Omega_1^T A \Omega_1$?
- Where should the rank-k approximation be applied: on the middle term $B$, or directly on $\tilde{A}_{Nyst}$?
Concerning the first question, the easiest way to proceed is by applying the Cholesky factorization to $B$, writing $B = LL^T$ with $L$ lower triangular. Notice that this requires $B$ to be positive definite: only then can the matrix be expressed as the product of a lower triangular matrix and its conjugate transpose. Since $B = \Omega_1^T A \Omega_1$ is in general only guaranteed to be positive semi-definite, the Cholesky factorization can break down numerically; in that case the pseudo-inverse can be obtained from an eigenvalue decomposition (equivalently, an SVD, as $B$ is symmetric) instead.
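As a concrete illustration of the eigendecomposition route, here is a minimal NumPy sketch (the function name `psd_pinv` and the tolerance are illustrative choices, not part of the project code):

```python
import numpy as np

def psd_pinv(B, tol=1e-12):
    """Pseudo-inverse of a symmetric positive semi-definite matrix via its
    eigendecomposition, treating eigenvalues below a relative tolerance as zero."""
    w, U = np.linalg.eigh(B)           # B = U diag(w) U^T, w in ascending order
    keep = w > tol * w.max()           # drop (numerically) zero eigenvalues
    U, w = U[:, keep], w[keep]
    return U @ np.diag(1.0 / w) @ U.T
```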
Concerning the second question, if you do the rank-k truncation directly on $B$, you obtain the following algorithm:
Algorithm 1: Randomised Nyström with rank-k truncation on $B$

Input: $A \in \mathbb{R}^{m \times m}$ symmetric positive semi-definite, sketching matrix $\Omega_1 \in \mathbb{R}^{m \times l}$, target rank $k \le l$.

- Compute $C = A \Omega_1$, $C \in \mathbb{R}^{m \times l}$.
- Compute $B = \Omega_1^T C$, $B \in \mathbb{R}^{l \times l}$.
- Compute the truncated rank-k eigenvalue decomposition $B = U \Lambda U^T$.
- Compute the pseudo-inverse as $B_k^+ = U(:,1:k)\, \Lambda(1:k,1:k)^{+}\, U(:,1:k)^T$.
- Compute the QR decomposition $C = QR$.
- Compute the eigenvalue decomposition $R B_k^+ R^T = U_k \Lambda_k U_k^T$.
- Compute $\hat{U}_k = Q U_k$.
- Output: $\hat{U}_k$ and $\Lambda_k$.
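A minimal NumPy sketch of Algorithm 1 could look as follows (illustrative code, assuming the top-$k$ eigenvalues of $B$ are strictly positive):

```python
import numpy as np

def nystrom_truncate_B(A, Omega, k):
    """Randomised Nystrom with rank-k truncation on the middle matrix B."""
    C = A @ Omega                            # C = A Omega, the only pass over A
    B = Omega.T @ C                          # B = Omega^T A Omega (l x l)
    w, U = np.linalg.eigh(B)                 # eigenvalues in ascending order
    Uk, wk = U[:, -k:], w[-k:]               # keep the k largest eigenpairs
    Bk_pinv = Uk @ np.diag(1.0 / wk) @ Uk.T  # rank-k pseudo-inverse of B
    Q, R = np.linalg.qr(C)                   # thin QR of C
    lam, V = np.linalg.eigh(R @ Bk_pinv @ R.T)
    # hat{U}_k and Lambda_k, reordered so eigenvalues are descending
    return Q @ V[:, -k:][:, ::-1], lam[-k:][::-1]
```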
The preferred way, however, is to do the rank-k approximation on $\tilde{A}_{Nyst}$ itself: truncating the small middle matrix $B$ discards information before the final approximation is formed, and in general does not give the best rank-k approximation of $\tilde{A}_{Nyst}$. This leads to the following algorithm:
Algorithm 2: Randomised Nyström with rank-k truncation on $\tilde{A}_{Nyst}$

Input: $A \in \mathbb{R}^{m \times m}$ symmetric positive semi-definite, sketching matrix $\Omega_1 \in \mathbb{R}^{m \times l}$, target rank $k \le l$.

- Compute $C = A \Omega_1$, $C \in \mathbb{R}^{m \times l}$.
- Compute $B = \Omega_1^T C$, $B \in \mathbb{R}^{l \times l}$.
- Apply the Cholesky factorization $B = LL^T$.
- Compute $Z = CL^{-T}$ with a triangular solve (that is, solve the system $L Z^T = C^T$ by forward substitution).
- Compute the QR factorization $Z = QR$.
- Compute the truncated rank-k SVD of $R$: $R \approx U_k \Sigma_k V_k^T$.
- Compute $\hat{U}_k = Q U_k$.
- Output: $\hat{U}_k$ and $\Sigma_k^2$.
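Again a minimal NumPy/SciPy sketch (illustrative only, with no safeguard for a numerically singular $B$):

```python
import numpy as np
from scipy.linalg import solve_triangular

def nystrom_truncate_A(A, Omega, k):
    """Randomised Nystrom with rank-k truncation on the final approximation."""
    C = A @ Omega                               # C = A Omega
    B = Omega.T @ C                             # B = Omega^T A Omega
    L = np.linalg.cholesky(B)                   # B = L L^T (requires B > 0)
    Z = solve_triangular(L, C.T, lower=True).T  # Z = C L^{-T}: solve L Z^T = C^T
    Q, R = np.linalg.qr(Z)                      # thin QR of Z
    U, s, _ = np.linalg.svd(R)                  # SVD of the small l x l factor
    return Q @ U[:, :k], s[:k] ** 2             # hat{U}_k and Sigma_k^2
```

The approximation is then available in factored form, $\tilde{A}_{Nyst} \approx \hat{U}_k \,\mathrm{diag}(\Sigma_k^2)\, \hat{U}_k^T$, without ever storing an $m \times m$ matrix.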
A small note, you can decide to compute
From an algebraic point of view, the following chain of identities shows why Algorithm 2 produces the rank-k Nyström approximation:

$$
\begin{aligned}
(A \Omega_1)(\Omega_1^T A \Omega_1)^{+}(\Omega_1^T A)
&= C L^{-T} L^{-1} C^T \\
&= Z Z^T && \text{(recall } Z = C L^{-T}\text{)} \\
&= Q R R^T Q^T && \text{(QR factorization of } Z\text{)} \\
&\approx Q U_k \Sigma_k \Sigma_k U_k^T Q^T && \text{(rank-k SVD of } R\text{)} \\
&= \hat{U}_k \Sigma_k^2 \hat{U}_k^T,
\end{aligned}
$$
and that $\hat{U}_k = Q U_k$ has orthonormal columns in this case, since both $Q$ and $U_k$ do; hence $\hat{U}_k \Sigma_k^2 \hat{U}_k^T$ is a genuine truncated eigenvalue decomposition, with eigenvalues $\Lambda_k = \Sigma_k^2$.
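These identities are easy to check numerically. The following self-contained toy check (with no truncation, i.e. $k = l$) verifies the final equality up to round-off:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
m, l = 300, 30
G = rng.standard_normal((m, m))
A = G @ G.T                                   # random symmetric positive definite A
Omega = rng.standard_normal((m, l))

C = A @ Omega
B = Omega.T @ C
L = np.linalg.cholesky(B)
Z = solve_triangular(L, C.T, lower=True).T    # Z = C L^{-T}
Q, R = np.linalg.qr(Z)
U, s, _ = np.linalg.svd(R)
U_hat = Q @ U                                 # k = l, so no truncation error

print(np.allclose(Z @ Z.T, U_hat @ np.diag(s**2) @ U_hat.T))  # True
```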
The project will be developed by using Algorithm 2 (Randomised Nyström with rank-k truncation on $\tilde{A}_{Nyst}$).
Two sketching matrices will be used to test the algorithm. The first one is the Gaussian embedding: the idea is to generate a matrix $\Omega_1 \in \mathbb{R}^{m \times l}$ with independent standard Gaussian entries, typically scaled by $1/\sqrt{l}$ so that sketched norms are preserved in expectation.
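For instance (a minimal sketch; the $1/\sqrt{l}$ scaling is one common convention):

```python
import numpy as np

def gaussian_embedding(m, l, seed=None):
    """Gaussian sketching matrix Omega_1 with i.i.d. N(0, 1/l) entries."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, l)) / np.sqrt(l)
```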
The second sketching matrix is the block SRHT embedding. It is derived from the subsampled randomised Hadamard transform (SRHT) matrix

$$\Omega_1^T = \sqrt{\tfrac{m}{l}}\, P H D,$$

where the three matrices are the following:

- $D \in \mathbb{R}^{m \times m}$: diagonal matrix of independent random signs (plus or minus ones on the diagonal).
- $H \in \mathbb{R}^{m \times m}$: normalised Walsh-Hadamard matrix.
- $P \in \mathbb{R}^{l \times m}$: samples $l$ rows uniformly at random.
In this case, $m$ must be a power of two, and the recursive structure of $H$ allows the product $H D x$ to be evaluated with a fast Walsh-Hadamard transform in $O(m \log m)$ operations, without ever forming $H$ explicitly.
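A readable (but dense, $O(m^2)$) sketch of the SRHT construction, assuming $m$ is a power of two; a production version would use a fast Walsh-Hadamard transform instead of scipy.linalg.hadamard:

```python
import numpy as np
from scipy.linalg import hadamard

def srht(m, l, seed=None):
    """Dense SRHT sketch: Omega_1^T = sqrt(m/l) * P H D, with m a power of two."""
    rng = np.random.default_rng(seed)
    D = rng.choice([-1.0, 1.0], size=m)          # random signs (diagonal of D)
    H = hadamard(m) / np.sqrt(m)                 # normalised Walsh-Hadamard matrix
    rows = rng.choice(m, size=l, replace=False)  # P: l rows sampled uniformly
    return np.sqrt(m / l) * (H[rows] * D)        # P H D: rows of H, columns signed by D
```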
Unfortunately, products with SRHT matrices do not distribute well, which limits the advantages of SRHT on modern architectures. This limitation primarily arises from the difficulty of computing products with the global Walsh-Hadamard matrix $H$ in parallel, since its recursive butterfly structure couples all $m$ rows of the input.
This justifies the introduction of the new block-SRHT method, which distributes the workload between the $p$ processors as follows. The sketching matrix is partitioned column-wise into $p$ blocks, $\Omega_1^T = [\Omega^{(1)T}, \ldots, \Omega^{(p)T}]$. Each block $\Omega^{(i)T} \in \mathbb{R}^{l \times m/p}$ is an SRHT built from its own diagonal random-sign matrix and a Walsh-Hadamard matrix of local size $m/p$. The global sketch is then obtained by combining the $p$ local blocks. Notice that the $H$ factor maintains orthogonality because overall it appears as the block-diagonal matrix $\mathrm{diag}(H, \ldots, H)$. Also, the advantage of the sketching matrix written in this way is that the matrix-matrix multiplication is easy to parallelise: if $A$ is partitioned row-wise as $A = [A_1; \ldots; A_p]$, then

$$\Omega_1^T A = \sum_{i=1}^{p} \Omega^{(i)T} A_i,$$

so each processor computes its local product and the partial results are summed with a single reduction.
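The following serial simulation illustrates the distributed product (requiring $l \le m/p$); the exact scaling and the additional global sign factors used in [1] are simplified here, so this is only a sketch of the idea:

```python
import numpy as np
from scipy.linalg import hadamard

def block_srht_times(A, l, p, seed=None):
    """Simulate Omega_1^T A = sum_i Omega^(i)T A_i, one SRHT block per processor."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    m_loc = m // p                                   # local rows; a power of two
    H = hadamard(m_loc) / np.sqrt(m_loc)             # local normalised Hadamard matrix
    out = np.zeros((l, A.shape[1]))
    for i in range(p):                               # each iteration = one processor
        A_i = A[i * m_loc:(i + 1) * m_loc]           # local row block of A
        D = rng.choice([-1.0, 1.0], size=m_loc)      # independent local signs
        rows = rng.choice(m_loc, size=l, replace=False)
        Om_iT = np.sqrt(m_loc / l) * (H[rows] * D)   # local SRHT block
        out += Om_iT @ A_i                           # local product; the sum is a reduction
    return out
```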
Two datasets were used to test the Nyström algorithm. The first one is the MNIST database of handwritten digits.
The original black and white (bi-level) images were size-normalised to fit in a 20×20 pixel box while preserving their aspect ratio, and each normalised image was then centred in a 28×28 plane. The MNIST database contains 60,000 training images and 10,000 testing images, with pixel values in the range [0, 255]. The scaled version is used here: each feature is divided by 255, so the values lie in [0, 1].
The second dataset is YearPredictionMSD, a subset of the Million Song Dataset. It contains songs that are mostly Western, commercial tracks ranging from 1922 to 2011, with a peak in the 2000s. The training set has 463,715 examples and the testing set 51,630. Each element contains 90 features: 12 timbre averages and 78 timbre covariances.
The latter dataset was chosen to allow a comparison with the results reported in article [1].
The matrix used for testing was built through the following radial basis function:

$$A_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|_2^2}{\sigma^2}\right),$$

where $x_i$ and $x_j$ are rows of the dataset and $\sigma$ is a bandwidth parameter.
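For example, a small sketch of this construction (the vectorised distance computation is an implementation choice consistent with the formula above):

```python
import numpy as np

def rbf_matrix(X, sigma):
    """A_ij = exp(-||x_i - x_j||^2 / sigma^2) for the rows x_i of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] - 2.0 * (X @ X.T) + sq[None, :]  # pairwise squared distances
    return np.exp(-np.maximum(d2, 0.0) / sigma**2)    # clamp round-off negatives
```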
The Python file create_matrix.py contains further functions to create other matrices for the analysis of the Nyström algorithm. A description of those matrices can be found in [2].
[1] Balabanov, Oleg; Beaupère, Matthias; Grigori, Laura; Lederer, Victor (2022). "Block subsampled randomised Hadamard transform for low-rank approximation on distributed architectures".

[2] Tropp, J. A.; Yurtsever, A.; Udell, M.; Cevher, V. (2017). "Fixed-rank approximation of a positive-semidefinite matrix from streaming data".