- Let \(Y\in \mathbb R^{n\times p}\) be rank-\(p\) with SVD \(Y= U D V^\top\), so that \(U^\top U = V^\top V = I_p\).
- Show that \(v_k\) is an eigenvector of \(Y^\top Y\), and give the corresponding eigenvalue.
- Show that \(u_k\) is an eigenvector of \(YY^\top\), and give the corresponding eigenvalue.
- Show that \(Y v_k = d_k u_k\) and \(Y^\top u_k = d_k v_k\).
- Let \(M_k = d_k u_k v_k^\top\), so that \(Y=\sum_k M_k\). Show that \(\operatorname{tr}(M_k^\top M_{k'}) = d_k^2\) if \(k=k'\) and is zero otherwise.
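These SVD identities are easy to check numerically before proving them. A minimal NumPy sketch (Python rather than the R used later in this assignment; a random matrix stands in for \(Y\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3
Y = rng.standard_normal((n, p))          # rank-p with probability 1

# Thin SVD: Y = U D V^T with U (n x p), V (p x p) orthonormal columns
U, d, Vt = np.linalg.svd(Y, full_matrices=False)
V = Vt.T

for k in range(p):
    u_k, v_k, d_k = U[:, k], V[:, k], d[k]
    # v_k is an eigenvector of Y^T Y with eigenvalue d_k^2
    assert np.allclose(Y.T @ Y @ v_k, d_k**2 * v_k)
    # u_k is an eigenvector of Y Y^T with eigenvalue d_k^2
    assert np.allclose(Y @ Y.T @ u_k, d_k**2 * u_k)
    # Y v_k = d_k u_k and Y^T u_k = d_k v_k
    assert np.allclose(Y @ v_k, d_k * u_k)
    assert np.allclose(Y.T @ u_k, d_k * v_k)

# tr(M_k^T M_k') = d_k^2 if k = k', and 0 otherwise
M = [d[k] * np.outer(U[:, k], V[:, k]) for k in range(p)]
G = np.array([[np.trace(M[k].T @ M[j]) for j in range(p)] for k in range(p)])
assert np.allclose(G, np.diag(d**2))
```

The trace identity follows from column orthonormality: \(\operatorname{tr}(M_k^\top M_{k'}) = d_k d_{k'} (u_k^\top u_{k'})(v_{k'}^\top v_k)\).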
- Kernel exercises: Let \(k_1\) and \(k_2\) be two positive definite kernel functions such that \(k_j(y,y') = \phi_j(y)^\top \phi_j(y')\), \(\phi_j:\mathcal Y \rightarrow \mathbb R^{m_j}\), \(j\in \{1,2\}\).
- Show that sums of kernels are kernels, that is, \(a_1k_1 + a_2 k_2\) is a positive definite kernel function if \(a_1,a_2\) are nonnegative.
- Show that products of kernels are kernels, that is, \(k_{12} = k_1 \times k_2\) is a positive definite kernel function.
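Both closure properties can be sanity-checked on finite Gram matrices: a nonnegative combination of PSD matrices is PSD, and the elementwise (Hadamard) product is PSD by the Schur product theorem. A small numerical check, assuming Gaussian kernels on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 4))

def gram(gamma):
    # Gaussian-kernel Gram matrix with entries exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    return np.exp(-gamma * sq)

K1, K2 = gram(0.1), gram(1.0)

def is_psd(K, tol=1e-10):
    # symmetric PSD check via the smallest eigenvalue
    return np.min(np.linalg.eigvalsh(K)) >= -tol

# nonnegative sums of kernels give PSD Gram matrices ...
assert is_psd(2.0 * K1 + 3.0 * K2)
# ... and so do elementwise products (Schur product theorem)
assert is_psd(K1 * K2)
```

For the product, the feature-map argument is constructive: \(k_{12}(y,y') = (\phi_1(y)\otimes\phi_2(y))^\top(\phi_1(y')\otimes\phi_2(y'))\), so the feature map is the Kronecker product \(\phi_1\otimes\phi_2 : \mathcal Y \to \mathbb R^{m_1 m_2}\).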
- The dataset `pendigits.rds` is an \(8 \times 2 \times 1000\) array, corresponding to 1000 data points, each data point being an \(8 \times 2\) matrix. Each matrix represents, in time order, pen locations on a writing tablet while someone was writing a numerical digit. For example, the following code loads the data and plots one observation of a written 8:

```r
Y <- readRDS("pendigits.rds")
plot(Y[, , 12], type = "l")
dimnames(Y)[[3]][12]
```
- Compute the first eight linear principal components and plot them, for example plotting the 1st versus 2nd in one figure, 3rd versus 4th in a second figure, etc. Use colors and plotting characters to identify which digits correspond to which points on your plots. Also plot the cumulative fraction of variance explained as a function of the number of components.
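The linear-PCA computation can be sketched as follows (a NumPy sketch rather than the R the assignment uses; synthetic 16-vectors stand in for the flattened \(8\times 2\) pendigits observations, and the plotting calls are omitted):

```python
import numpy as np

# Synthetic stand-in for pendigits: n observations, each an 8 x 2 matrix
# flattened to a 16-vector before PCA.
rng = np.random.default_rng(2)
n = 200
Y = rng.standard_normal((n, 16))

# Center the data, then get PCs from the SVD of the centered matrix
Yc = Y - Y.mean(axis=0)
U, d, Vt = np.linalg.svd(Yc, full_matrices=False)
scores = U[:, :8] * d[:8]        # first eight principal component scores

# Cumulative fraction of variance explained by the first j components
cum_frac = np.cumsum(d**2) / np.sum(d**2)
assert scores.shape == (n, 8)
```

Pairs of columns of `scores` (1st vs 2nd, 3rd vs 4th, ...) give the requested scatterplots, with point colors taken from the digit labels.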
- Now obtain the first eight kernel principal components using the kernel function \(k(x,y)= \exp(-\gamma \|x-y\|^2)\), and make the same plots as in the previous part. Do this for each value of \(\gamma\in \{ 10^{-5},10^{-4}, 10^{-3} \}\). Compare the results to each other and to the linear principal components. Discuss the use of kernel PCA for dimension reduction for these data.
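The kernel-PCA computation amounts to eigendecomposing the doubly centered Gram matrix. A NumPy sketch under the same assumptions as above (Python rather than R, synthetic stand-in data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
Y = rng.standard_normal((n, 16))   # flattened 8 x 2 observations (synthetic)

# Gaussian-kernel Gram matrix K_ij = exp(-gamma * ||y_i - y_j||^2)
gamma = 1e-4
sq = np.sum((Y[:, None, :] - Y[None, :, :])**2, axis=-1)
K = np.exp(-gamma * sq)

# Double-center: CKC with C = I - (1/n) 11^T
C = np.eye(n) - np.ones((n, n)) / n
Kc = C @ K @ C

# Kernel PC scores: top eigenvectors of CKC scaled by sqrt(eigenvalue)
evals, evecs = np.linalg.eigh(Kc)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
kpc = evecs[:, :8] * np.sqrt(np.maximum(evals[:8], 0.0))
assert kpc.shape == (n, 8)
```

Repeating this for each \(\gamma\) and plotting column pairs of `kpc` gives the figures to compare against the linear PCs.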
- Can you think of some other kernel function that might be more appropriate for these data?
- Consider again the kernel \(k(x,y)= \exp(-\gamma \|x-y\|^2)\) from the previous problem. Compare mathematically the linear PCs to the kernel PCs obtained from the doubly centered Gram matrix \(CKC\), where \(K_{ij}=k(y_i,y_j)\) and \(C=I_n - \tfrac{1}{n}\mathbf 1\mathbf 1^\top\), in the limit as \(\gamma \rightarrow 0\).
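A sketch of the limiting argument, treating each observation as a flattened vector \(y_i\) and writing \(Y\) for the matrix with rows \(y_i^\top\): expand the kernel for small \(\gamma\),
\[
k(y_i,y_j) = e^{-\gamma\|y_i-y_j\|^2} = 1 - \gamma\|y_i-y_j\|^2 + O(\gamma^2).
\]
Because \(C\mathbf 1 = 0\), centering annihilates the constant term, and writing \(\|y_i-y_j\|^2 = \|y_i\|^2 + \|y_j\|^2 - 2\,y_i^\top y_j\) shows that the \(\|y_i\|^2\) and \(\|y_j\|^2\) terms, being constant along rows or columns, are annihilated as well, leaving
\[
CKC = 2\gamma\, C Y Y^\top C + O(\gamma^2).
\]
So as \(\gamma\to 0\) the leading eigenvectors of \(CKC\) approach those of the centered Gram matrix \(CYY^\top C\); the kernel PCs reduce to the linear PCs up to the scale factor \(2\gamma\).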