1. Let \(Y\in \mathbb R^{n\times p}\) be rank-\(p\) with SVD \(Y= U D V^\top\), where \(D = \mathrm{diag}(d_1,\dots,d_p)\) and \(u_k\) and \(v_k\) denote the \(k\)th columns of \(U\) and \(V\), so that \(U^\top U = V^\top V = I_p\).
    1. Show that \(v_k\) is an eigenvector of \(Y^\top Y\), and give the corresponding eigenvalue;
    2. Show that \(u_k\) is an eigenvector of \(YY^\top\), and give the corresponding eigenvalue;
    3. Show that \(Y v_k = d_k u_k\) and \(Y^\top u_k = d_k v_k\).
    4. Let \(M_k = d_k u_k v_k^\top\), so that \(Y=\sum_k M_k\). Show that \(\operatorname{tr}(M_k^\top M_{k'}) = d_k^2\) if \(k=k'\) and is zero otherwise.
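The identities above can be checked numerically before proving them; here is a minimal R sketch on simulated data (all variable names are my own, and the checks illustrate rather than prove the results):

```r
# Numerical check of the SVD identities on a random rank-p matrix
set.seed(1)
n <- 10; p <- 3
Y <- matrix(rnorm(n * p), n, p)   # rank p with probability one
s <- svd(Y)                       # Y = U D V^T
U <- s$u; d <- s$d; V <- s$v

max(abs(t(Y) %*% Y %*% V - V %*% diag(d^2)))  # (a) Y^T Y v_k = d_k^2 v_k
max(abs(Y %*% t(Y) %*% U - U %*% diag(d^2)))  # (b) Y Y^T u_k = d_k^2 u_k
max(abs(Y %*% V - U %*% diag(d)))             # (c) Y v_k = d_k u_k

# (d) tr(M_k^T M_k') = d_k^2 if k = k', and zero otherwise
M <- lapply(1:p, function(k) d[k] * tcrossprod(U[, k], V[, k]))
G <- outer(1:p, 1:p, Vectorize(function(k, kp) sum(M[[k]] * M[[kp]])))
max(abs(G - diag(d^2)))
```

Each printed value should be at the level of floating-point rounding error.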
  2. Kernel exercises: Let \(k_1\) and \(k_2\) be two positive definite kernel functions such that \(k_j(y,y') = \phi_j(y)^\top \phi_j(y')\), \(\phi_j:\mathcal Y \rightarrow \mathbb R^{m_j}\), \(j\in \{1,2\}\).
    1. Show that sums of kernels are kernels, that is, \(a_1k_1 + a_2 k_2\) is a positive definite kernel function if \(a_1,a_2\) are nonnegative.
    2. Show that products of kernels are kernels, that is, \(k_{12} = k_1 \times k_2\) is a positive definite kernel function.
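Before attempting the proofs, it can help to check the claims numerically. A quick R sanity check with two example kernels (a linear and a Gaussian kernel, both chosen purely for illustration): the smallest eigenvalue of each Gram matrix should be nonnegative up to rounding error.

```r
# Sanity check (not a proof): Gram matrices of sums and products of
# positive definite kernels have nonnegative eigenvalues
set.seed(1)
y <- matrix(rnorm(20), 10, 2)            # 10 points in R^2
K1 <- tcrossprod(y)                      # linear kernel k1(y, y') = y^T y'
K2 <- exp(-0.5 * as.matrix(dist(y))^2)   # Gaussian kernel

min(eigen(2 * K1 + 3 * K2, symmetric = TRUE)$values)  # sum, a1 = 2, a2 = 3
min(eigen(K1 * K2, symmetric = TRUE)$values)          # elementwise (Schur) product
```

Note that the product of kernels corresponds to the *elementwise* product of the Gram matrices, not the matrix product.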
  3. The dataset pendigits.rds is an \(8 \times 2 \times 1000\) array, corresponding to 1000 data points, each data point being an \(8 \times 2\) matrix. Each matrix gives, in time order, pen locations on a writing tablet while someone was writing a numerical digit. For example, the following code loads the data and plots one observation of a written 8:
Y <- readRDS("pendigits.rds")  # 8 x 2 x 1000 array
plot(Y[,,12], type = "l")      # trace the pen path of the 12th observation
dimnames(Y)[[3]][12]           # its digit label
    1. Compute the first eight linear principal components and plot them, for example plotting the 1st versus 2nd in one figure, the 3rd versus 4th in a second figure, and so on. Use colors and plotting characters to identify which digits correspond to which points on your plots. Also plot the cumulative fraction of variance explained as a function of the number of components.
    2. Now obtain the first eight kernel principal components using the kernel function \(k(x,y) = \exp(-\gamma \|x-y\|^2)\), and make the same plots as in part a. Do this for each value of \(\gamma \in \{10^{-5}, 10^{-4}, 10^{-3}\}\). Compare the results to each other and to the linear principal components, and discuss the use of kernel PCA for dimension reduction for these data.
    3. Can you think of some other kernel function that might be more appropriate for these data?
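One possible R sketch for parts a and b, assuming pendigits.rds is in the working directory; the kernel PCA here is coded by hand (centered kernel matrix plus eigendecomposition), and the helper name `kpca_scores` is my own:

```r
# Kernel PCA scores via the centered Gram matrix CKC
kpca_scores <- function(X, gamma, m = 8) {
  n <- nrow(X)
  K <- exp(-gamma * as.matrix(dist(X))^2)  # k(x, y) = exp(-gamma ||x - y||^2)
  C <- diag(n) - 1/n                       # centering matrix I - (1/n) 11^T
  e <- eigen(C %*% K %*% C, symmetric = TRUE)
  e$vectors[, 1:m] %*% diag(sqrt(pmax(e$values[1:m], 0)))
}

if (file.exists("pendigits.rds")) {
  Y <- readRDS("pendigits.rds")
  labs <- dimnames(Y)[[3]]                 # digit label of each observation
  X <- t(apply(Y, 3, c))                   # flatten each 8 x 2 matrix to a 16-vector

  ## a: linear PCA
  pc <- prcomp(X)
  for (ix in list(1:2, 3:4, 5:6, 7:8))
    plot(pc$x[, ix], col = as.integer(factor(labs)), pch = labs)
  plot(cumsum(pc$sdev^2) / sum(pc$sdev^2), type = "b",
       xlab = "number of components", ylab = "cumulative fraction of variance")

  ## b: kernel PCA for each gamma
  for (g in 10^(-5:-3)) {
    Z <- kpca_scores(X, g)
    for (ix in list(1:2, 3:4, 5:6, 7:8))
      plot(Z[, ix], col = as.integer(factor(labs)), pch = labs,
           main = bquote(gamma == .(g)))
  }
}
```

The score columns returned by `kpca_scores` are orthogonal, with squared norms equal to the eigenvalues of \(CKC\); other normalizations of the eigenvectors are also in common use.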
  4. Consider again the kernel \(k(x,y) = \exp(-\gamma \|x-y\|^2)\) from the previous problem. Compare mathematically the linear PCs to the kernel PCs obtained from the centered kernel matrix \(CKC\), where \(C = I_n - \frac{1}{n}\mathbf 1\mathbf 1^\top\), in the limit as \(\gamma \rightarrow 0\).
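As a hint, one way to start is to expand each entry of the kernel matrix to first order in \(\gamma\) (writing \(X\) for the row-wise data matrix):

```latex
K_{ij} \;=\; e^{-\gamma\|x_i - x_j\|^{2}}
\;=\; 1 \;-\; \gamma\bigl(a_i + a_j - 2\,x_i^{\top}x_j\bigr) \;+\; O(\gamma^{2}),
\qquad a_i = \|x_i\|^{2},
```

so that \(K = \mathbf 1\mathbf 1^\top - \gamma\,(a\mathbf 1^\top + \mathbf 1 a^\top) + 2\gamma\, XX^\top + O(\gamma^2)\). Since \(C\mathbf 1 = 0\), consider which of these terms survive the double centering in \(CKC\), and how the surviving term relates to the matrix diagonalized by linear PCA.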