【Machine Learning】Unsupervised Learning

These notes are based on the unsupervised-learning part of the lecture notes for Tsinghua University's Machine Learning course; they were essentially written as a cheat sheet a day or two before the exam. They cover a lot of ground without much detail and are intended mainly as review and memorization material.

Principal Component Analysis

  • Dimension reduction: the JL lemma says $d=\Omega\left(\frac{\log n}{\epsilon^2}\right)$ dimensions are needed to preserve the pairwise distances of $n$ data points.
  • Goal of PCA
    • maximize variance: $\mathbb{E}[(v^\top x)^2]=v^\top XX^\top v$ for $\|v\|=1$
    • minimize reconstruction error: $\mathbb{E}[\|x-(v^\top x)v\|^2]$
  • Find the $v_i$ iteratively, then project the data points onto the subspace spanned by $v_1,v_2,\dots,v_d$.
  • How to find v v v ?
    • Eigendecomposition: $XX^\top = U\Sigma U^\top$
    • $v_1$ is the eigenvector with the largest eigenvalue.
    • Power method (a minimal sketch follows this list)
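
A minimal power-method sketch for the first principal component, assuming the columns of $X$ are centered data points (the function name is mine, not from the notes):

```python
import numpy as np

def top_principal_component(X, n_iter=100, seed=0):
    """Power method for the top eigenvector of X X^T.

    X: (d, n) matrix whose columns are centered data points,
    matching the notes' X X^T convention.
    """
    rng = np.random.default_rng(seed)
    C = X @ X.T                          # d x d (scaled) covariance matrix
    v = rng.standard_normal(C.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = C @ v                        # each multiply amplifies the top eigendirection
        v /= np.linalg.norm(v)
    return v

# Usage: project data onto the first principal direction.
X = np.random.randn(5, 200)
X -= X.mean(axis=1, keepdims=True)       # center first
v1 = top_principal_component(X)
scores = v1 @ X                          # v^T x for every column x
```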

Nearest Neighbor Classification

  • KNN: K-nearest neighbor
  • nearest neighbor search: locality-sensitive hashing (LSH)*
    • Randomized $c$-approximate $R$-near neighbor ($(c,R)$-NN): a data structure that, with at least some constant probability, returns a $cR$-near neighbor whenever an $R$-near neighbor exists.
    • A family $H$ is called $(R,cR,P_1,P_2)$-sensitive if for any $p,q\in \mathbb{R}^d$:
      • if $\|p-q\|\le R$, then $\Pr_H[h(q)=h(p)]\ge P_1$
      • if $\|p-q\|\ge cR$, then $\Pr_H[h(q)=h(p)]\le P_2$
      • $P_1>P_2$
    • Algorithm based on an LSH family (a minimal sketch follows this list):
      • Construct $g_i(x)=(h_{i,1}(x),h_{i,2}(x),\dots,h_{i,k}(x))$ for $1\le i\le L$, where all $h_{i,j}$ are i.i.d. from $H$.
      • For each $i$, check whether the elements in the bucket of $g_i(q)$ are $cR$-near neighbors of $q$, stopping after $2L+1$ checks.
      • If an $R$-near neighbor exists, a $cR$-near neighbor is found with probability at least $\frac{1}{2}-\frac{1}{e}$.
      • $\rho=\frac{\log 1/P_1}{\log 1/P_2}$, $k=\log_{1/P_2}(n)$, $L=n^\rho$
      • Proof
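
A minimal sketch of this scheme for Euclidean distance, assuming the standard p-stable (Gaussian projection) hash family $h(x)=\lfloor (a\cdot x+b)/w \rfloor$; the class name and the bucket-width parameter $w$ are my additions, not from the notes:

```python
import numpy as np
from collections import defaultdict

class LSHIndex:
    """(c,R)-NN sketch: L tables, each keyed by g_i = (h_{i,1}, ..., h_{i,k})."""

    def __init__(self, dim, k=8, L=16, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((L, k, dim))   # h_{i,j} drawn i.i.d. from H
        self.B = rng.uniform(0, w, size=(L, k))
        self.w = w
        self.tables = [defaultdict(list) for _ in range(L)]

    def _g(self, i, x):
        # concatenated hash g_i(x), used as the bucket key of table i
        return tuple(np.floor((self.A[i] @ x + self.B[i]) / self.w).astype(int))

    def insert(self, idx, x):
        for i, table in enumerate(self.tables):
            table[self._g(i, x)].append((idx, x))

    def query(self, q, c, R):
        # scan the L buckets of q; stop after 2L+1 distance checks
        checks_left = 2 * len(self.tables) + 1
        for i, table in enumerate(self.tables):
            for idx, x in table[self._g(i, q)]:
                if np.linalg.norm(q - x) <= c * R:
                    return idx                      # a cR-near neighbor
                checks_left -= 1
                if checks_left <= 0:
                    return None
        return None
```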

Metric Learning

  • Project each $x_i$ to an embedding $f(x_i)$.
  • Hard version (compare the labels of the nearest neighbors) vs. soft version (weight neighbors probabilistically).
  • Neighborhood Component Analysis (NCA)
    • $p_{i,j}\propto \exp(-\|f(x_i)-f(x_j)\|^2)$
    • maximize $\sum_{i}\sum_{j\in C_i}p_{i,j}$
  • LMNN: $L=\max(0,\|f(x)-f(x^+)\|_2-\|f(x)-f(x^-)\|_2+r)$
    • $x^+,x^-$ are the worst cases (hardest positive and negative examples).
    • $r$ is the margin (a numpy sketch of both losses follows this list).
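
A numpy sketch of both objectives under the definitions above; the function names are mine, and in practice $f$ would be a learned linear map or network trained by gradient methods:

```python
import numpy as np

def lmnn_hinge_loss(f_x, f_pos, f_neg, r=1.0):
    """L = max(0, ||f(x)-f(x+)|| - ||f(x)-f(x-)|| + r) from the notes."""
    d_pos = np.linalg.norm(f_x - f_pos)
    d_neg = np.linalg.norm(f_x - f_neg)
    return max(0.0, d_pos - d_neg + r)

def nca_objective(Z, labels):
    """sum_i sum_{j in C_i} p_ij with p_ij ∝ exp(-||z_i - z_j||^2), p_ii = 0.

    Z: (n, d) array of embeddings f(x_i); labels: (n,) array of class labels.
    """
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(sq, np.inf)                          # exclude j == i
    P = np.exp(-sq)
    P /= P.sum(axis=1, keepdims=True)                     # row-stochastic p_ij
    same_class = labels[:, None] == labels[None, :]
    return (P * same_class).sum()
```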

Spectral Clustering

  • K-means
  • Spectral graph clustering
    • Graph Laplacian: $L=D-A$, where $A$ is the similarity (adjacency) matrix and $D$ is the diagonal degree matrix.
    • The number of zero eigenvalues equals the number of connected components.
    • The eigenvectors of the $k$ smallest eigenvalues give a partition into $k$ clusters: run $k$-means on the rows of the eigenvector matrix.
    • The ratio-cut objective can be relaxed into finding the $k$ smallest eigenvectors, which are exactly those of the graph Laplacian (a minimal sketch follows this list).
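
A minimal sketch of this pipeline for unnormalized spectral clustering; the function name is mine, and sklearn's KMeans stands in for the final $k$-means step:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    """A: (n, n) symmetric similarity matrix; returns n cluster labels."""
    D = np.diag(A.sum(axis=1))                # degree matrix
    L = D - A                                 # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    U = eigvecs[:, :k]                        # n x k: the k smallest eigenvectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)   # k-means on the rows
```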

SimCLR*

  • Intelligence is positioning

  • InfoNCE loss
    $L(q,p_1,\{p_i\}_{i=2}^N)=-\log \frac{\exp(-\|f(q)-f(p_1)\|^2/(2\tau))}{\sum_{i=1}^{N}\exp(-\|f(q)-f(p_i)\|^2/(2\tau))}$

  • Learn $Z=f(x)$: map the original data points into a space where semantic similarity is captured naturally.

    • Reproducing kernel Hilbert space: $k(f(x_1),f(x_2))=\langle\phi(f(x_1)),\phi(f(x_2))\rangle_H$; the inner product in $H$ is given by a kernel function.
    • Usually $K_{Z,i,j}=k(Z_i-Z_j)$ with $k$ Gaussian.
  • We are given a similarity matrix $\pi$ over the dataset, where $\pi_{i,j}$ is the similarity of data points $i$ and $j$. We want the similarity matrix $K_Z$ of $f(x)$ to match the manually specified one of $x$. Let $W_X\sim \pi$ and $W_Z\sim K_Z$; we want these two samples to agree.

  • Minimize the cross-entropy loss: $H_{\pi}^{k}(Z)=-\mathbb{E}_{W_X\sim P(\cdot;\pi)}[\log P(W_Z=W_X;K_Z)]$

    • Equivalent to the InfoNCE loss: restricted to row $i$, the loss is $-\log P(W_{Z,i}=W_{X,i})$; the given pair $q,p_1$ is sampled from the similarity matrix $\pi$, which corresponds to $W_X\sim P(\cdot;\pi)$ (a numpy sketch of InfoNCE follows this list).
    • Equivalent to spectral clustering: i.e. to $\arg\min_Z \mathrm{tr}(Z^\top L^* Z)$.
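
A numpy sketch of the InfoNCE loss in the Gaussian-kernel form given above (names are mine; the first entry of `f_ps` is assumed to be the positive $f(p_1)$):

```python
import numpy as np

def info_nce(f_q, f_ps, tau=0.5):
    """-log softmax of -||f(q)-f(p_i)||^2 / (2 tau) at the positive (index 0).

    f_q: (d,) query embedding; f_ps: (N, d) with f_ps[0] the positive pair.
    """
    sq = ((f_q[None, :] - f_ps) ** 2).sum(axis=1)   # ||f(q) - f(p_i)||^2
    logits = -sq / (2.0 * tau)
    logits -= logits.max()                          # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))
```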

t-SNE

  • data visualization: map data into a low-dimensional space (2D)

  • SNE: same as NCA; we want $q_{i,j}\propto \exp(-\|f(x_i)-f(x_j)\|^2/(2\sigma^2))$ to be similar to $p_{i,j}\propto \exp(-\|x_i-x_j\|^2/(2\sigma_i^2))$

    • Cross-entropy (KL) loss: $-\sum_{i,j} p_{i,j}\log \frac{q_{i,j}}{p_{i,j}}$
  • Crowding problem

  • Solved by t-SNE: let $q_{i,j}\propto (1+\|y_j-y_i\|^2)^{-1}$ (Student t-distribution)

    • The power $-1$ has a heavier tail than the Gaussian, so the crowding problem can be solved by shifting moderate distances outward in the embedding (a sketch of $q$ and the loss follows this list).
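
A sketch of the low-dimensional affinities and the objective under the formulas above (function names are mine; computing the high-dimensional $p_{i,j}$ with per-point $\sigma_i$ via a perplexity search is omitted):

```python
import numpy as np

def tsne_q(Y):
    """q_ij ∝ (1 + ||y_i - y_j||^2)^(-1), q_ii = 0, normalized over all pairs."""
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    inv = 1.0 / (1.0 + sq)                  # heavy-tailed Student-t kernel
    np.fill_diagonal(inv, 0.0)
    return inv / inv.sum()

def kl_loss(P, Q, eps=1e-12):
    """KL(P || Q) = sum_ij p_ij log(p_ij / q_ij), the t-SNE objective."""
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))
```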