Data Science — Linear Algebra

Singular Value Decomposition

A = U × Σ × Vᵀ — breaking matrices into beautiful pieces

Section 01

What is SVD?

SVD stands for Singular Value Decomposition. It's a mathematical technique that takes any matrix and breaks it into three simpler, structured matrices. Think of it as factoring a matrix — just like how 12 = 3 × 4, any matrix can be written as U × Σ × Vᵀ.

🎯
The Goal
Decompose any m×n matrix A into three matrices that reveal the hidden structure, importance, and geometry of the data.
Matrix Factorization
💡
The Analogy
Like breaking a complex recipe into: ingredients (V), how much of each (Σ), and how to combine them (U). You can recreate the dish — or simplify it.
Intuition
Why It Matters
SVD is used everywhere: image compression, Netflix recommendations, NLP, noise removal. It's one of the most powerful tools in data science.
Applications
Section 02

The Three Matrices

Every SVD breaks matrix A into exactly three components. Each has a precise geometric meaning.

A  =  [ 3   1 ]     (2 × 2)
      [ 1   3 ]

A  =  U × Σ × Vᵀ

U  =  [ 0.707   0.707 ]     Left Singular Vectors
      [ 0.707  -0.707 ]

Σ  =  [ 4   0 ]             Singular Values
      [ 0   2 ]

Vᵀ =  [ 0.707   0.707 ]     Right Singular Vectors
      [ 0.707  -0.707 ]
U
Left Singular Vectors
Size: m × m
Columns are orthonormal (perpendicular unit vectors).

Meaning: Captures the "row perspective" — how rows of A relate to each other. Think of it as a rotation/reflection applied to the rows.
Σ
Singular Values (Sigma)
Size: m × n (diagonal)
Values: σ₁ ≥ σ₂ ≥ ... ≥ 0 (always non-negative, sorted)

Meaning: The "importance ranking." Larger value = more influential direction. Zero values = redundant dimensions.
Vᵀ
Right Singular Vectors
Size: n × n
Rows of Vᵀ (= columns of V) are orthonormal.

Meaning: Captures the "column perspective" — how columns of A relate. A rotation in the input space.
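These orthonormality and ordering properties are easy to check numerically. A minimal NumPy sketch, using the 2×2 matrix from this page:

```python
import numpy as np

# The worked example from this page
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

U, sigma, Vt = np.linalg.svd(A)

# U and V are orthonormal: transpose × itself gives the identity
assert np.allclose(U.T @ U, np.eye(2))
assert np.allclose(Vt @ Vt.T, np.eye(2))

# Singular values are non-negative and sorted largest-first
assert np.all(sigma >= 0) and sigma[0] >= sigma[1]
print(sigma)  # singular values 4 and 2
```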
Section 03

Step-by-Step Computation

We'll compute SVD on this 2×2 matrix by hand. Every step explained clearly.

A  =  [ 3   1 ]
       [ 1   3 ]
1
Compute AᵀA
Multiply A-transpose by A. This gives a square, symmetric matrix — ideal for finding eigenvalues.
# A-transpose × A
Aᵀ = [ 3  1 ]      A = [ 3  1 ]
     [ 1  3 ]          [ 1  3 ]

AᵀA = [ (3×3)+(1×1)   (3×1)+(1×3) ]
      [ (1×3)+(3×1)   (1×1)+(3×3) ]

AᵀA = [ 10   6 ]
      [  6  10 ]
2
Find Eigenvalues → λ
Solve det(AᵀA − λI) = 0. Eigenvalues tell us how much "stretch" each direction has.
# Characteristic equation
det( [ 10−λ    6   ] ) = 0
     [  6    10−λ  ]

(10−λ)² − 36 = 0
λ² − 20λ + 64 = 0
(λ − 16)(λ − 4) = 0

λ₁ = 16    λ₂ = 4    # ← eigenvalues!
3
Compute Singular Values → Σ
Singular values are simply square roots of eigenvalues. They fill the diagonal of Σ, sorted largest → smallest.
σ₁ = √λ₁ = √16 = 4.0
σ₂ = √λ₂ = √4  = 2.0

Σ = [ 4.0    0  ]
    [  0    2.0 ]
4
Find Eigenvectors → V
For each eigenvalue, solve (AᵀA − λI)v = 0 to find eigenvectors. These become columns of V.
# For λ₁ = 16: (AᵀA − 16I)v = 0
[ -6   6 ] [v₁]   [0]
[  6  -6 ] [v₂] = [0]
→ v₁ = v₂  →  v₁ = [0.707, 0.707]

# For λ₂ = 4: (AᵀA − 4I)v = 0
[ 6  6 ] [v₁]   [0]
[ 6  6 ] [v₂] = [0]
→ v₁ = −v₂  →  v₂ = [0.707, -0.707]

# Therefore:
V  = [ 0.707   0.707 ]      Vᵀ = [ 0.707   0.707 ]
     [ 0.707  -0.707 ]           [ 0.707  -0.707 ]
5
Compute U (Left Singular Vectors)
Use the formula: uᵢ = (1/σᵢ) × A × vᵢ
# u₁: using σ₁ = 4, v₁ = [0.707, 0.707]
u₁ = (1/4) × A × v₁
   = (1/4) × [ 3×0.707 + 1×0.707 ]
             [ 1×0.707 + 3×0.707 ]
   = (1/4) × [2.828, 2.828]
   = [0.707, 0.707]

# u₂: using σ₂ = 2, v₂ = [0.707, -0.707]
u₂ = (1/2) × A × v₂
   = (1/2) × [1.414, -1.414]
   = [0.707, -0.707]

U = [ 0.707   0.707 ]
    [ 0.707  -0.707 ]
Final Result
Combining all three matrices:
A  = U × Σ × Vᵀ
[ 3 1 ]  =  [ 0.707 0.707 ] × [ 4 0 ] × [ 0.707 0.707 ]
[ 1 3 ]      [ 0.707 -0.707 ]    [ 0 2 ]     [ 0.707 -0.707 ]
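The hand computation above can be cross-checked with NumPy. Note that individual columns of U and V may come back with flipped signs (the decomposition is unique only up to sign); the product is unchanged:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# NumPy's SVD should reproduce the hand-computed singular values
U, sigma, Vt = np.linalg.svd(A)
print(sigma)  # 4 and 2, as computed by hand

# Reconstruct A = U × Σ × Vᵀ and compare with the original
A_back = U @ np.diag(sigma) @ Vt
print(np.allclose(A_back, A))  # True
```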
Section 04

Try It Live

Enter any 2×2 matrix and watch SVD computed in real time.
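In static form, the calculator boils down to a few lines of NumPy. The `decompose` helper below is a hypothetical name for illustration, not part of any library:

```python
import numpy as np

def decompose(a, b, c, d):
    """Print the SVD of the 2×2 matrix [[a, b], [c, d]]."""
    A = np.array([[a, b], [c, d]], dtype=float)
    U, sigma, Vt = np.linalg.svd(A)
    print("U  =\n", U.round(3))
    print("Σ  =", sigma.round(3))
    print("Vᵀ =\n", Vt.round(3))
    return U, sigma, Vt

# Try the matrix from Section 03 — or any other values
U, sigma, Vt = decompose(3, 1, 1, 3)
```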

Section 05

Truncated SVD

The most powerful trick in SVD: keep only the top-k singular values and throw the rest away. You get an approximation that uses far less memory — but preserves most of the information.

Aₖ = U[:, :k] × Σ[:k, :k] × Vᵀ[:k, :]
Low-rank approximation using only the top k singular values
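A minimal NumPy sketch of this formula, including the "variance explained" share of the kept components (the test matrix and the choice k = 2 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5)) @ rng.standard_normal((5, 8))  # a rank-5 matrix

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]   # rank-k approximation

# Fraction of "energy" (squared Frobenius norm) captured by the top k values
explained = (sigma[:k]**2).sum() / (sigma**2).sum()
print(f"rank-{k} approximation keeps {explained:.1%} of the variance")
```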
Section 06

Cool Applications of SVD

SVD is one of the most versatile tools in all of science and engineering. Here are 12 real, buildable applications — from AI art to finance to space exploration.

🖼️ Computer Vision
Image Compression
Decompose a grayscale image matrix with SVD. Keep only the top-k singular values and reconstruct a near-identical image at a fraction of the storage. JPEG-style compression from scratch.
numpy.linalg.svd(image)
🧑 Computer Vision
Eigenfaces — Face Recognition
Stack 1000 face images as rows of a matrix. SVD extracts "eigenfaces" — the principal face directions. Any new face is represented as a combination of these eigenfaces. The backbone of early facial recognition systems (AT&T, MIT).
1 Build matrix: each row = flattened face image
2 Center data, apply SVD → get U, Σ, Vᵀ
3 Top-k rows of Vᵀ = the "eigenfaces"
4 Project new face → nearest neighbor in eigenspace
sklearn.decomposition.PCA on face matrix
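The four steps above, sketched numerically with random synthetic "faces" instead of real images (the image size, count, and the 4 hidden patterns are all made-up assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic "faces": 100 images of 8×8 = 64 pixels, built from 4 hidden patterns
patterns = rng.standard_normal((4, 64))
faces = rng.standard_normal((100, 4)) @ patterns

# Center the data, then SVD — top rows of Vt are the "eigenfaces"
mean_face = faces.mean(axis=0)
U, sigma, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)
eigenfaces = Vt[:4]                       # top-4 principal face directions

# Project a new face into eigenface space for nearest-neighbor matching
new_face = rng.standard_normal(4) @ patterns
coords = (new_face - mean_face) @ eigenfaces.T
print(coords.shape)                       # a compact 4-number face descriptor
```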
🎥 Computer Vision
Video Background Removal
Stack video frames as columns of a matrix. SVD separates the static background (low-rank component, captured by large σ values) from moving foreground objects (sparse component). Used in surveillance, self-driving cars, and video compression.
1 Matrix A = [frame₁ | frame₂ | ... | frameₙ]
2 Low-rank part (k=1~5) = static background
3 Residual = moving objects / foreground
Robust PCA / RPCA algorithm
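The three steps above can be sketched with plain truncated SVD on synthetic frames — a full RPCA implementation is more involved, and the "video" here is invented:

```python
import numpy as np

rng = np.random.default_rng(1)
background = rng.random(100)              # one static "frame" of 100 pixels
frames = np.tile(background, (20, 1)).T   # A = [frame₁ | ... | frame₂₀], shape (100, 20)
frames[10, 5] = 5.0                       # a bright "moving object" in frame 5

U, sigma, Vt = np.linalg.svd(frames, full_matrices=False)

# Rank-1 part ≈ static background; residual ≈ moving foreground
low_rank = sigma[0] * np.outer(U[:, 0], Vt[0, :])
residual = frames - low_rank
print(np.abs(residual).argmax())          # largest residual sits at the moving object
```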
🎬 Machine Learning
Netflix-Style Recommender
Build a user×movie ratings matrix (mostly empty). SVD fills the gaps by finding latent factors — hidden taste dimensions like "action lover", "indie fan". Predict any missing rating as the dot product of user and movie latent vectors.
surprise library / matrix factorization
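A minimal NumPy sketch of the latent-factor idea: unknown ratings are first filled with each user's mean, then a rank-2 reconstruction predicts them. The toy ratings matrix is invented, and this is a simplification, not the surprise library's algorithm:

```python
import numpy as np

# Tiny user × movie ratings matrix; np.nan marks unknown ratings
R = np.array([[5.0,    4.0,    1.0,    np.nan],
              [4.0,    5.0,    np.nan, 1.0],
              [np.nan, 1.0,    5.0,    4.0],
              [1.0,    np.nan, 4.0,    5.0]])

# Fill unknowns with each user's mean rating, then take a rank-2 SVD
filled = np.where(np.isnan(R), np.nanmean(R, axis=1, keepdims=True), R)
U, sigma, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
pred = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

print(pred.round(2))  # predicted ratings, including the formerly missing cells
```

The latent factors here capture the two "taste groups" baked into the toy data, so a missing rating is pulled toward how similar users rated the same movie.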
📚 NLP & Text
Latent Semantic Analysis (LSA)
Build a term×document matrix. SVD finds semantic "topics" hiding beneath word co-occurrence patterns. Synonyms cluster together. "Car" and "automobile" become neighbors even if they never appear in the same doc. Powers search engines and document clustering.
1 TF-IDF matrix: rows=words, cols=documents
2 Apply TruncatedSVD with k=100~300 topics
3 Query → project into topic space → cosine similarity
gensim.models.LsiModel
🚨 Machine Learning
Anomaly & Fraud Detection
Normal transactions form a low-rank structure. Fraudulent ones don't fit the pattern. Reconstruct data with truncated SVD — rows with high reconstruction error are anomalies. Used in credit card fraud detection, network intrusion, and quality control.
Reconstruction Error Concept — a normal transaction reconstructs with low error (≈ 0.02); a fraudulent one ⚠️ does not (≈ 0.87).
||A - Aₖ||² as anomaly score
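A sketch of that score on synthetic data — the normal rows lie in a 2-dimensional pattern, and one injected "fraud" row does not (all data here is made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# "Normal" transactions live on a 2-dimensional pattern inside 10 features
normal = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 10))
fraud = rng.standard_normal((1, 10)) * 5          # doesn't fit the pattern
X = np.vstack([normal, fraud])

# Rank-2 truncated SVD, then score each row by its reconstruction error
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
scores = ((X - X_k) ** 2).sum(axis=1)             # ||row − reconstruction||²

print("most anomalous row:", scores.argmax())     # the injected fraud row (index 200)
```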
🧠 Science
MRI & Medical Image Denoising
MRI scans are noisy matrices. Biological signal sits in large singular values; noise hides in the small ones. Truncated SVD reconstructs a clean image. Used in hospitals to reduce scan time (noisier but faster scans → SVD cleans them up).
1 Raw MRI = signal + Gaussian noise matrix
2 SVD: large σ = anatomy, small σ = noise
3 Keep top-k → clean anatomical image
4 SNR improvement without extra scan time
scipy.linalg.svd on image patches
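The denoising recipe, sketched on a synthetic low-rank "anatomy" matrix rather than real MRI data (the image size, rank, and noise level are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic "anatomy": a low-rank 64×64 image, plus Gaussian noise
anatomy = rng.standard_normal((64, 3)) @ rng.standard_normal((3, 64))
noisy = anatomy + rng.standard_normal((64, 64)) * 0.3

# Keep only the top-k singular values — the noise lives in the small ones
U, sigma, Vt = np.linalg.svd(noisy, full_matrices=False)
k = 3
denoised = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

err_noisy = np.linalg.norm(noisy - anatomy)
err_denoised = np.linalg.norm(denoised - anatomy)
print(f"error before: {err_noisy:.1f}, after truncated SVD: {err_denoised:.1f}")
```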
📈 Finance
Stock Market Factor Analysis
Build a matrix of daily stock returns (rows=stocks, cols=days). SVD extracts hidden market factors — the first singular vector often corresponds to the overall market trend. Used in quantitative finance for risk decomposition and portfolio optimization.
numpy SVD on returns matrix
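A sketch with simulated returns driven by a single market factor (the betas, factor scale, and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic returns: 10 stocks × 250 days, all driven by one "market" factor
market = rng.standard_normal(250) * 0.01
betas = rng.uniform(0.5, 1.5, size=10)
returns = np.outer(betas, market) + rng.standard_normal((10, 250)) * 0.002

U, sigma, Vt = np.linalg.svd(returns, full_matrices=False)

# Share of total variance explained by the first (market) factor
share = sigma[0]**2 / (sigma**2).sum()
print(f"first factor explains {share:.0%} of return variance")
```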
🤖 NLP & Text
Word Embeddings (pre-Word2Vec)
SVD on a word co-occurrence matrix produces dense vector representations for every word. Words that appear in similar contexts get similar vectors. "King − Man + Woman ≈ Queen" style analogies were first discovered this way. This was the precursor to Word2Vec and GloVe.
1 Count matrix: C[i,j] = times word i near word j
2 Apply PPMI weighting → smooth the matrix
3 TruncatedSVD(k=300) → 300-dim word vectors
4 Cosine similarity finds semantic neighbors
TruncatedSVD on PPMI co-occurrence matrix
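A toy sketch of count-based embeddings. The 4×4 word–context co-occurrence matrix is invented, and the PPMI weighting step is skipped for brevity:

```python
import numpy as np

# Toy counts: rows = words, columns = context words (drive, road, eat, fruit).
# "car" and "automobile" share contexts; so do "banana" and "mango".
words = ["car", "automobile", "banana", "mango"]
C = np.array([[8, 7, 0, 1],
              [7, 8, 1, 0],
              [0, 1, 8, 7],
              [1, 0, 7, 8]], dtype=float)

U, sigma, Vt = np.linalg.svd(C)
k = 2
vectors = U[:, :k] * sigma[:k]        # 2-dim embedding per word

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(vectors[0], vectors[1]))    # car vs automobile: high similarity
print(cos(vectors[0], vectors[2]))    # car vs banana: low similarity
```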
🛰️ Science
GPS & Sensor Calibration
GPS receivers get signals from multiple satellites — modeled as an overdetermined linear system. SVD computes the pseudoinverse to find the best-fit position. Also used to calibrate IMU sensors (accelerometers, gyroscopes) in phones, drones, and spacecraft.
1 System: Ax = b (more equations than unknowns)
2 SVD pseudoinverse: x = V × Σ⁺ × Uᵀ × b
3 Minimizes total positioning error (least squares)
numpy.linalg.lstsq (uses SVD internally)
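The pseudoinverse solution, sketched on a small made-up overdetermined system (the "satellite" geometry and noise values are illustrative only):

```python
import numpy as np

# Overdetermined system: 4 equations ("satellites"), 2 unknowns (x, y position)
A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [2.0,  1.0],
              [1.0,  2.0]])
true_pos = np.array([3.0, -2.0])
b = A @ true_pos + np.array([0.01, -0.02, 0.015, 0.005])  # noisy measurements

# SVD pseudoinverse: x = V × Σ⁺ × Uᵀ × b  (np.linalg.pinv computes exactly this)
x = np.linalg.pinv(A) @ b
print(x.round(2))  # least-squares position estimate, close to (3, -2)

# Same answer via lstsq, which also uses SVD internally
x2, *_ = np.linalg.lstsq(A, b, rcond=None)
```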
🧬 Science
Genomics — Gene Expression Analysis
A gene expression matrix has thousands of genes × hundreds of patients. SVD (as PCA) reduces it to 2-3 dimensions for visualization — revealing clusters of patients with similar cancer subtypes, or genes that behave similarly across conditions.
1 Matrix: rows=genes (20k), cols=patients (500)
2 Normalize → TruncatedSVD(k=50)
3 Plot first 2 components → cancer subtype clusters
4 Top singular vectors = "metagenes"
scanpy / sklearn PCA on scRNA-seq data
🎨 Machine Learning
LoRA — Fine-tuning AI Models
LoRA (Low-Rank Adaptation) — the technique behind fine-tuning Stable Diffusion, LLaMA, and GPT — applies the same low-rank idea as truncated SVD. Instead of updating a large weight matrix directly, the update is approximated as a low-rank product (ΔW ≈ A × B). This reduces fine-tuning a 7B-parameter model from 28GB to ~50MB of new weights.
1 Full weight update ΔW is large (expensive)
2 Approximate: ΔW ≈ A × B (low-rank, rank=4~64)
3 Train only A and B → ~1000× fewer parameters
4 The same low-rank approximation idea as truncated SVD, applied to neural nets
huggingface/peft — LoRA implementation
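A back-of-envelope sketch of the parameter savings. The dimensions and rank are chosen for illustration; real LoRA trains A and B inside a neural network rather than with NumPy:

```python
import numpy as np

d = 1024                     # weight matrix is d × d
r = 8                        # LoRA rank

rng = np.random.default_rng(3)
W = rng.standard_normal((d, d))           # pretrained weight (frozen)
A = rng.standard_normal((d, r)) * 0.01    # trainable low-rank factor
B = np.zeros((r, d))                      # B starts at zero, so ΔW starts at zero

delta_W = A @ B                           # the low-rank update, rank ≤ r
W_adapted = W + delta_W

full = d * d
lora = d * r * 2
print(f"full update: {full:,} params, LoRA update: {lora:,} ({full // lora}× fewer)")
```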
Section 07

Python Code

Ready-to-run examples. Copy, paste, execute.

Basic SVD with NumPy

import numpy as np

# Define your matrix
A = np.array([[3, 1],
              [1, 3]])

# SVD — one line!
U, sigma, Vt = np.linalg.svd(A)

print("U (Left Singular Vectors):")
print(U.round(3))
print("\nSingular Values:", sigma.round(3))
print("\nVt (Right Vectors Transposed):")
print(Vt.round(3))

# Reconstruct A from U, Σ, Vt
Sigma = np.diag(sigma)
A_back = U @ Sigma @ Vt
print("\nReconstructed A:")
print(A_back.round(3))  # should match original

Truncated SVD with scikit-learn

from sklearn.decomposition import TruncatedSVD
import numpy as np

# Simulate a document-term matrix (100 docs, 50 words)
np.random.seed(42)
A = np.random.rand(100, 50)

# Keep only top 5 components
svd = TruncatedSVD(n_components=5, random_state=42)
A_reduced = svd.fit_transform(A)

print(f"Original shape: {A.shape}")            # (100, 50)
print(f"Compressed shape: {A_reduced.shape}")  # (100, 5)

variance = svd.explained_variance_ratio_
print(f"Total variance explained: {sum(variance)*100:.1f}%")

Image Compression

import numpy as np
import matplotlib.pyplot as plt

# Load image as grayscale matrix
img = plt.imread('photo.png')[..., 0]  # shape: (H, W)

# Decompose
U, sigma, Vt = np.linalg.svd(img, full_matrices=False)

# Reconstruct at different compression levels
def compress(U, sigma, Vt, k):
    return U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

for k in [1, 5, 20, 50, 100]:
    approx = compress(U, sigma, Vt, k)
    pct = sum(sigma[:k]**2) / sum(sigma**2) * 100
    print(f"k={k:3d}: {pct:.1f}% quality retained")

Quick Reference

Concept          Description
A = U×Σ×Vᵀ       Any matrix A decomposes into left vectors, singular values, right vectors
U                m×m orthonormal — row-space directions
Σ                m×n diagonal — singular values σ₁ ≥ σ₂ ≥ ... ≥ 0, sorted by importance
Vᵀ               n×n orthonormal — column-space directions
σᵢ = √λᵢ         Singular values are square roots of eigenvalues of AᵀA
Truncated SVD    Keep top-k singular values for compression and noise removal
Explained %      σᵢ² / Σⱼσⱼ² × 100 — variance captured by each component
SVD vs PCA       PCA = SVD on mean-centered data. Same math, different framing