Principal Component Analysis (PCA) and Factor Analysis (FA) are the two most widely used linear dimensionality reduction techniques. They often produce similar results, are frequently confused with each other, and are sometimes incorrectly treated as interchangeable. Yet their conceptual foundations, mathematical formulations, and appropriate use cases differ fundamentally.
Understanding these differences is not merely academic pedantry. The choice between PCA and FA reflects different beliefs about your data, different inferential goals, and different assumptions about measurement. A researcher who uses PCA when FA is conceptually appropriate—or vice versa—may draw incorrect conclusions or miss important insights.
This page provides a comprehensive comparison: we'll examine the philosophical and mathematical differences, analyze when each method is appropriate, explore cases where they converge and diverge, and provide concrete guidance for practice.
By the end of this page, you will understand: • The fundamentally different conceptual models underlying PCA and FA • Mathematical differences in formulation and estimation • When results are similar and when they diverge dramatically • Practical decision criteria for choosing between the methods • Common misunderstandings and how to avoid them • How to interpret results differently for each method
The most fundamental difference between PCA and FA lies in their causal direction—a conceptual distinction that determines everything else.
PCA treats observed variables as causes of components. The principal components are weighted combinations of the observed data:
$$\text{PC}_j = w_{1j}x_1 + w_{2j}x_2 + \cdots + w_{pj}x_p$$
There is no model of how the data were generated. PCA simply finds orthogonal axes that maximize variance in the observed data. Components are defined by the data; they don't precede it.
The arrow of "explanation": Data → Components
PCA says: "Given these observed variables, how can I summarize them with fewer variables that capture the most information?"
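This "components are weighted sums of the data" view can be made concrete with a minimal sketch on simulated data (the data here are hypothetical, used only for illustration):

```python
import numpy as np

# Hypothetical data: 50 observations of 3 variables (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Xc = X - X.mean(axis=0)                 # center the data

# Eigendecomposition of the sample covariance matrix
Sigma = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]       # sort eigenvalues descending
w = eigvecs[:, order[0]]                # weights w_1j, ..., w_pj for PC1

# PC1 = w_1*x_1 + w_2*x_2 + w_3*x_3 -- defined by the data, nothing more
pc1 = Xc @ w
assert np.allclose(pc1, Xc[:, 0]*w[0] + Xc[:, 1]*w[1] + Xc[:, 2]*w[2])
```

The variance of `pc1` equals the largest eigenvalue, which is precisely the "maximize variance" property: no generative model is involved anywhere.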
Factor Analysis treats latent factors as causes of observed variables. The observed data are generated by the factors:
$$x_i = \lambda_{i1}z_1 + \lambda_{i2}z_2 + \cdots + \lambda_{ik}z_k + \epsilon_i$$
Factors are prior to observations; they exist (at least conceptually) whether or not we measure them. Observed variables are imperfect reflections of underlying constructs.
The arrow of explanation: Factors → Data
FA says: "What underlying constructs could have generated these correlations among observed variables?"
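The generative direction can be simulated directly. In this sketch (with made-up loadings and a single assumed factor), the latent variable is drawn first and the observed variables are produced from it, exactly as the FA model asserts:

```python
import numpy as np

# Hypothetical one-factor model: latent z generates four observed variables
rng = np.random.default_rng(1)
n, p = 1000, 4
Lambda = np.array([[0.9], [0.8], [0.7], [0.6]])   # assumed loadings
psi = 1.0 - Lambda.ravel()**2                     # uniquenesses so var(x_i) = 1

z = rng.normal(size=(n, 1))                       # factors come first...
eps = rng.normal(size=(n, p)) * np.sqrt(psi)      # ...plus unique errors
X = z @ Lambda.T + eps                            # Factors -> Data

# The model-implied covariance Sigma = Lambda Lambda' + Psi matches the sample
Sigma_model = Lambda @ Lambda.T + np.diag(psi)
Sigma_sample = np.cov(X, rowvar=False)
print(np.round(Sigma_model - Sigma_sample, 2))    # small sampling error only
```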
In FA, the causal language is taken seriously in fields like psychology: intelligence (a latent factor) causes performance on IQ test items; personality traits cause behavior patterns. Whether factors are "really" causal is debatable, but the model structure assumes this direction. PCA makes no such causal claims.
Consider the relationship between wealth and its indicators (income, savings, property value, investment portfolio):
PCA perspective: "I'll combine these four wealth indicators into a single 'wealth component' that captures most of the variance. The component is defined by the weighted sum of these specific indicators."
FA perspective: "There exists an underlying latent variable 'wealth' that we cannot directly observe. The four indicators are imperfect measurements of this wealth, each contaminated by measurement error and specific factors."
The FA perspective posits a real underlying thing; the PCA perspective merely summarizes what we measured.
Let's examine the precise mathematical differences between these methods.
PCA decomposes the covariance matrix Σ into eigenvalues and eigenvectors:
$$\boldsymbol{\Sigma} = \mathbf{V D V}^\top$$
where V contains the eigenvectors vⱼ of Σ as orthonormal columns and D is the diagonal matrix of eigenvalues d₁ ≥ d₂ ≥ ⋯ ≥ dₚ.
Key property: PCA perfectly reconstructs Σ when using all p components: $$\boldsymbol{\Sigma} = \sum_{j=1}^{p} d_j \mathbf{v}_j \mathbf{v}_j^\top$$
No error term exists: All variance is attributed to components.
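This exact reconstruction is easy to verify numerically. The sketch below builds an arbitrary positive-definite matrix to stand in for Σ and checks that summing all p eigenvalue-weighted outer products recovers it exactly:

```python
import numpy as np

# Arbitrary positive-definite "covariance" matrix for illustration
rng = np.random.default_rng(2)
A = rng.normal(size=(6, 4))
Sigma = A @ A.T + np.eye(6)

# Full eigendecomposition: Sigma = sum_j d_j v_j v_j' over all p terms
d, V = np.linalg.eigh(Sigma)
recon = sum(d[j] * np.outer(V[:, j], V[:, j]) for j in range(6))
assert np.allclose(Sigma, recon)   # exact reconstruction, no error term
```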
Factor analysis models Σ as:
$$\boldsymbol{\Sigma} = \mathbf{\Lambda \Lambda}^\top + \mathbf{\Psi}$$
where Λ is the p × k matrix of factor loadings λᵢⱼ and Ψ is the diagonal matrix of uniquenesses ψᵢ.
Key property: FA only explains the off-diagonal elements of Σ through common factors. The diagonal (variances) is split between common and unique variance.
This is the crux of the difference: PCA explains all variance (diagonal and off-diagonal) with components. FA explains off-diagonal covariances with common factors but allows diagonal elements to retain unique variance. Mathematically: PCA analyzes Σ with its full diagonal intact; FA effectively replaces the diagonal with communalities.
In classical exploratory FA, we often work with a reduced correlation matrix: the sample correlation matrix with communalities on the diagonal instead of 1s.
Let R be the sample correlation matrix. The reduced matrix is R* = R − diag(ψ); since each diagonal element of R is 1, the diagonal of R* equals the communalities h²ᵢ = 1 − ψᵢ.
This is why early FA implementations asked users to estimate initial communalities—they were needed to form R*.
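A textbook-style sketch of this procedure (iterated principal axis factoring, not any particular library's implementation) makes the role of R* concrete: communalities go on the diagonal, the reduced matrix is eigendecomposed, and the communalities are re-estimated until stable:

```python
import numpy as np

def principal_axis(R, k, n_iter=200):
    """Iterated principal axis factoring: replace the diagonal of R with
    communalities, eigendecompose the reduced matrix R*, and repeat."""
    # Initial communalities: squared multiple correlations (SMC)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_star = R.copy()
        np.fill_diagonal(R_star, h2)          # reduced correlation matrix R*
        vals, vecs = np.linalg.eigh(R_star)
        idx = np.argsort(vals)[::-1][:k]      # top-k eigenpairs
        Lam = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
        h2 = np.sum(Lam**2, axis=1)           # updated communalities
    return Lam, h2

# Population correlation matrix implied by a hypothetical one-factor model
lam_true = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(lam_true, lam_true)
np.fill_diagonal(R, 1.0)

Lam, h2 = principal_axis(R, k=1)
print(np.round(h2, 3))  # should approach the true communalities lam_true**2
```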
PCA: solved in closed form by eigendecomposition of the covariance (or correlation) matrix; no iteration or distributional assumptions are required.
FA: estimated iteratively (maximum likelihood, principal axis, or least squares), because the loadings Λ and uniquenesses Ψ must be determined jointly.
| Aspect | PCA | Factor Analysis |
|---|---|---|
| Core equation | Σ = VDV' | Σ = ΛΛ' + Ψ |
| Error term | None (exact decomposition) | Uniquenesses Ψ (diagonal) |
| Variance explained | All variance | Common variance only |
| Number of components/factors | Up to p | k << p (must specify) |
| Solution method | Eigendecomposition (closed-form) | Iterative optimization |
| Diagonal of Σ (variances) | Fully explained by components | h² (communality) + ψ |
| Rotation | Optional, usually not needed | Essential for interpretation |
| Uniqueness | Not modeled | Modeled explicitly |
```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis


def compare_pca_fa(X, n_components=2):
    """
    Compare PCA and Factor Analysis on the same data.
    Demonstrates their different approaches to variance explanation.
    """
    # Center the data
    X_centered = X - X.mean(axis=0)

    # Sample covariance matrix
    n = X.shape[0]
    Sigma = (X_centered.T @ X_centered) / (n - 1)

    # =====================
    # PCA
    # =====================
    pca = PCA(n_components=n_components)
    pca.fit(X)

    # PCA loadings: eigenvectors scaled by sqrt(eigenvalues)
    pca_loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

    # PCA reconstructed covariance (from k components)
    Sigma_pca = pca_loadings @ pca_loadings.T

    # =====================
    # Factor Analysis
    # =====================
    fa = FactorAnalysis(n_components=n_components, rotation=None)
    fa.fit(X)

    fa_loadings = fa.components_.T   # p x k loading matrix
    fa_noise = fa.noise_variance_    # uniquenesses (diagonal of Psi)

    # FA reconstructed covariance
    Sigma_fa = fa_loadings @ fa_loadings.T + np.diag(fa_noise)

    # =====================
    # Comparison
    # =====================
    print("Sample Covariance Matrix:")
    print(np.round(Sigma, 3))

    print("\n--- PCA ---")
    print("Loadings (eigenvectors * sqrt(eigenvalues)):")
    print(np.round(pca_loadings, 3))
    print("Explained variance ratios:", pca.explained_variance_ratio_)
    print("Reconstructed covariance (LL'):")
    print(np.round(Sigma_pca, 3))
    print("Reconstruction error (Frobenius norm):",
          np.linalg.norm(Sigma - Sigma_pca, 'fro'))

    print("\n--- Factor Analysis ---")
    print("Loadings (Lambda):")
    print(np.round(fa_loadings, 3))
    print("Uniquenesses (psi):", np.round(fa_noise, 3))
    print("Communalities (h^2):", np.round(np.sum(fa_loadings**2, axis=1), 3))
    print("Reconstructed covariance (LL' + Psi):")
    print(np.round(Sigma_fa, 3))
    print("Reconstruction error (Frobenius norm):",
          np.linalg.norm(Sigma - Sigma_fa, 'fro'))

    # Key insight: FA allows residual variance on the diagonal
    print("\n--- Key Difference: Diagonal Elements ---")
    print("True diagonal (variances):", np.round(np.diag(Sigma), 3))
    print("PCA diagonal from LL':", np.round(np.diag(Sigma_pca), 3))
    print("FA communalities (from LL'):",
          np.round(np.diag(fa_loadings @ fa_loadings.T), 3))
    print("FA uniquenesses:", np.round(fa_noise, 3))
    print("FA total (h^2 + psi):",
          np.round(np.diag(fa_loadings @ fa_loadings.T) + fa_noise, 3))


# Generate example data with 2 underlying factors
np.random.seed(42)
n_samples = 500
z = np.random.randn(n_samples, 2)  # 2 latent factors
Lambda_true = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],  # load on factor 1
    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7],  # load on factor 2
])
noise = np.random.randn(n_samples, 6) * 0.3
X = z @ Lambda_true.T + noise

compare_pca_fa(X, n_components=2)
```

Despite their different foundations, PCA and FA often produce remarkably similar results. Understanding when and why this happens helps clarify when the distinction matters.
The primary driver of PCA-FA convergence is high communality. When most variance is common variance:
$$h_i^2 \approx 1 \text{ for all } i \quad \Rightarrow \quad \psi_i \approx 0$$
Then the FA covariance model ΛΛ' + Ψ ≈ ΛΛ', which is similar to PCA's low-rank approximation.
Intuition: When measurement error is negligible and all variables share most of their variance, there's little unique variance to distinguish FA from PCA. Both methods extract essentially the same structure.
Consider the limiting case. If Ψ → 0 (no unique variance), the FA model becomes: $$\boldsymbol{\Sigma} = \mathbf{\Lambda \Lambda}^\top$$
This is a low-rank approximation—exactly what PCA's k-component solution provides! In this limit, FA loadings converge to scaled PCA eigenvectors.
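A quick simulation illustrates this convergence. The sketch below generates assumed one-factor data with tiny unique variance (so Ψ ≈ 0) and shows that the PCA and FA loading vectors nearly coincide:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Assumed one-factor data with near-zero unique variance (Psi ~ 0)
rng = np.random.default_rng(3)
z = rng.normal(size=(2000, 1))                     # latent factor
Lambda = np.array([[1.0], [0.9], [0.8], [0.7]])    # hypothetical loadings
X = z @ Lambda.T + 0.05 * rng.normal(size=(2000, 4))

pca = PCA(n_components=1).fit(X)
fa = FactorAnalysis(n_components=1).fit(X)

L_pca = np.abs((pca.components_.T * np.sqrt(pca.explained_variance_)).ravel())
L_fa = np.abs(fa.components_.ravel())

# With h^2 ~ 1, the two loading vectors nearly coincide (up to sign):
print("PCA:", np.round(L_pca, 3))
print("FA: ", np.round(L_fa, 3))
```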
Research has identified conditions where PCA and FA results are nearly interchangeable:
When average communality exceeds 0.7, the practical distinction between PCA and FA diminishes substantially. Many methodologists suggest: if you're primarily interested in data reduction and communalities are high, PCA is simpler and nearly equivalent. If communalities are modest (< 0.6), FA offers meaningful advantages.
After extracting k components or factors, rotation (e.g., varimax) can make PCA and FA loading patterns look much more alike.
However, the rotated solutions are still conceptually different: rotated PCA loadings continue to carry unique variance, while rotated FA loadings reflect common variance only.
The numerical similarity masks conceptual differences that matter for interpretation.
| Condition | More Similar | More Different |
|---|---|---|
| Communalities | High (> 0.7) | Low (< 0.5) |
| Sample size | Large (n > 500) | Small (n < 100) |
| Variables per factor | Many (> 5) | Few (< 3) |
| Loading pattern | Simple structure | Cross-loadings |
| Number of factors | Few (k = 2-3) | Many (k > 5) |
The conditions where PCA and FA diverge are precisely the conditions where the choice between them matters most.
When substantial unique variance exists (ψᵢ large), the methods diverge significantly:
PCA behavior: Treats all variance as signal. Variables with high unique variance receive weight in components even though much of their variance doesn't covary with other variables.
FA behavior: Downweights variables with high uniqueness. Factor loadings reflect only common variance, so noisy variables have lower loadings.
Consequence: PCA may assign high loadings to noisy variables; FA correctly identifies them as poor factor markers.
Consider a scale with 5 good items (communality 0.7) and 1 poorly worded item (communality 0.2). PCA will include the noisy item in components because it has variance. FA will correctly estimate low loadings for this item, revealing its poor measurement quality. This is where FA's error model provides crucial information.
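A hedged sketch of this scale example (five reliable items plus one weak item, simulated with assumed loadings) shows the pattern directly:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Simulated scale: five reliable items (h^2 ~ 0.7) and one weak item (h^2 ~ 0.2)
rng = np.random.default_rng(4)
n = 2000
z = rng.normal(size=(n, 1))                            # single latent trait
lam = np.array([0.84, 0.84, 0.84, 0.84, 0.84, 0.45])   # true loadings
psi = 1.0 - lam**2                                     # uniquenesses
X = z @ lam[None, :] + rng.normal(size=(n, 6)) * np.sqrt(psi)

pca = PCA(n_components=1).fit(X)
fa = FactorAnalysis(n_components=1).fit(X)

L_pca = np.abs((pca.components_.T * np.sqrt(pca.explained_variance_)).ravel())
L_fa = np.abs(fa.components_.ravel())
print("PCA loadings:", np.round(L_pca, 2))  # noisy item looks inflated
print("FA loadings: ", np.round(L_fa, 2))   # noisy item stays near 0.45
```

The last item's PCA loading comes out noticeably above its FA loading, which is the inflation discussed next.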
PCA loadings are systematically higher than FA loadings when unique variance is present. This is because:
$$\text{PCA loading}_{ij} = \sqrt{d_j}\, v_{ij}, \quad \text{where the eigenvalue } d_j \text{ absorbs common and unique variance alike}$$
$$\text{FA loading}_{ij} = \lambda_{ij}, \quad \text{where } \textstyle\sum_j \lambda_{ij}^2 = h_i^2 \text{ (common variance only)}$$
PCA incorporates the full variance; FA extracts only common variance. This inflation can mislead interpretation:
| Variable | True communality | PCA loading | FA loading |
|---|---|---|---|
| Item 1 | 0.80 | 0.89 | 0.72 |
| Item 2 | 0.70 | 0.84 | 0.65 |
| Item 3 | 0.30 | 0.72 | 0.38 |
| Item 4 | 0.20 | 0.68 | 0.31 |
The PCA loadings for Items 3-4 appear substantial (0.68-0.72) despite low communality. FA correctly shows these items are weak markers (0.31-0.38).
Even with similar loading patterns, factor scores differ:
PCA scores: exact, computed as X·V (data times eigenvectors).
FA factor scores: estimated (the factors are not directly observed), typically via the regression method.
FA scores account for measurement error; PCA scores treat all variance as signal. With correlated variables and measurement error, the difference in score accuracy can be substantial.
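The difference is visible in code. This sketch (simulated one-factor data with hypothetical loadings) checks that PCA scores are exact projections, while FA scores are model-based estimates of the unobserved factor:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Simulated one-factor data (hypothetical loadings, noise sd 0.6)
rng = np.random.default_rng(5)
z = rng.normal(size=(500, 1))
X = z @ np.array([[0.9, 0.8, 0.7, 0.6]]) + rng.normal(size=(500, 4)) * 0.6

# PCA scores: exact linear projections of the (centered) data
pca = PCA(n_components=1).fit(X)
t_pca = pca.transform(X)
Xc = X - X.mean(axis=0)
assert np.allclose(t_pca, Xc @ pca.components_.T)   # t = X V, no estimation

# FA scores: posterior-mean estimates of the unobserved z
# (the regression method)
fa = FactorAnalysis(n_components=1).fit(X)
t_fa = fa.transform(X)
print("corr(z, FA score):",
      np.round(np.corrcoef(z.ravel(), t_fa.ravel())[0, 1], 2))
```

The FA scores correlate strongly with the true latent variable but are shrunk estimates of it, not exact recoveries; the sign of the correlation is arbitrary because loading signs are indeterminate.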
When data has genuine unique variance, the loading patterns diverge, PCA overstates the quality of noisy variables, and the two methods' scores can differ substantially.
With the technical differences established, let's synthesize practical guidance for method selection.
1. What is your goal?
If your goal is data reduction for downstream use (feeding into regression, clustering, visualization): → PCA is often appropriate. You want maximum variance compressed into few dimensions, regardless of where variance comes from.
If your goal is understanding latent structure (what constructs underlie observed correlations): → FA is conceptually appropriate. You're positing the existence of unobserved causes.
If your goal is scale development or psychometric analysis: → FA is the standard. You need to distinguish reliable from unreliable items, model measurement error, and validate constructs.
2. Do you believe in latent causes?
Does it make sense that your observed variables are caused by underlying latent variables? If yes, FA matches that belief; if you make no such claim, PCA's agnostic summary is more defensible.
3. Is measurement error a concern?
If you believe substantial measurement error exists and want to model it: → FA explicitly estimates uniquenesses → PCA ignores error, treating all variance as signal
In practice, if communalities are high and you're mainly interested in summarizing variance, both methods give similar results. In such cases, default to PCA for simplicity. Reserve FA for situations where the measurement model matters: scale construction, psychological research, structural equation modeling contexts.
Different fields have established norms:
| Domain | Traditional Method | Rationale |
|---|---|---|
| Psychology/Psychometrics | Factor Analysis | Measurement model; latent traits |
| Genomics/Bioinformatics | PCA | High dimensions; data reduction focus |
| Marketing Research | Factor Analysis | Latent attitudes, preferences |
| Image Processing | PCA (via SVD) | No latent cause interpretation |
| Social Sciences | Both (context-dependent) | FA for constructs, PCA for indices |
| Finance | PCA | Market factors as summaries |
| Neuroimaging | PCA/ICA | Dimension reduction, blind source separation |
These norms reflect both substantive considerations and historical practice. However, blindly following field norms without considering your specific goals is inadvisable.
The PCA vs FA distinction is often misunderstood, even by experienced researchers. Let's address prevalent misconceptions.
Myth: "PCA is just a special case of factor analysis."
Reality: They are distinct methods with different models. PCA is not FA with a constrained covariance structure. The relationship is more nuanced: PCA is an exact eigendecomposition with no error model, and FA approaches a PCA-like solution only in the limit where the uniquenesses vanish.
Myth: "After rotation, PCA and FA give the same loadings."
Reality: Even after rotation, the loadings differ, because PCA loadings absorb unique variance while FA loadings reflect only common variance, and the unrotated solutions being rotated were different to begin with.
You can get numerically similar results, but they represent different quantities.
Myth: "FA is more sophisticated, so it is always the better choice."
Reality: Sophistication doesn't equal appropriateness. FA makes stronger assumptions: a correctly specified latent-variable model, a chosen number of factors k, and (for maximum likelihood estimation) approximate multivariate normality.
If these assumptions are untenable or irrelevant to your goal, PCA's simplicity is an advantage.
Proper terminology matters: • PCA produces components (deterministic summaries of data) • FA produces factors (latent variables causing data)
Calling PCA results "factors" or FA results "components" is technically incorrect and can cause confusion. Some software (e.g., SPSS) contributes to confusion by offering "Principal Component Factoring" as an extraction method.
Myth: "The eigenvalue > 1 rule applies equally to factor analysis."
Reality: The eigenvalue > 1 (Kaiser criterion) applies to correlation matrix PCA. For FA, the relevant eigenvalues are those of the reduced correlation matrix, and parallel analysis or model-fit criteria are generally preferred over any fixed cutoff.
Myth: "Uniqueness is the same thing as measurement error."
Reality: Uniquenesses combine two sources: random measurement error and specific variance (reliable variance unique to that variable).
Without external reliability estimates, these cannot be separated. FA estimates total unique variance but cannot decompose it.
Myth: "Only FA can represent hierarchical structure."
Reality: While FA naturally extends to hierarchical models (second-order factors), you can also apply PCA hierarchically (for example, a second PCA on component or subscale scores) or model higher-order structure in other frameworks.
FA's framework is more natural for higher-order models, but it's not exclusive.
The choice between PCA and FA affects how you interpret and report results. Here's guidance for each method.
Components are data-defined: interpret them as weighted summaries of the specific variables you measured, not as entities beyond the data.
No existence claims: a component label (e.g., a "size" component) is descriptive shorthand, not a claim that such a construct exists.
Variance explained is total variance: reported percentages include unique and error variance, not just shared structure.
Factors are posited constructs: interpret them as hypothesized latent variables whose existence is a substantive claim, not a mathematical fact.
Loadings reflect common variance only: expect them to be lower than the corresponding PCA loadings, especially for less reliable variables.
Uniquenesses are informative: a large ψᵢ flags a variable that shares little with the common factors and may be a poor construct marker.
Best practice: State explicitly whether you used PCA or FA, why you made that choice, and how the interpretation should be understood given that choice. Reviewers and readers will appreciate the clarity, and it demonstrates methodological awareness.
PCA scores (exact): $$\mathbf{t} = \mathbf{X V}$$
Scores are deterministic linear combinations of the data. No estimation uncertainty.
FA factor scores (estimated): common methods include the regression (Thurstone) method and the Bartlett weighted least squares method.
FA scores are shrunk estimates; they account for reliability. With moderate communalities, FA scores have different properties than PCA scores: they are shrunk toward the mean, they are not uniquely determined (factor score indeterminacy), and scores for different factors can correlate even when the factors themselves are orthogonal.
For downstream use, PCA scores are convenient for prediction, clustering, and visualization pipelines, while FA scores (or a full structural equation model) are preferable when measurement error in the scores would bias subsequent analyses.
We've comprehensively compared these two fundamental dimensionality reduction techniques. Let's consolidate the key insights: • PCA summarizes data (Data → Components); FA models data (Factors → Data) • PCA decomposes all variance exactly; FA explains common variance and models uniquenesses explicitly • The methods converge when communalities are high and diverge when unique variance is substantial • Choose by goal: data reduction favors PCA; inference about latent structure and psychometric work favor FA
Having established when and why to use factor analysis, the next page explores Factor Rotation—the techniques used to transform factor solutions into interpretable forms. We'll cover orthogonal rotations (varimax, quartimax), oblique rotations (promax, oblimin), and the crucial concept of "simple structure" that guides rotation choices.
You now have a principled framework for choosing between PCA and Factor Analysis. Remember: the choice reflects beliefs about your data and your analytical goals. Neither method is universally superior—context determines appropriateness.