Principal Component Analysis (PCA) and Factor Analysis (FA) are the two most widely used linear dimensionality reduction techniques. They often produce similar results, are frequently confused with each other, and are sometimes incorrectly treated as interchangeable. Yet their conceptual foundations, mathematical formulations, and appropriate use cases differ fundamentally.
Understanding these differences is not merely academic pedantry. The choice between PCA and FA reflects different beliefs about your data, different inferential goals, and different assumptions about measurement. A researcher who uses PCA when FA is conceptually appropriate—or vice versa—may draw incorrect conclusions or miss important insights.
This page provides a comprehensive comparison: we'll examine the philosophical and mathematical differences, analyze when each method is appropriate, explore cases where they converge and diverge, and provide concrete guidance for practice.
By the end of this page, you will understand: • The fundamentally different conceptual models underlying PCA and FA • Mathematical differences in formulation and estimation • When results are similar and when they diverge dramatically • Practical decision criteria for choosing between the methods • Common misunderstandings and how to avoid them • How to interpret results differently for each method
The most fundamental difference between PCA and FA lies in their causal direction—a conceptual distinction that determines everything else.
PCA treats observed variables as causes of components. The principal components are weighted combinations of the observed data:
$$\text{PC}_j = w_{1j}x_1 + w_{2j}x_2 + \cdots + w_{pj}x_p$$
There is no model of how the data were generated. PCA simply finds orthogonal axes that maximize variance in the observed data. Components are defined by the data; they don't precede it.
The arrow of "explanation": Data → Components
PCA says: "Given these observed variables, how can I summarize them with fewer variables that capture the most information?"
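This "components are weighted sums of the data" view can be made concrete with a minimal sketch on simulated data (the data here are hypothetical, used only for illustration):

```python
import numpy as np

# Hypothetical data: 50 observations of 3 variables (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Xc = X - X.mean(axis=0)                 # center the data

# Eigendecomposition of the sample covariance matrix
Sigma = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]       # sort eigenvalues descending
w = eigvecs[:, order[0]]                # weights w_1j, ..., w_pj for PC1

# PC1 = w_1*x_1 + w_2*x_2 + w_3*x_3 -- defined by the data, nothing more
pc1 = Xc @ w
assert np.allclose(pc1, Xc[:, 0]*w[0] + Xc[:, 1]*w[1] + Xc[:, 2]*w[2])
```

The variance of `pc1` equals the largest eigenvalue, which is precisely the "maximize variance" property: no generative model is involved anywhere.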
Factor Analysis treats latent factors as causes of observed variables. The observed data are generated by the factors:
$$x_i = \lambda_{i1}z_1 + \lambda_{i2}z_2 + \cdots + \lambda_{ik}z_k + \epsilon_i$$
Factors are prior to observations; they exist (at least conceptually) whether or not we measure them. Observed variables are imperfect reflections of underlying constructs.
The arrow of explanation: Factors → Data
FA says: "What underlying constructs could have generated these correlations among observed variables?"
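The generative direction can be simulated directly. In this sketch (with made-up loadings and a single assumed factor), the latent variable is drawn first and the observed variables are produced from it, exactly as the FA model asserts:

```python
import numpy as np

# Hypothetical one-factor model: latent z generates four observed variables
rng = np.random.default_rng(1)
n, p = 1000, 4
Lambda = np.array([[0.9], [0.8], [0.7], [0.6]])   # assumed loadings
psi = 1.0 - Lambda.ravel()**2                     # uniquenesses so var(x_i) = 1

z = rng.normal(size=(n, 1))                       # factors come first...
eps = rng.normal(size=(n, p)) * np.sqrt(psi)      # ...plus unique errors
X = z @ Lambda.T + eps                            # Factors -> Data

# The model-implied covariance Sigma = Lambda Lambda' + Psi matches the sample
Sigma_model = Lambda @ Lambda.T + np.diag(psi)
Sigma_sample = np.cov(X, rowvar=False)
print(np.round(Sigma_model - Sigma_sample, 2))    # small sampling error only
```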
In FA, the causal language is taken seriously in fields like psychology: intelligence (a latent factor) causes performance on IQ test items; personality traits cause behavior patterns. Whether factors are "really" causal is debatable, but the model structure assumes this direction. PCA makes no such causal claims.
Consider the relationship between wealth and its indicators (income, savings, property value, investment portfolio):
PCA perspective: "I'll combine these four wealth indicators into a single 'wealth component' that captures most of the variance. The component is defined by the weighted sum of these specific indicators."
FA perspective: "There exists an underlying latent variable 'wealth' that we cannot directly observe. The four indicators are imperfect measurements of this wealth, each contaminated by measurement error and specific factors."
The FA perspective posits a real underlying thing; the PCA perspective merely summarizes what we measured.
Let's examine the precise mathematical differences between these methods.
PCA decomposes the covariance matrix Σ into eigenvalues and eigenvectors:
$$\boldsymbol{\Sigma} = \mathbf{V D V}^\top$$
where V contains the eigenvectors vⱼ of Σ as orthonormal columns and D is the diagonal matrix of eigenvalues d₁ ≥ d₂ ≥ ⋯ ≥ dₚ.
Key property: PCA perfectly reconstructs Σ when using all p components: $$\boldsymbol{\Sigma} = \sum_{j=1}^{p} d_j \mathbf{v}_j \mathbf{v}_j^\top$$
No error term exists: All variance is attributed to components.
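This exact reconstruction is easy to verify numerically. The sketch below builds an arbitrary positive-definite matrix to stand in for Σ and checks that summing all p eigenvalue-weighted outer products recovers it exactly:

```python
import numpy as np

# Arbitrary positive-definite "covariance" matrix for illustration
rng = np.random.default_rng(2)
A = rng.normal(size=(6, 4))
Sigma = A @ A.T + np.eye(6)

# Full eigendecomposition: Sigma = sum_j d_j v_j v_j' over all p terms
d, V = np.linalg.eigh(Sigma)
recon = sum(d[j] * np.outer(V[:, j], V[:, j]) for j in range(6))
assert np.allclose(Sigma, recon)   # exact reconstruction, no error term
```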
Factor analysis models Σ as:
$$\boldsymbol{\Sigma} = \mathbf{\Lambda \Lambda}^\top + \mathbf{\Psi}$$
where Λ is the p × k matrix of factor loadings λᵢⱼ and Ψ is the diagonal matrix of uniquenesses ψᵢ.
Key property: FA only explains the off-diagonal elements of Σ through common factors. The diagonal (variances) is split between common and unique variance.
This is the crux of the difference: PCA explains all variance (diagonal and off-diagonal) with components. FA explains off-diagonal covariances with common factors but allows diagonal elements to retain unique variance. Mathematically: PCA analyzes Σ with its full diagonal intact; FA effectively replaces the diagonal with communalities.
In classical exploratory FA, we often work with a reduced correlation matrix: the sample correlation matrix with communalities on the diagonal instead of 1s.
Let R be the sample correlation matrix. The reduced matrix is R* = R − diag(ψ); since each diagonal element of R is 1, the diagonal of R* equals the communalities h²ᵢ = 1 − ψᵢ.
This is why early FA implementations asked users to estimate initial communalities—they were needed to form R*.
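A textbook-style sketch of this procedure (iterated principal axis factoring, not any particular library's implementation) makes the role of R* concrete: communalities go on the diagonal, the reduced matrix is eigendecomposed, and the communalities are re-estimated until stable:

```python
import numpy as np

def principal_axis(R, k, n_iter=200):
    """Iterated principal axis factoring: replace the diagonal of R with
    communalities, eigendecompose the reduced matrix R*, and repeat."""
    # Initial communalities: squared multiple correlations (SMC)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_star = R.copy()
        np.fill_diagonal(R_star, h2)          # reduced correlation matrix R*
        vals, vecs = np.linalg.eigh(R_star)
        idx = np.argsort(vals)[::-1][:k]      # top-k eigenpairs
        Lam = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
        h2 = np.sum(Lam**2, axis=1)           # updated communalities
    return Lam, h2

# Population correlation matrix implied by a hypothetical one-factor model
lam_true = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(lam_true, lam_true)
np.fill_diagonal(R, 1.0)

Lam, h2 = principal_axis(R, k=1)
print(np.round(h2, 3))  # should approach the true communalities lam_true**2
```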
PCA: solved in closed form by eigendecomposition of the covariance (or correlation) matrix; no iteration or distributional assumptions are required.
FA: estimated iteratively (maximum likelihood, principal axis, or least squares), because the loadings Λ and uniquenesses Ψ must be determined jointly.
| Aspect | PCA | Factor Analysis |
|---|---|---|
| Core equation | Σ = VDV' | Σ = ΛΛ' + Ψ |
| Error term | None (exact decomposition) | Uniquenesses Ψ (diagonal) |
| Variance explained | All variance | Common variance only |
| Number of components/factors | Up to p | k << p (must specify) |
| Solution method | Eigendecomposition (closed-form) | Iterative optimization |
| Diagonal of Σ (variances) | Fully explained by components | h² (communality) + ψ |
| Rotation | Optional, usually not needed | Essential for interpretation |
| Uniqueness | Not modeled | Modeled explicitly |
```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis


def compare_pca_fa(X, n_components=2):
    """
    Compare PCA and Factor Analysis on the same data.
    Demonstrates their different approaches to variance explanation.
    """
    # Center the data
    X_centered = X - X.mean(axis=0)

    # Sample covariance matrix
    n = X.shape[0]
    Sigma = (X_centered.T @ X_centered) / (n - 1)

    # =====================
    # PCA
    # =====================
    pca = PCA(n_components=n_components)
    pca.fit(X)

    # PCA loadings: eigenvectors scaled by sqrt(eigenvalues)
    pca_loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

    # PCA reconstructed covariance (from k components)
    Sigma_pca = pca_loadings @ pca_loadings.T

    # =====================
    # Factor Analysis
    # =====================
    fa = FactorAnalysis(n_components=n_components, rotation=None)
    fa.fit(X)

    fa_loadings = fa.components_.T   # p x k loading matrix
    fa_noise = fa.noise_variance_    # uniquenesses (diagonal of Psi)

    # FA reconstructed covariance
    Sigma_fa = fa_loadings @ fa_loadings.T + np.diag(fa_noise)

    # =====================
    # Comparison
    # =====================
    print("Sample Covariance Matrix:")
    print(np.round(Sigma, 3))

    print("\n--- PCA ---")
    print("Loadings (eigenvectors * sqrt(eigenvalues)):")
    print(np.round(pca_loadings, 3))
    print("Explained variance ratios:", pca.explained_variance_ratio_)
    print("Reconstructed covariance (LL'):")
    print(np.round(Sigma_pca, 3))
    print("Reconstruction error (Frobenius norm):",
          np.linalg.norm(Sigma - Sigma_pca, 'fro'))

    print("\n--- Factor Analysis ---")
    print("Loadings (Lambda):")
    print(np.round(fa_loadings, 3))
    print("Uniquenesses (psi):", np.round(fa_noise, 3))
    print("Communalities (h^2):", np.round(np.sum(fa_loadings**2, axis=1), 3))
    print("Reconstructed covariance (LL' + Psi):")
    print(np.round(Sigma_fa, 3))
    print("Reconstruction error (Frobenius norm):",
          np.linalg.norm(Sigma - Sigma_fa, 'fro'))

    # Key insight: FA allows residual variance on the diagonal
    print("\n--- Key Difference: Diagonal Elements ---")
    print("True diagonal (variances):", np.round(np.diag(Sigma), 3))
    print("PCA diagonal from LL':", np.round(np.diag(Sigma_pca), 3))
    print("FA communalities (from LL'):",
          np.round(np.diag(fa_loadings @ fa_loadings.T), 3))
    print("FA uniquenesses:", np.round(fa_noise, 3))
    print("FA total (h^2 + psi):",
          np.round(np.diag(fa_loadings @ fa_loadings.T) + fa_noise, 3))


# Generate example data with 2 underlying factors
np.random.seed(42)
n_samples = 500
z = np.random.randn(n_samples, 2)  # 2 latent factors
Lambda_true = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],  # load on factor 1
    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7],  # load on factor 2
])
noise = np.random.randn(n_samples, 6) * 0.3
X = z @ Lambda_true.T + noise

compare_pca_fa(X, n_components=2)
```

Despite their different foundations, PCA and FA often produce remarkably similar results. Understanding when and why this happens helps clarify when the distinction matters.
The primary driver of PCA-FA convergence is high communality. When most variance is common variance:
$$h_i^2 \approx 1 \text{ for all } i \quad \Rightarrow \quad \psi_i \approx 0$$
Then the FA covariance model ΛΛ' + Ψ ≈ ΛΛ', which is similar to PCA's low-rank approximation.
Intuition: When measurement error is negligible and all variables share most of their variance, there's little unique variance to distinguish FA from PCA. Both methods extract essentially the same structure.
Consider the limiting case. If Ψ → 0 (no unique variance), the FA model becomes: $$\boldsymbol{\Sigma} = \mathbf{\Lambda \Lambda}^\top$$
This is a low-rank approximation—exactly what PCA's k-component solution provides! In this limit, FA loadings converge to scaled PCA eigenvectors.
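A quick simulation illustrates this convergence. The sketch below generates assumed one-factor data with tiny unique variance (so Ψ ≈ 0) and shows that the PCA and FA loading vectors nearly coincide:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Assumed one-factor data with near-zero unique variance (Psi ~ 0)
rng = np.random.default_rng(3)
z = rng.normal(size=(2000, 1))                     # latent factor
Lambda = np.array([[1.0], [0.9], [0.8], [0.7]])    # hypothetical loadings
X = z @ Lambda.T + 0.05 * rng.normal(size=(2000, 4))

pca = PCA(n_components=1).fit(X)
fa = FactorAnalysis(n_components=1).fit(X)

L_pca = np.abs((pca.components_.T * np.sqrt(pca.explained_variance_)).ravel())
L_fa = np.abs(fa.components_.ravel())

# With h^2 ~ 1, the two loading vectors nearly coincide (up to sign):
print("PCA:", np.round(L_pca, 3))
print("FA: ", np.round(L_fa, 3))
```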
Research has identified conditions where PCA and FA results are nearly interchangeable:
When average communality exceeds 0.7, the practical distinction between PCA and FA diminishes substantially. Many methodologists suggest: if you're primarily interested in data reduction and communalities are high, PCA is simpler and nearly equivalent. If communalities are modest (< 0.6), FA offers meaningful advantages.
After extracting k components or factors, rotation (e.g., varimax) can make PCA and FA loading patterns look much more alike.
However, the rotated solutions are still conceptually different: rotated PCA loadings continue to carry unique variance, while rotated FA loadings reflect common variance only.
The numerical similarity masks conceptual differences that matter for interpretation.
| Condition | More Similar | More Different |
|---|---|---|
| Communalities | High (> 0.7) | Low (< 0.5) |
| Sample size | Large (n > 500) | Small (n < 100) |
| Variables per factor | Many (> 5) | Few (< 3) |
| Loading pattern | Simple structure | Cross-loadings |
| Number of factors | Few (k = 2-3) | Many (k > 5) |
The conditions where PCA and FA diverge are precisely the conditions where the choice between them matters most.
When substantial unique variance exists (ψᵢ large), the methods diverge significantly:
PCA behavior: Treats all variance as signal. Variables with high unique variance receive weight in components even though much of their variance doesn't covary with other variables.
FA behavior: Downweights variables with high uniqueness. Factor loadings reflect only common variance, so noisy variables have lower loadings.
Consequence: PCA may assign high loadings to noisy variables; FA correctly identifies them as poor factor markers.
Consider a scale with 5 good items (communality 0.7) and 1 poorly worded item (communality 0.2). PCA will include the noisy item in components because it has variance. FA will correctly estimate low loadings for this item, revealing its poor measurement quality. This is where FA's error model provides crucial information.
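A hedged sketch of this scale example (five reliable items plus one weak item, simulated with assumed loadings) shows the pattern directly:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Simulated scale: five reliable items (h^2 ~ 0.7) and one weak item (h^2 ~ 0.2)
rng = np.random.default_rng(4)
n = 2000
z = rng.normal(size=(n, 1))                            # single latent trait
lam = np.array([0.84, 0.84, 0.84, 0.84, 0.84, 0.45])   # true loadings
psi = 1.0 - lam**2                                     # uniquenesses
X = z @ lam[None, :] + rng.normal(size=(n, 6)) * np.sqrt(psi)

pca = PCA(n_components=1).fit(X)
fa = FactorAnalysis(n_components=1).fit(X)

L_pca = np.abs((pca.components_.T * np.sqrt(pca.explained_variance_)).ravel())
L_fa = np.abs(fa.components_.ravel())
print("PCA loadings:", np.round(L_pca, 2))  # noisy item looks inflated
print("FA loadings: ", np.round(L_fa, 2))   # noisy item stays near 0.45
```

The last item's PCA loading comes out noticeably above its FA loading, which is the inflation discussed next.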
PCA loadings are systematically higher than FA loadings when unique variance is present. This is because:
$$\text{PCA loading}_{ij} = \sqrt{d_j}\, v_{ij}, \quad \text{where the eigenvalue } d_j \text{ absorbs common and unique variance alike}$$
$$\text{FA loading}_{ij} = \lambda_{ij}, \quad \text{where } \textstyle\sum_j \lambda_{ij}^2 = h_i^2 \text{ (common variance only)}$$
PCA incorporates the full variance; FA extracts only common variance. This inflation can mislead interpretation:
| Variable | True communality | PCA loading | FA loading |
|---|---|---|---|
| Item 1 | 0.80 | 0.89 | 0.72 |
| Item 2 | 0.70 | 0.84 | 0.65 |
| Item 3 | 0.30 | 0.72 | 0.38 |
| Item 4 | 0.20 | 0.68 | 0.31 |
The PCA loadings for Items 3-4 appear substantial (0.68-0.72) despite low communality. FA correctly shows these items are weak markers (0.31-0.38).
Even with similar loading patterns, factor scores differ:
PCA scores: exact, computed as X·V (data times eigenvectors).
FA factor scores: estimated (the factors are not directly observed), typically via the regression method.
FA scores account for measurement error; PCA scores treat all variance as signal. With correlated variables and measurement error, the difference in score accuracy can be substantial.
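The difference is visible in code. This sketch (simulated one-factor data with hypothetical loadings) checks that PCA scores are exact projections, while FA scores are model-based estimates of the unobserved factor:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Simulated one-factor data (hypothetical loadings, noise sd 0.6)
rng = np.random.default_rng(5)
z = rng.normal(size=(500, 1))
X = z @ np.array([[0.9, 0.8, 0.7, 0.6]]) + rng.normal(size=(500, 4)) * 0.6

# PCA scores: exact linear projections of the (centered) data
pca = PCA(n_components=1).fit(X)
t_pca = pca.transform(X)
Xc = X - X.mean(axis=0)
assert np.allclose(t_pca, Xc @ pca.components_.T)   # t = X V, no estimation

# FA scores: posterior-mean estimates of the unobserved z
# (the regression method)
fa = FactorAnalysis(n_components=1).fit(X)
t_fa = fa.transform(X)
print("corr(z, FA score):",
      np.round(np.corrcoef(z.ravel(), t_fa.ravel())[0, 1], 2))
```

The FA scores correlate strongly with the true latent variable but are shrunk estimates of it, not exact recoveries; the sign of the correlation is arbitrary because loading signs are indeterminate.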
When data has genuine unique variance, the loading patterns diverge, PCA overstates the quality of noisy variables, and the two methods' scores can differ substantially.
With the technical differences established, let's synthesize practical guidance for method selection.
1. What is your goal?
If your goal is data reduction for downstream use (feeding into regression, clustering, visualization): → PCA is often appropriate. You want maximum variance compressed into few dimensions, regardless of where variance comes from.
If your goal is understanding latent structure (what constructs underlie observed correlations): → FA is conceptually appropriate. You're positing the existence of unobserved causes.
If your goal is scale development or psychometric analysis: → FA is the standard. You need to distinguish reliable from unreliable items, model measurement error, and validate constructs.
2. Do you believe in latent causes?
Does it make sense that your observed variables are caused by underlying latent variables? If yes, FA matches that belief; if you make no such claim, PCA's agnostic summary is more defensible.
3. Is measurement error a concern?
If you believe substantial measurement error exists and want to model it: → FA explicitly estimates uniquenesses → PCA ignores error, treating all variance as signal
In practice, if communalities are high and you're mainly interested in summarizing variance, both methods give similar results. In such cases, default to PCA for simplicity. Reserve FA for situations where the measurement model matters: scale construction, psychological research, structural equation modeling contexts.
Different fields have established norms:
| Domain | Traditional Method | Rationale |
|---|---|---|
| Psychology/Psychometrics | Factor Analysis | Measurement model; latent traits |
| Genomics/Bioinformatics | PCA | High dimensions; data reduction focus |
| Marketing Research | Factor Analysis | Latent attitudes, preferences |
| Image Processing | PCA (via SVD) | No latent cause interpretation |
| Social Sciences | Both (context-dependent) | FA for constructs, PCA for indices |
| Finance | PCA | Market factors as summaries |
| Neuroimaging | PCA/ICA | Dimension reduction, blind source separation |
These norms reflect both substantive considerations and historical practice. However, blindly following field norms without considering your specific goals is inadvisable.
The PCA vs FA distinction is often misunderstood, even by experienced researchers. Let's address prevalent misconceptions.
Myth: "PCA is just a special case of factor analysis."
Reality: They are distinct methods with different models. PCA is not FA with a constrained covariance structure. The relationship is more nuanced: PCA is an exact eigendecomposition with no error model, and FA approaches a PCA-like solution only in the limit where the uniquenesses vanish.
Myth: "After rotation, PCA and FA give the same loadings."
Reality: Even after rotation, the loadings differ, because PCA loadings absorb unique variance while FA loadings reflect only common variance, and the unrotated solutions being rotated were different to begin with.
You can get numerically similar results, but they represent different quantities.
Myth: "FA is more sophisticated, so it is always the better choice."
Reality: Sophistication doesn't equal appropriateness. FA makes stronger assumptions: a correctly specified latent-variable model, a chosen number of factors k, and (for maximum likelihood estimation) approximate multivariate normality.
If these assumptions are untenable or irrelevant to your goal, PCA's simplicity is an advantage.
Proper terminology matters: • PCA produces components (deterministic summaries of data) • FA produces factors (latent variables causing data)
Calling PCA results "factors" or FA results "components" is technically incorrect and can cause confusion. Some software (e.g., SPSS) contributes to confusion by offering "Principal Component Factoring" as an extraction method.
Myth: "The eigenvalue > 1 rule applies equally to factor analysis."
Reality: The eigenvalue > 1 (Kaiser criterion) applies to correlation matrix PCA. For FA, the relevant eigenvalues are those of the reduced correlation matrix, and parallel analysis or model-fit criteria are generally preferred over any fixed cutoff.
Myth: "Uniqueness is the same thing as measurement error."
Reality: Uniquenesses combine two sources: random measurement error and specific variance (reliable variance unique to that variable).
Without external reliability estimates, these cannot be separated. FA estimates total unique variance but cannot decompose it.
Myth: "Only FA can represent hierarchical structure."
Reality: While FA naturally extends to hierarchical models (second-order factors), you can also apply PCA hierarchically (for example, a second PCA on component or subscale scores) or model higher-order structure in other frameworks.
FA's framework is more natural for higher-order models, but it's not exclusive.
The choice between PCA and FA affects how you interpret and report results. Here's guidance for each method.
Components are data-defined: interpret them as weighted summaries of the specific variables you measured, not as entities beyond the data.
No existence claims: a component label (e.g., a "size" component) is descriptive shorthand, not a claim that such a construct exists.
Variance explained is total variance: reported percentages include unique and error variance, not just shared structure.
Factors are posited constructs: interpret them as hypothesized latent variables whose existence is a substantive claim, not a mathematical fact.
Loadings reflect common variance only: expect them to be lower than the corresponding PCA loadings, especially for less reliable variables.
Uniquenesses are informative: a large ψᵢ flags a variable that shares little with the common factors and may be a poor construct marker.
Best practice: State explicitly whether you used PCA or FA, why you made that choice, and how the interpretation should be understood given that choice. Reviewers and readers will appreciate the clarity, and it demonstrates methodological awareness.
PCA scores (exact): $$\mathbf{t} = \mathbf{X V}$$
Scores are deterministic linear combinations of the data. No estimation uncertainty.
FA factor scores (estimated): common methods include the regression (Thurstone) method and the Bartlett weighted least squares method.
FA scores are shrunk estimates; they account for reliability. With moderate communalities, FA scores have different properties than PCA scores: they are shrunk toward the mean, they are not uniquely determined (factor score indeterminacy), and scores for different factors can correlate even when the factors themselves are orthogonal.
For downstream use, PCA scores are convenient for prediction, clustering, and visualization pipelines, while FA scores (or a full structural equation model) are preferable when measurement error in the scores would bias subsequent analyses.
We've comprehensively compared these two fundamental dimensionality reduction techniques. Let's consolidate the key insights: • PCA summarizes data (Data → Components); FA models data (Factors → Data) • PCA decomposes all variance exactly; FA explains common variance and models uniquenesses explicitly • The methods converge when communalities are high and diverge when unique variance is substantial • Choose by goal: data reduction favors PCA; inference about latent structure and psychometric work favor FA
Having established when and why to use factor analysis, the next page explores Factor Rotation—the techniques used to transform factor solutions into interpretable forms. We'll cover orthogonal rotations (varimax, quartimax), oblique rotations (promax, oblimin), and the crucial concept of "simple structure" that guides rotation choices.
You now have a principled framework for choosing between PCA and Factor Analysis. Remember: the choice reflects beliefs about your data and your analytical goals. Neither method is universally superior—context determines appropriateness.