Imagine you're a psychologist analyzing the results of a comprehensive personality test with 50 different questions. Each person's responses form a 50-dimensional data point. Yet intuitively, you suspect that these 50 observable responses don't represent 50 independent aspects of personality—instead, they're likely manifestations of a smaller number of underlying personality traits like extroversion, neuroticism, and conscientiousness.
This fundamental insight—that high-dimensional observable data is often generated by a smaller number of hidden (latent) factors—lies at the heart of factor analysis (FA). Factor analysis is not merely a dimensionality reduction technique; it's a generative probabilistic model that explicitly models how latent factors combine to produce observed data.
The distinction between factor analysis and techniques like Principal Component Analysis (PCA) is profound. While PCA finds directions of maximum variance without any generative model, factor analysis posits that observed variables are linear combinations of latent factors plus unique noise—a causal hypothesis about data generation that enables deeper interpretation and principled statistical inference.
By the end of this page, you will understand: • The generative model of factor analysis and its mathematical specification • The meaning and interpretation of factor loadings and uniquenesses • The fundamental assumptions of factor analysis (linearity, Gaussianity, independence) • The identifiability problem and rotational indeterminacy • When factor analysis is appropriate versus when PCA suffices • The philosophical distinction between "common" and "unique" variance
Factor analysis begins with a bold hypothesis: the observable data we measure is not fundamental, but rather generated by a smaller set of unobservable latent factors. This generative perspective distinguishes factor analysis from purely descriptive techniques and places it within the broader family of latent variable models.
Let x be a p-dimensional observed random vector (e.g., responses to p questionnaire items). Factor analysis posits that x is generated according to:
$$\mathbf{x} = \boldsymbol{\mu} + \mathbf{\Lambda} \mathbf{z} + \boldsymbol{\epsilon}$$
where μ is the p-vector of means, Λ is the p × k matrix of factor loadings, z is the k-vector of latent factors, and ε is the p-vector of unique factors (each component is summarized in the table below).
This equation encodes the fundamental assumption: each observed variable is a linear combination of common factors (Λz) plus variable-specific noise (ε).
The generative perspective imagines data creation as a process: first, nature samples latent factor values z, then these factors "cause" observed values x through the linear transformation Λz, with each observed variable additionally receiving unique noise ε. This is a causal model, not just a statistical summary.
For classical factor analysis, we impose Gaussian assumptions:
Latent factors: $$\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_k)$$
The factors are assumed to have zero means, unit variances, and zero correlations with one another (the identity covariance I_k)—a convention that fixes the scale and orientation of the latent space.
Unique factors (noise): $$\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi})$$
where Ψ = diag(ψ₁, ψ₂, ..., ψₚ) is a diagonal covariance matrix. The diagonal structure encodes the crucial assumption that unique factors are mutually uncorrelated.
Independence assumption: $$\mathbf{z} \perp \boldsymbol{\epsilon}$$
The latent factors and unique factors are assumed independent.
| Component | Notation | Dimension | Interpretation |
|---|---|---|---|
| Observed data | x | p × 1 | What we actually measure (e.g., test scores) |
| Mean vector | μ | p × 1 | Population means of observed variables |
| Factor loadings | Λ | p × k | How factors influence each observed variable |
| Latent factors | z | k × 1 | Hidden causes (e.g., personality traits) |
| Unique factors | ε | p × 1 | Variable-specific noise/measurement error |
| Uniquenesses | Ψ | p × p (diag) | Variance of each unique factor |
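The generative story can be sketched numerically: sample latent factors z, add unique noise ε, and check that the empirical covariance of the simulated data matches what the model implies (derived formally below as ΛΛ' + Ψ). The loadings and uniquenesses here are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

p, k, n = 4, 2, 100_000
mu = np.zeros(p)
Lambda = np.array([[0.9, 0.0],
                   [0.8, 0.1],
                   [0.1, 0.7],
                   [0.0, 0.8]])
psi = np.array([0.19, 0.35, 0.50, 0.36])

# Step 1: nature samples latent factors z ~ N(0, I_k)
Z = rng.standard_normal((n, k))
# Step 2: unique noise eps ~ N(0, Psi), independent of z
Eps = rng.standard_normal((n, p)) * np.sqrt(psi)
# Step 3: observed data is a linear combination of factors plus noise
X = mu + Z @ Lambda.T + Eps

# The empirical covariance should approximate Lambda Lambda' + Psi
Sigma_model = Lambda @ Lambda.T + np.diag(psi)
Sigma_hat = np.cov(X, rowvar=False)
print("max |Sigma_hat - Sigma_model|:", np.max(np.abs(Sigma_hat - Sigma_model)))
```

With n this large, the largest discrepancy between the sample covariance and the model-implied covariance should be small, illustrating that the data carry the signature of the generative process.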
The factor loading matrix Λ is the most interpretable and practically important component of the model. Each element λᵢⱼ quantifies the relationship between the j-th latent factor and the i-th observed variable.
Consider the i-th observed variable: $$x_i = \mu_i + \lambda_{i1}z_1 + \lambda_{i2}z_2 + \cdots + \lambda_{ik}z_k + \epsilon_i$$
The loading λᵢⱼ represents the expected change in xᵢ per unit change in factor zⱼ, holding the other factors fixed; when the observed variables are standardized and the factors are orthogonal, it equals the correlation between xᵢ and zⱼ.
Geometrically, each row λᵢ = (λᵢ₁, λᵢ₂, ..., λᵢₖ) of the loading matrix represents the i-th observed variable as a point in k-dimensional factor space.
Loading magnitudes and their interpretation: • |λᵢⱼ| > 0.7: Strong influence; variable xᵢ is a "marker" of factor j • 0.4 < |λᵢⱼ| < 0.7: Moderate influence • 0.3 < |λᵢⱼ| < 0.4: Weak but potentially meaningful • |λᵢⱼ| < 0.3: Generally negligible
The sign indicates direction: positive loadings mean the variable increases with the factor; negative loadings mean inverse relationship.
Consider a 5-factor model of personality (the "Big Five") with observed variables being questionnaire items:
| Item Content | F1 (Extroversion) | F2 (Neuroticism) | F3 (Openness) |
|---|---|---|---|
| "I enjoy parties" | 0.82 | -0.12 | 0.08 |
| "I feel anxious often" | -0.05 | 0.79 | 0.03 |
| "I seek new experiences" | 0.31 | -0.08 | 0.74 |
| "I am the life of the party" | 0.85 | 0.04 | 0.15 |
| "I worry about the future" | -0.11 | 0.81 | -0.06 |
The loading pattern reveals which items "belong" to which factor, enabling interpretation of the factors as psychological constructs.
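As a small illustration (using the hypothetical loadings from the table above), marker items can be flagged programmatically by taking each item's largest absolute loading and applying the 0.7 threshold:

```python
import numpy as np

items = ["I enjoy parties", "I feel anxious often", "I seek new experiences",
         "I am the life of the party", "I worry about the future"]
factors = ["Extroversion", "Neuroticism", "Openness"]
# Loadings from the example table (rows = items, columns = factors)
L = np.array([
    [ 0.82, -0.12,  0.08],
    [-0.05,  0.79,  0.03],
    [ 0.31, -0.08,  0.74],
    [ 0.85,  0.04,  0.15],
    [-0.11,  0.81, -0.06],
])

markers = {}
for item, row in zip(items, L):
    j = int(np.argmax(np.abs(row)))  # factor with the largest |loading|
    markers[item] = factors[j] if abs(row[j]) > 0.7 else "no strong marker"
    print(f"{item!r}: {markers[item]} (loading {row[j]:+.2f})")
```

This mirrors how analysts read a loading table by eye: each item is assigned to the factor it marks most strongly, making the factor's psychological meaning explicit.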
For each observed variable, we decompose its variance into two components:
$$\text{Var}(x_i) = \underbrace{\sum_{j=1}^{k} \lambda_{ij}^2}_{h_i^2 \text{ (communality)}} + \underbrace{\psi_i}_{\text{uniqueness}}$$
A cornerstone insight of factor analysis is that it constrains the covariance structure of observed data. Given the model assumptions, the covariance matrix of observations takes a specific form—and this constraint is what enables parameter estimation.
Starting from the factor model (assuming μ = 0 for simplicity): $$\mathbf{x} = \mathbf{\Lambda z} + \boldsymbol{\epsilon}$$
The covariance matrix of x is: $$\boldsymbol{\Sigma} = \text{Cov}(\mathbf{x}) = \text{Cov}(\mathbf{\Lambda z} + \boldsymbol{\epsilon})$$
Using properties of covariance and the independence of z and ε: $$\boldsymbol{\Sigma} = \mathbf{\Lambda} \text{Cov}(\mathbf{z}) \mathbf{\Lambda}^\top + \text{Cov}(\boldsymbol{\epsilon})$$
Substituting Cov(z) = I and Cov(ε) = Ψ:
$$\boldsymbol{\Sigma} = \mathbf{\Lambda \Lambda}^\top + \boldsymbol{\Psi}$$
This is the fundamental equation of factor analysis. It decomposes the covariance matrix into a common part ΛΛ' (rank at most k, generated by the shared factors) and a unique part Ψ (diagonal, variable-specific).
The factor model constrains the off-diagonal elements of Σ to arise from the common factors: σᵢⱼ = λᵢ₁λⱼ₁ + λᵢ₂λⱼ₂ + ... + λᵢₖλⱼₖ (for i ≠ j)
This means correlations between observed variables are entirely explained by their shared dependence on common factors. This is a testable hypothesis!
Understanding parameter counts is crucial for model identification:
Parameters to estimate: the loading matrix Λ contributes pk parameters (p variables × k factors) and the diagonal of Ψ contributes p more.
Total parameters without constraints: pk + p
Constraints from the sample: the sample covariance matrix supplies p(p+1)/2 distinct values (its diagonal plus one triangle) for the model to reproduce.
Identification requirement: For the model to be identified, we need: $$pk + p \leq \frac{p(p+1)}{2}$$
This gives the inequality: $$k \leq \frac{p - 1}{2}$$
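The bookkeeping can be sketched in a few lines. This version also subtracts the k(k−1)/2 rotational constraints discussed below, giving the standard degrees-of-freedom count for exploratory factor analysis:

```python
def fa_degrees_of_freedom(p: int, k: int) -> int:
    """df = distinct covariance elements - free parameters (after fixing rotation)."""
    sample_moments = p * (p + 1) // 2           # distinct elements of Sigma
    free_params = p * k + p - k * (k - 1) // 2  # loadings + uniquenesses - rotation
    return sample_moments - free_params

# For p = 5 variables, which k leave non-negative degrees of freedom?
for k in (1, 2, 3):
    print(f"p=5, k={k}: df = {fa_degrees_of_freedom(5, k)}")
```

For p = 5 this yields df = 5, 1, and −2 for k = 1, 2, 3 respectively, so at most two factors are identifiable, consistent with the inequality k ≤ (p − 1)/2 above.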
However, rotational indeterminacy (discussed later) means we need additional constraints.
The similarity and difference with PCA become clear:
| Aspect | PCA | Factor Analysis |
|---|---|---|
| Model equation | Σ reproduced exactly by all p components | Σ = ΛΛ' + Ψ (k-factor model) |
| Unique variance | All variance explained | Distinguishes unique vs common |
| Off-diagonals | Exactly reproduced | Explained by factors only |
| Estimation goal | Minimize reconstruction error | Match covariance structure |
```python
import numpy as np

def demonstrate_factor_covariance_structure():
    """
    Demonstrate how factor loadings and uniquenesses compose
    to form the observed covariance matrix.
    """
    # Define a simple 2-factor model for 5 variables
    # Factor loadings: each row is a variable, each column is a factor
    Lambda = np.array([
        [0.8, 0.1],  # Variable 1: loads strongly on Factor 1
        [0.7, 0.2],  # Variable 2: loads strongly on Factor 1
        [0.1, 0.9],  # Variable 3: loads strongly on Factor 2
        [0.2, 0.8],  # Variable 4: loads strongly on Factor 2
        [0.5, 0.5],  # Variable 5: cross-loader
    ])

    # Uniquenesses (diagonal matrix)
    psi = np.array([0.35, 0.47, 0.18, 0.32, 0.50])
    Psi = np.diag(psi)

    # The fundamental equation: Σ = ΛΛ' + Ψ
    Sigma = Lambda @ Lambda.T + Psi

    print("Factor Loadings (Λ):")
    print(Lambda)

    print("Communalities (h² = row sums of Λ²):")
    communalities = np.sum(Lambda**2, axis=1)
    print(communalities)

    print("Uniquenesses (ψ):")
    print(psi)

    print("Verification: h² + ψ = diagonal of Σ:")
    print(f"  h² + ψ  = {communalities + psi}")
    print(f"  diag(Σ) = {np.diag(Sigma)}")

    print("Implied Covariance Matrix (Σ):")
    print(np.round(Sigma, 3))

    # Show that off-diagonals come only from common factors
    print("Off-diagonal element Σ[0,1] (Cov of Var1 and Var2):")
    print(f"  From formula: λ₁₁λ₂₁ + λ₁₂λ₂₂ = "
          f"{Lambda[0,0]*Lambda[1,0] + Lambda[0,1]*Lambda[1,1]:.3f}")
    print(f"  From matrix:  {Sigma[0,1]:.3f}")

    return Sigma, Lambda, Psi

demonstrate_factor_covariance_structure()
```

A fundamental and often surprising property of factor analysis is that the factor loadings are not unique. There exist infinitely many loading matrices that produce exactly the same covariance structure—and hence are statistically indistinguishable from data.
Let T be any k × k orthogonal matrix (T'T = TT' = I). Define a rotated loading matrix: $$\mathbf{\Lambda}^* = \mathbf{\Lambda T}$$
Then: $$\mathbf{\Lambda}^* (\mathbf{\Lambda}^*)^\top = \mathbf{\Lambda T T}^\top \mathbf{\Lambda}^\top = \mathbf{\Lambda \Lambda}^\top$$
The implied covariance Σ = ΛΛ' + Ψ is unchanged by rotations!
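This invariance is easy to verify numerically. Here a 2 × 2 rotation by an arbitrary angle changes the individual loadings but leaves ΛΛ' untouched (loading values are illustrative):

```python
import numpy as np

Lambda = np.array([[0.8, 0.1],
                   [0.7, 0.2],
                   [0.1, 0.9]])

theta = 0.6  # an arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal: T @ T.T = I

Lambda_star = Lambda @ T

# The individual loadings change...
print(np.round(Lambda_star, 3))
# ...but the implied common-variance structure does not:
print(np.allclose(Lambda @ Lambda.T, Lambda_star @ Lambda_star.T))
```

Any orthogonal T works here, which is exactly why the data alone cannot pick out one "true" set of loadings.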
This indeterminacy has profound implications: the loadings are determined only up to an orthogonal rotation, so the "meaning" of any individual factor rests on a conventional choice of rotation rather than on the data alone.
Rotational indeterminacy raises a deep question: if data cannot distinguish between different "meanings" of factors, are the factors truly real constructs or merely convenient mathematical summaries? This has fueled decades of debate in psychology, sociology, and other fields where factor analysis is applied.
Consider a 2-factor model. The loadings can be plotted in 2D space, where each point represents a variable. Any rotation of this plot around the origin produces an equally valid loading matrix:
Before rotation: Factor 1 might correlate moderately with all variables.
After rotation: Factor 1 might correlate strongly with half the variables and weakly with the rest.
The data fit (likelihood, covariance reproduction) is identical—only interpretation changes.
To obtain a unique solution, we impose constraints. Two approaches dominate:
1. During estimation: Constrain Λ'Ψ⁻¹Λ to be diagonal (as in maximum likelihood FA). This gives a unique computational solution but may not be interpretable.
2. After estimation: Apply rotation criteria that maximize interpretability (Simple Structure, discussed in the rotation module).
The rotation matrix T has k² elements but only k(k-1)/2 free parameters (due to orthogonality constraints). This means k(k-1)/2 degrees of freedom in the loadings cannot be determined from data—they must be fixed by convention.
For k = 2 factors: 2(2-1)/2 = 1 rotation angle to fix. For k = 3 factors: 3(3-1)/2 = 3 rotation angles to fix. For k = 5 factors: 5(5-1)/2 = 10 rotation parameters to fix.
The computational solution (e.g., from ML estimation) fixes these by convention; substantive rotation seeks a meaningful configuration.
Factor analysis rests on several assumptions, each with implications for when the technique is appropriate and how violations affect results.
The model assumes that observed variables are linear combinations of factors: $$x_i = \mu_i + \sum_{j=1}^{k} \lambda_{ij} z_j + \epsilon_i$$
Implications of violation: If the true relationship is nonlinear (e.g., quadratic, interaction effects), factor analysis will misestimate the loadings and may require spurious extra factors to approximate the curvature.
Diagnostics: Plot residuals against factor scores; look for curvilinearity.
Classical (maximum likelihood) factor analysis assumes both z and ε are Gaussian, implying x is multivariate normal.
Role in estimation: The Gaussian assumption enables the likelihood function: $$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{\Lambda \Lambda}^\top + \boldsymbol{\Psi})$$
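A sketch of what this likelihood looks like in code: under the model, each observation contributes a multivariate-normal log density with covariance ΛΛ' + Ψ. Parameter values here are illustrative.

```python
import numpy as np
from scipy import stats

Lambda = np.array([[0.8, 0.1],
                   [0.7, 0.2],
                   [0.1, 0.9]])
psi = np.array([0.35, 0.47, 0.18])
mu = np.zeros(3)
Sigma = Lambda @ Lambda.T + np.diag(psi)  # model-implied covariance

# Simulate data from the implied marginal distribution x ~ N(mu, Sigma)
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=500)

# Log-likelihood of the dataset under the true parameters
loglik = stats.multivariate_normal(mean=mu, cov=Sigma).logpdf(X).sum()
print(f"log-likelihood under the true parameters: {loglik:.1f}")
```

Maximum likelihood estimation searches over Λ and Ψ to maximize exactly this quantity; a mis-specified covariance (e.g., the identity) yields a lower log-likelihood on the same data.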
Effects of violation: loading estimates typically remain reasonable, but standard errors, confidence intervals, and the χ² goodness-of-fit test become invalid.
Remedies: Use robust standard errors, asymptotically distribution-free (ADF) estimation, or bootstrapping.
For factor analysis, check: • Univariate normality of each variable (skewness, kurtosis) • Mardia's multivariate normality test • Visual: Q-Q plots, scatterplot matrices
Skewness > 2 or kurtosis > 7 typically indicate serious non-normality requiring alternative methods.
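A minimal sketch of the univariate screening step with scipy, on simulated variables. Note that `scipy.stats.kurtosis` returns excess kurtosis by default (normal = 0), while the threshold above is usually stated for raw kurtosis (normal = 3), hence `fisher=False` below.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = np.column_stack([
    rng.standard_normal(5000),        # well-behaved variable
    rng.exponential(size=5000),       # skewed, heavy right tail
    rng.standard_t(df=3, size=5000),  # heavy-tailed variable
])

for i in range(X.shape[1]):
    skew = stats.skew(X[:, i])
    kurt = stats.kurtosis(X[:, i], fisher=False)  # raw kurtosis, normal = 3
    flag = "check" if abs(skew) > 2 or kurt > 7 else "ok"
    print(f"variable {i}: skew={skew:+.2f}, kurtosis={kurt:.2f} -> {flag}")
```

The normal variable should pass, while the exponential and t(3) variables are flagged, signaling that robust standard errors or ADF estimation may be needed.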
The uniquenesses Ψ = diag(ψ₁, ..., ψₚ) form a diagonal matrix, meaning unique factors are mutually uncorrelated.
Interpretation: Any correlation between observed variables is entirely explained by common factors. If ε₁ and ε₂ are correlated, there's a "local dependence" not captured by the model.
Implications of violation: correlated unique factors get absorbed into the common factors, biasing the loadings and potentially suggesting spurious additional factors.
When this assumption fails, one can allow selected correlated uniquenesses (as in confirmatory models) or add factors to capture the local dependence.
z ⫫ ε: Latent factors are independent of unique factors.
This is usually reasonable if we believe unique factors represent measurement error or variable-specific phenomena. It would be violated if, for example, high-ability individuals also had more variable responses.
The model assumes we specify the correct k. Too few factors leads to merged, distorted loadings and poor fit, because distinct sources of covariance are forced onto a single factor.
Too many factors leads to overfitting: unstable, poorly defined factors anchored by only one or two variables.
| Assumption | Tests/Diagnostics | Consequence of Violation | Remedies |
|---|---|---|---|
| Linearity | Residual plots, scatter plots | Biased loadings, inflated k | Transform variables, use nonlinear FA |
| Normality | Mardia's test, Q-Q plots | Invalid SEs and χ² tests | Robust SEs, ADF, bootstrap |
| Diagonal Ψ | Residual correlations | Biased loadings, wrong k | Allow correlated uniquenesses |
| z ⫫ ε | Theory-based | Theoretical concerns | Careful model specification |
| Correct k | Fit indices, parallel analysis | Mis-specified model | Model comparison, theory |
Factor analysis embodies a powerful conceptual framework: the distinction between common variance (shared across variables and attributable to common factors) and unique variance (specific to each variable).
For each observed variable xᵢ: $$\text{Var}(x_i) = h_i^2 + \psi_i$$
where hᵢ² = Σⱼ λᵢⱼ² is the communality (variance shared through the common factors) and ψᵢ is the uniqueness (variance specific to xᵢ).
This decomposition has profound implications:
1. Measurement Quality Variables with high communality (h² → 1) are well-measured by the factors; they share substantial variance with other variables. Variables with low communality have mostly unique variance—they may be measuring something not captured by the common factors, or they may simply be noisy.
2. Factor Reliability Factors defined by variables with high communalities will be more reliable and generalizable than factors anchored by low-communality variables.
3. The "Unique Variance" Interpretation Unique variance combines two sources: specific variance (systematic variance unique to that variable) and measurement error (random noise).
Without additional information (e.g., test-retest reliability), these cannot be separated.
In the early days of factor analysis, communalities had to be estimated iteratively or guessed (often using the highest correlation of each variable as a starting estimate). Modern maximum likelihood estimation jointly estimates loadings and uniquenesses, resolving this "communality problem."
The philosophical commitment of factor analysis is that correlations between observed variables reflect shared underlying causes. When variables correlate, it's because they're all influenced by the same latent factors.
This is a realist interpretation: factors are not merely convenient summaries but represent genuine underlying constructs (abilities, traits, attitudes) that cause variation in observed responses.
The alternative instrumentalist view treats factors as useful fictions—computational devices for data reduction without claims about underlying reality.
Both views are legitimate, and the choice affects how we interpret and use factor solutions: realists treat replication of the factor structure across samples as evidence for genuine constructs, while instrumentalists judge a solution purely by its usefulness for summary and prediction.
Factor analysis is a powerful technique, but it's not universally applicable. Understanding when it's appropriate—and when alternatives like PCA are preferable—is crucial for effective practice.
1. You have a theoretical model of latent causes Factor analysis assumes observed variables are effects of latent factors. If you believe test items reflect underlying traits, survey responses reflect underlying attitudes, or symptoms reflect underlying syndromes, FA is conceptually appropriate.
2. You want to distinguish common from unique variance If measurement error is a concern and you want to model it explicitly (rather than treating all variance as signal), FA's uniqueness parameters are valuable.
3. You're developing or validating measurement instruments FA is central to psychometrics, helping determine whether items measure hypothesized constructs and whether the measurement model fits.
4. You want to test a specific factor structure With Confirmatory Factor Analysis (CFA), you can specify and test hypothesized loading patterns—something PCA cannot do.
5. Correlations are your primary interest FA models the correlation/covariance structure directly; if you want to understand what drives correlations, FA provides that framework.
Despite different conceptual foundations, PCA and FA often produce similar results with large samples and high communalities. The debate over which to use has persisted for decades. Key point: if communalities are very high (> 0.8), PCA and FA loadings converge. If communalities are low, the distinction matters more.
Factor analysis requires adequate sample size for stable estimation of the covariance matrix, reliable loading estimates, and trustworthy fit statistics.
Rules of thumb (increasingly refined over time):
| Guideline | Recommendation | Notes |
|---|---|---|
| Absolute minimum | n ≥ 100 | For exploratory FA |
| Subject-to-variable ratio | n/p ≥ 5-10 | Classical rule |
| Communality-adjusted | Higher communalities → smaller n needed | Modern simulation-based |
| SEM/CFA guidelines | n ≥ 200 for typical models | More complex models need more |
Modern consensus: The "right" sample size depends on communalities, number of factors, number of indicators per factor, and degree of model misspecification. Well-defined factors with 4+ high-loading items can sometimes be recovered with n = 100; poorly defined factors may need n = 500+.
We have established the foundational framework of factor analysis. The key concepts to retain: the generative model x = μ + Λz + ε; the implied covariance structure Σ = ΛΛ' + Ψ; the decomposition of each variable's variance into communality and uniqueness; rotational indeterminacy and the conventions needed to resolve it; and the core assumptions of linearity, Gaussianity, and uncorrelated unique factors.
With the generative model established, the next page examines Factor Analysis vs PCA—a detailed comparison of these two related but distinct approaches to dimensionality reduction. We'll explore their different assumptions, estimation methods, and when each is most appropriate.
You now understand the mathematical and philosophical foundations of factor analysis. The generative model perspective—that observed data arises from latent causes—provides a principled framework for measurement and inference that extends far beyond simple data reduction.