Imagine you're a psychologist analyzing the results of a comprehensive personality test with 50 different questions. Each person's responses form a 50-dimensional data point. Yet intuitively, you suspect that these 50 observable responses don't represent 50 independent aspects of personality—instead, they're likely manifestations of a smaller number of underlying personality traits like extroversion, neuroticism, and conscientiousness.
This fundamental insight—that high-dimensional observable data is often generated by a smaller number of hidden (latent) factors—lies at the heart of factor analysis (FA). Factor analysis is not merely a dimensionality reduction technique; it's a generative probabilistic model that explicitly models how latent factors combine to produce observed data.
The distinction between factor analysis and techniques like Principal Component Analysis (PCA) is profound. While PCA finds directions of maximum variance without any generative model, factor analysis posits that observed variables are linear combinations of latent factors plus unique noise—a causal hypothesis about data generation that enables deeper interpretation and principled statistical inference.
By the end of this page, you will understand: • The generative model of factor analysis and its mathematical specification • The meaning and interpretation of factor loadings and uniquenesses • The fundamental assumptions of factor analysis (linearity, Gaussianity, independence) • The identifiability problem and rotational indeterminacy • When factor analysis is appropriate versus when PCA suffices • The philosophical distinction between "common" and "unique" variance
Factor analysis begins with a bold hypothesis: the observable data we measure is not fundamental, but rather generated by a smaller set of unobservable latent factors. This generative perspective distinguishes factor analysis from purely descriptive techniques and places it within the broader family of latent variable models.
Let x be a p-dimensional observed random vector (e.g., responses to p questionnaire items). Factor analysis posits that x is generated according to:
$$\mathbf{x} = \boldsymbol{\mu} + \mathbf{\Lambda} \mathbf{z} + \boldsymbol{\epsilon}$$
where μ is the p-vector of means, Λ is the p × k matrix of factor loadings, z is the k-vector of latent factors, and ε is the p-vector of unique factors (each component is summarized in the table below).
This equation encodes the fundamental assumption: each observed variable is a linear combination of common factors (Λz) plus variable-specific noise (ε).
The generative perspective imagines data creation as a process: first, nature samples latent factor values z, then these factors "cause" observed values x through the linear transformation Λz, with each observed variable additionally receiving unique noise ε. This is a causal model, not just a statistical summary.
For classical factor analysis, we impose Gaussian assumptions:
Latent factors: $$\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_k)$$
The factors are assumed to have zero means, unit variances, and zero correlations with one another (the identity covariance I_k)—a convention that fixes the scale and orientation of the latent space.
Unique factors (noise): $$\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi})$$
where Ψ = diag(ψ₁, ψ₂, ..., ψₚ) is a diagonal covariance matrix. The diagonal structure encodes the crucial assumption that unique factors are mutually uncorrelated.
Independence assumption: $$\mathbf{z} \perp \boldsymbol{\epsilon}$$
The latent factors and unique factors are assumed independent.
| Component | Notation | Dimension | Interpretation |
|---|---|---|---|
| Observed data | x | p × 1 | What we actually measure (e.g., test scores) |
| Mean vector | μ | p × 1 | Population means of observed variables |
| Factor loadings | Λ | p × k | How factors influence each observed variable |
| Latent factors | z | k × 1 | Hidden causes (e.g., personality traits) |
| Unique factors | ε | p × 1 | Variable-specific noise/measurement error |
| Uniquenesses | Ψ | p × p (diag) | Variance of each unique factor |
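The generative story can be sketched numerically: sample latent factors z, add unique noise ε, and check that the empirical covariance of the simulated data matches what the model implies (derived formally below as ΛΛ' + Ψ). The loadings and uniquenesses here are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

p, k, n = 4, 2, 100_000
mu = np.zeros(p)
Lambda = np.array([[0.9, 0.0],
                   [0.8, 0.1],
                   [0.1, 0.7],
                   [0.0, 0.8]])
psi = np.array([0.19, 0.35, 0.50, 0.36])

# Step 1: nature samples latent factors z ~ N(0, I_k)
Z = rng.standard_normal((n, k))
# Step 2: unique noise eps ~ N(0, Psi), independent of z
Eps = rng.standard_normal((n, p)) * np.sqrt(psi)
# Step 3: observed data is a linear combination of factors plus noise
X = mu + Z @ Lambda.T + Eps

# The empirical covariance should approximate Lambda Lambda' + Psi
Sigma_model = Lambda @ Lambda.T + np.diag(psi)
Sigma_hat = np.cov(X, rowvar=False)
print("max |Sigma_hat - Sigma_model|:", np.max(np.abs(Sigma_hat - Sigma_model)))
```

With n this large, the largest discrepancy between the sample covariance and the model-implied covariance should be small, illustrating that the data carry the signature of the generative process.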
The factor loading matrix Λ is the most interpretable and practically important component of the model. Each element λᵢⱼ quantifies the relationship between the j-th latent factor and the i-th observed variable.
Consider the i-th observed variable: $$x_i = \mu_i + \lambda_{i1}z_1 + \lambda_{i2}z_2 + \cdots + \lambda_{ik}z_k + \epsilon_i$$
The loading λᵢⱼ represents the expected change in xᵢ per unit change in factor zⱼ, holding the other factors fixed; when the observed variables are standardized and the factors are orthogonal, it equals the correlation between xᵢ and zⱼ.
Geometrically, each row λᵢ = (λᵢ₁, λᵢ₂, ..., λᵢₖ) of the loading matrix represents the i-th observed variable as a point in k-dimensional factor space.
Loading magnitudes and their interpretation: • |λᵢⱼ| > 0.7: Strong influence; variable xᵢ is a "marker" of factor j • 0.4 < |λᵢⱼ| < 0.7: Moderate influence • 0.3 < |λᵢⱼ| < 0.4: Weak but potentially meaningful • |λᵢⱼ| < 0.3: Generally negligible
The sign indicates direction: positive loadings mean the variable increases with the factor; negative loadings mean inverse relationship.
Consider a 5-factor model of personality (the "Big Five") with observed variables being questionnaire items:
| Item Content | F1 (Extroversion) | F2 (Neuroticism) | F3 (Openness) |
|---|---|---|---|
| "I enjoy parties" | 0.82 | -0.12 | 0.08 |
| "I feel anxious often" | -0.05 | 0.79 | 0.03 |
| "I seek new experiences" | 0.31 | -0.08 | 0.74 |
| "I am the life of the party" | 0.85 | 0.04 | 0.15 |
| "I worry about the future" | -0.11 | 0.81 | -0.06 |
The loading pattern reveals which items "belong" to which factor, enabling interpretation of the factors as psychological constructs.
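As a small illustration (using the hypothetical loadings from the table above), marker items can be flagged programmatically by taking each item's largest absolute loading and applying the 0.7 threshold:

```python
import numpy as np

items = ["I enjoy parties", "I feel anxious often", "I seek new experiences",
         "I am the life of the party", "I worry about the future"]
factors = ["Extroversion", "Neuroticism", "Openness"]
# Loadings from the example table (rows = items, columns = factors)
L = np.array([
    [ 0.82, -0.12,  0.08],
    [-0.05,  0.79,  0.03],
    [ 0.31, -0.08,  0.74],
    [ 0.85,  0.04,  0.15],
    [-0.11,  0.81, -0.06],
])

markers = {}
for item, row in zip(items, L):
    j = int(np.argmax(np.abs(row)))  # factor with the largest |loading|
    markers[item] = factors[j] if abs(row[j]) > 0.7 else "no strong marker"
    print(f"{item!r}: {markers[item]} (loading {row[j]:+.2f})")
```

This mirrors how analysts read a loading table by eye: each item is assigned to the factor it marks most strongly, making the factor's psychological meaning explicit.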
For each observed variable, we decompose its variance into two components:
$$\text{Var}(x_i) = \underbrace{\sum_{j=1}^{k} \lambda_{ij}^2}_{h_i^2 \text{ (communality)}} + \underbrace{\psi_i}_{\text{uniqueness}}$$
A cornerstone insight of factor analysis is that it constrains the covariance structure of observed data. Given the model assumptions, the covariance matrix of observations takes a specific form—and this constraint is what enables parameter estimation.
Starting from the factor model (assuming μ = 0 for simplicity): $$\mathbf{x} = \mathbf{\Lambda z} + \boldsymbol{\epsilon}$$
The covariance matrix of x is: $$\boldsymbol{\Sigma} = \text{Cov}(\mathbf{x}) = \text{Cov}(\mathbf{\Lambda z} + \boldsymbol{\epsilon})$$
Using properties of covariance and the independence of z and ε: $$\boldsymbol{\Sigma} = \mathbf{\Lambda} \text{Cov}(\mathbf{z}) \mathbf{\Lambda}^\top + \text{Cov}(\boldsymbol{\epsilon})$$
Substituting Cov(z) = I and Cov(ε) = Ψ:
$$\boldsymbol{\Sigma} = \mathbf{\Lambda \Lambda}^\top + \boldsymbol{\Psi}$$
This is the fundamental equation of factor analysis. It decomposes the covariance matrix into a common part ΛΛ' (rank at most k, generated by the shared factors) and a unique part Ψ (diagonal, variable-specific).
The factor model constrains the off-diagonal elements of Σ to arise from the common factors: σᵢⱼ = λᵢ₁λⱼ₁ + λᵢ₂λⱼ₂ + ... + λᵢₖλⱼₖ (for i ≠ j)
This means correlations between observed variables are entirely explained by their shared dependence on common factors. This is a testable hypothesis!
Understanding parameter counts is crucial for model identification:
Parameters to estimate: the loading matrix Λ contributes pk parameters (p variables × k factors) and the diagonal of Ψ contributes p more.
Total parameters without constraints: pk + p
Constraints from the sample: the sample covariance matrix supplies p(p+1)/2 distinct values (its diagonal plus one triangle) for the model to reproduce.
Identification requirement: For the model to be identified, we need: $$pk + p \leq \frac{p(p+1)}{2}$$
This gives the inequality: $$k \leq \frac{p - 1}{2}$$
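The bookkeeping can be sketched in a few lines. This version also subtracts the k(k−1)/2 rotational constraints discussed below, giving the standard degrees-of-freedom count for exploratory factor analysis:

```python
def fa_degrees_of_freedom(p: int, k: int) -> int:
    """df = distinct covariance elements - free parameters (after fixing rotation)."""
    sample_moments = p * (p + 1) // 2           # distinct elements of Sigma
    free_params = p * k + p - k * (k - 1) // 2  # loadings + uniquenesses - rotation
    return sample_moments - free_params

# For p = 5 variables, which k leave non-negative degrees of freedom?
for k in (1, 2, 3):
    print(f"p=5, k={k}: df = {fa_degrees_of_freedom(5, k)}")
```

For p = 5 this yields df = 5, 1, and −2 for k = 1, 2, 3 respectively, so at most two factors are identifiable, consistent with the inequality k ≤ (p − 1)/2 above.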
However, rotational indeterminacy (discussed later) means we need additional constraints.
The similarity and difference with PCA become clear:
| Aspect | PCA | Factor Analysis |
|---|---|---|
| Model equation | Σ reproduced exactly by all p components | Σ = ΛΛ' + Ψ (k-factor model) |
| Unique variance | All variance explained | Distinguishes unique vs common |
| Off-diagonals | Exactly reproduced | Explained by factors only |
| Estimation goal | Minimize reconstruction error | Match covariance structure |
```python
import numpy as np

def demonstrate_factor_covariance_structure():
    """
    Demonstrate how factor loadings and uniquenesses compose
    to form the observed covariance matrix.
    """
    # Define a simple 2-factor model for 5 variables
    # Factor loadings: each row is a variable, each column is a factor
    Lambda = np.array([
        [0.8, 0.1],  # Variable 1: loads strongly on Factor 1
        [0.7, 0.2],  # Variable 2: loads strongly on Factor 1
        [0.1, 0.9],  # Variable 3: loads strongly on Factor 2
        [0.2, 0.8],  # Variable 4: loads strongly on Factor 2
        [0.5, 0.5],  # Variable 5: cross-loader
    ])

    # Uniquenesses (diagonal matrix)
    psi = np.array([0.35, 0.47, 0.18, 0.32, 0.50])
    Psi = np.diag(psi)

    # The fundamental equation: Σ = ΛΛ' + Ψ
    Sigma = Lambda @ Lambda.T + Psi

    print("Factor Loadings (Λ):")
    print(Lambda)

    print("Communalities (h² = row sums of Λ²):")
    communalities = np.sum(Lambda**2, axis=1)
    print(communalities)

    print("Uniquenesses (ψ):")
    print(psi)

    print("Verification: h² + ψ = diagonal of Σ:")
    print(f"  h² + ψ  = {communalities + psi}")
    print(f"  diag(Σ) = {np.diag(Sigma)}")

    print("Implied Covariance Matrix (Σ):")
    print(np.round(Sigma, 3))

    # Show that off-diagonals come only from common factors
    print("Off-diagonal element Σ[0,1] (Cov of Var1 and Var2):")
    print(f"  From formula: λ₁₁λ₂₁ + λ₁₂λ₂₂ = "
          f"{Lambda[0,0]*Lambda[1,0] + Lambda[0,1]*Lambda[1,1]:.3f}")
    print(f"  From matrix:  {Sigma[0,1]:.3f}")

    return Sigma, Lambda, Psi

demonstrate_factor_covariance_structure()
```

A fundamental and often surprising property of factor analysis is that the factor loadings are not unique. There exist infinitely many loading matrices that produce exactly the same covariance structure—and hence are statistically indistinguishable from data.
Let T be any k × k orthogonal matrix (T'T = TT' = I). Define a rotated loading matrix: $$\mathbf{\Lambda}^* = \mathbf{\Lambda T}$$
Then: $$\mathbf{\Lambda}^* (\mathbf{\Lambda}^*)^\top = \mathbf{\Lambda T T}^\top \mathbf{\Lambda}^\top = \mathbf{\Lambda \Lambda}^\top$$
The implied covariance Σ = ΛΛ' + Ψ is unchanged by rotations!
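This invariance is easy to verify numerically. Here a 2 × 2 rotation by an arbitrary angle changes the individual loadings but leaves ΛΛ' untouched (loading values are illustrative):

```python
import numpy as np

Lambda = np.array([[0.8, 0.1],
                   [0.7, 0.2],
                   [0.1, 0.9]])

theta = 0.6  # an arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal: T @ T.T = I

Lambda_star = Lambda @ T

# The individual loadings change...
print(np.round(Lambda_star, 3))
# ...but the implied common-variance structure does not:
print(np.allclose(Lambda @ Lambda.T, Lambda_star @ Lambda_star.T))
```

Any orthogonal T works here, which is exactly why the data alone cannot pick out one "true" set of loadings.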
This indeterminacy has profound implications: the loadings are determined only up to an orthogonal rotation, so the "meaning" of any individual factor rests on a conventional choice of rotation rather than on the data alone.
Rotational indeterminacy raises a deep question: if data cannot distinguish between different "meanings" of factors, are the factors truly real constructs or merely convenient mathematical summaries? This has fueled decades of debate in psychology, sociology, and other fields where factor analysis is applied.
Consider a 2-factor model. The loadings can be plotted in 2D space, where each point represents a variable. Any rotation of this plot around the origin produces an equally valid loading matrix:
Before rotation: Factor 1 might correlate moderately with all variables.
After rotation: Factor 1 might correlate strongly with half the variables and weakly with the rest.
The data fit (likelihood, covariance reproduction) is identical—only interpretation changes.
To obtain a unique solution, we impose constraints. Two approaches dominate:
1. During estimation: Constrain Λ'Ψ⁻¹Λ to be diagonal (as in maximum likelihood FA). This gives a unique computational solution but may not be interpretable.
2. After estimation: Apply rotation criteria that maximize interpretability (Simple Structure, discussed in the rotation module).
The rotation matrix T has k² elements but only k(k-1)/2 free parameters (due to orthogonality constraints). This means k(k-1)/2 degrees of freedom in the loadings cannot be determined from data—they must be fixed by convention.
For k = 2 factors: 2(2-1)/2 = 1 rotation angle to fix. For k = 3 factors: 3(3-1)/2 = 3 rotation angles to fix. For k = 5 factors: 5(5-1)/2 = 10 rotation parameters to fix.
The computational solution (e.g., from ML estimation) fixes these by convention; substantive rotation seeks a meaningful configuration.
Factor analysis rests on several assumptions, each with implications for when the technique is appropriate and how violations affect results.
The model assumes that observed variables are linear combinations of factors: $$x_i = \mu_i + \sum_{j=1}^{k} \lambda_{ij} z_j + \epsilon_i$$
Implications of violation: If the true relationship is nonlinear (e.g., quadratic, interaction effects), factor analysis will misestimate the loadings and may require spurious extra factors to approximate the curvature.
Diagnostics: Plot residuals against factor scores; look for curvilinearity.
Classical (maximum likelihood) factor analysis assumes both z and ε are Gaussian, implying x is multivariate normal.
Role in estimation: The Gaussian assumption enables the likelihood function: $$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{\Lambda \Lambda}^\top + \boldsymbol{\Psi})$$
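A sketch of what this likelihood looks like in code: under the model, each observation contributes a multivariate-normal log density with covariance ΛΛ' + Ψ. Parameter values here are illustrative.

```python
import numpy as np
from scipy import stats

Lambda = np.array([[0.8, 0.1],
                   [0.7, 0.2],
                   [0.1, 0.9]])
psi = np.array([0.35, 0.47, 0.18])
mu = np.zeros(3)
Sigma = Lambda @ Lambda.T + np.diag(psi)  # model-implied covariance

# Simulate data from the implied marginal distribution x ~ N(mu, Sigma)
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=500)

# Log-likelihood of the dataset under the true parameters
loglik = stats.multivariate_normal(mean=mu, cov=Sigma).logpdf(X).sum()
print(f"log-likelihood under the true parameters: {loglik:.1f}")
```

Maximum likelihood estimation searches over Λ and Ψ to maximize exactly this quantity; a mis-specified covariance (e.g., the identity) yields a lower log-likelihood on the same data.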
Effects of violation: loading estimates typically remain reasonable, but standard errors, confidence intervals, and the χ² goodness-of-fit test become invalid.
Remedies: Use robust standard errors, asymptotically distribution-free (ADF) estimation, or bootstrapping.
For factor analysis, check: • Univariate normality of each variable (skewness, kurtosis) • Mardia's multivariate normality test • Visual: Q-Q plots, scatterplot matrices
Skewness > 2 or kurtosis > 7 typically indicate serious non-normality requiring alternative methods.
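A minimal sketch of the univariate screening step with scipy, on simulated variables. Note that `scipy.stats.kurtosis` returns excess kurtosis by default (normal = 0), while the threshold above is usually stated for raw kurtosis (normal = 3), hence `fisher=False` below.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = np.column_stack([
    rng.standard_normal(5000),        # well-behaved variable
    rng.exponential(size=5000),       # skewed, heavy right tail
    rng.standard_t(df=3, size=5000),  # heavy-tailed variable
])

for i in range(X.shape[1]):
    skew = stats.skew(X[:, i])
    kurt = stats.kurtosis(X[:, i], fisher=False)  # raw kurtosis, normal = 3
    flag = "check" if abs(skew) > 2 or kurt > 7 else "ok"
    print(f"variable {i}: skew={skew:+.2f}, kurtosis={kurt:.2f} -> {flag}")
```

The normal variable should pass, while the exponential and t(3) variables are flagged, signaling that robust standard errors or ADF estimation may be needed.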
The uniquenesses Ψ = diag(ψ₁, ..., ψₚ) form a diagonal matrix, meaning unique factors are mutually uncorrelated.
Interpretation: Any correlation between observed variables is entirely explained by common factors. If ε₁ and ε₂ are correlated, there's a "local dependence" not captured by the model.
Implications of violation: correlated unique factors get absorbed into the common factors, biasing the loadings and potentially suggesting spurious additional factors.
When this assumption fails, one can allow selected correlated uniquenesses (as in confirmatory models) or add factors to capture the local dependence.
z ⫫ ε: Latent factors are independent of unique factors.
This is usually reasonable if we believe unique factors represent measurement error or variable-specific phenomena. It would be violated if, for example, high-ability individuals also had more variable responses.
The model assumes we specify the correct k. Too few factors leads to merged, distorted loadings and poor fit, because distinct sources of covariance are forced onto a single factor.
Too many factors leads to overfitting: unstable, poorly defined factors anchored by only one or two variables.
| Assumption | Tests/Diagnostics | Consequence of Violation | Remedies |
|---|---|---|---|
| Linearity | Residual plots, scatter plots | Biased loadings, inflated k | Transform variables, use nonlinear FA |
| Normality | Mardia's test, Q-Q plots | Invalid SEs and χ² tests | Robust SEs, ADF, bootstrap |
| Diagonal Ψ | Residual correlations | Biased loadings, wrong k | Allow correlated uniquenesses |
| z ⫫ ε | Theory-based | Theoretical concerns | Careful model specification |
| Correct k | Fit indices, parallel analysis | Mis-specified model | Model comparison, theory |
Factor analysis embodies a powerful conceptual framework: the distinction between common variance (shared across variables and attributable to common factors) and unique variance (specific to each variable).
For each observed variable xᵢ: $$\text{Var}(x_i) = h_i^2 + \psi_i$$
where hᵢ² = Σⱼ λᵢⱼ² is the communality (variance shared through the common factors) and ψᵢ is the uniqueness (variance specific to xᵢ).
This decomposition has profound implications:
1. Measurement Quality Variables with high communality (h² → 1) are well-measured by the factors; they share substantial variance with other variables. Variables with low communality have mostly unique variance—they may be measuring something not captured by the common factors, or they may simply be noisy.
2. Factor Reliability Factors defined by variables with high communalities will be more reliable and generalizable than factors anchored by low-communality variables.
3. The "Unique Variance" Interpretation Unique variance combines two sources: specific variance (systematic variance unique to that variable) and measurement error (random noise).
Without additional information (e.g., test-retest reliability), these cannot be separated.
In the early days of factor analysis, communalities had to be estimated iteratively or guessed (often using the highest correlation of each variable as a starting estimate). Modern maximum likelihood estimation jointly estimates loadings and uniquenesses, resolving this "communality problem."
The philosophical commitment of factor analysis is that correlations between observed variables reflect shared underlying causes. When variables correlate, it's because they're all influenced by the same latent factors.
This is a realist interpretation: factors are not merely convenient summaries but represent genuine underlying constructs (abilities, traits, attitudes) that cause variation in observed responses.
The alternative instrumentalist view treats factors as useful fictions—computational devices for data reduction without claims about underlying reality.
Both views are legitimate, and the choice affects how we interpret and use factor solutions: realists treat replication of the factor structure across samples as evidence for genuine constructs, while instrumentalists judge a solution purely by its usefulness for summary and prediction.
Factor analysis is a powerful technique, but it's not universally applicable. Understanding when it's appropriate—and when alternatives like PCA are preferable—is crucial for effective practice.
1. You have a theoretical model of latent causes Factor analysis assumes observed variables are effects of latent factors. If you believe test items reflect underlying traits, survey responses reflect underlying attitudes, or symptoms reflect underlying syndromes, FA is conceptually appropriate.
2. You want to distinguish common from unique variance If measurement error is a concern and you want to model it explicitly (rather than treating all variance as signal), FA's uniqueness parameters are valuable.
3. You're developing or validating measurement instruments FA is central to psychometrics, helping determine whether items measure hypothesized constructs and whether the measurement model fits.
4. You want to test a specific factor structure With Confirmatory Factor Analysis (CFA), you can specify and test hypothesized loading patterns—something PCA cannot do.
5. Correlations are your primary interest FA models the correlation/covariance structure directly; if you want to understand what drives correlations, FA provides that framework.
Despite different conceptual foundations, PCA and FA often produce similar results with large samples and high communalities. The debate over which to use has persisted for decades. Key point: if communalities are very high (> 0.8), PCA and FA loadings converge. If communalities are low, the distinction matters more.
Factor analysis requires adequate sample size for stable estimation of the covariance matrix, reliable loading estimates, and trustworthy fit statistics.
Rules of thumb (increasingly refined over time):
| Guideline | Recommendation | Notes |
|---|---|---|
| Absolute minimum | n ≥ 100 | For exploratory FA |
| Subject-to-variable ratio | n/p ≥ 5-10 | Classical rule |
| Communality-adjusted | Higher communalities → smaller n needed | Modern simulation-based |
| SEM/CFA guidelines | n ≥ 200 for typical models | More complex models need more |
Modern consensus: The "right" sample size depends on communalities, number of factors, number of indicators per factor, and degree of model misspecification. Well-defined factors with 4+ high-loading items can sometimes be recovered with n = 100; poorly defined factors may need n = 500+.
We have established the foundational framework of factor analysis. The key concepts to retain: the generative model x = μ + Λz + ε; the implied covariance structure Σ = ΛΛ' + Ψ; the decomposition of each variable's variance into communality and uniqueness; rotational indeterminacy and the conventions needed to resolve it; and the core assumptions of linearity, Gaussianity, and uncorrelated unique factors.
With the generative model established, the next page examines Factor Analysis vs PCA—a detailed comparison of these two related but distinct approaches to dimensionality reduction. We'll explore their different assumptions, estimation methods, and when each is most appropriate.
You now understand the mathematical and philosophical foundations of factor analysis. The generative model perspective—that observed data arises from latent causes—provides a principled framework for measurement and inference that extends far beyond simple data reduction.