You've extracted factors from your data using maximum likelihood or another estimation method. The solution converges, the fit statistics look acceptable, and you have a loading matrix. But when you examine the loadings, every variable seems to load moderately on every factor—there's no clear pattern to interpret.
This is precisely where factor rotation becomes essential. Rotation is the process of transforming an initial factor solution into one that is more interpretable while preserving the statistical properties of the solution. It exploits the rotational indeterminacy of factor models: infinitely many loading matrices produce identical fits to the data, so we're free to choose one that makes substantive sense.
Rotation is not a statistical nicety—it is fundamental to factor analysis practice. Unrotated solutions are almost never directly interpretable. The art of factor analysis lies substantially in choosing and applying rotations that reveal the underlying structure most clearly.
This page covers the theory and practice of factor rotation: what it means mathematically, why we need it, and how to choose among the many rotation methods available.
By the end of this page, you will understand: • Why rotation is necessary and what it mathematically accomplishes • The concept of "simple structure" that guides rotation criteria • Orthogonal rotations: varimax, quartimax, equamax, and their properties • Oblique rotations: promax, oblimin, and when correlated factors are appropriate • How to interpret factor correlation matrices in oblique solutions • Practical guidance for choosing rotation methods • Common pitfalls and best practices in rotation
Maximum likelihood (or other) estimation produces a unique factor solution—but this uniqueness comes from mathematical convenience, not interpretability. The standard ML solution constrains Λ'Ψ⁻¹Λ to be diagonal, which yields a unique solution but produces loadings that typically concentrate variance on the first factor and spread moderate values across the rest, with no clear pattern to interpret.
This happens because the estimation algorithm optimizes statistical criteria, not psychological or substantive clarity.
Recall that if T is any orthogonal matrix (T'T = I), then:
$$\mathbf{\Lambda}^* = \mathbf{\Lambda T}$$
produces identical fit:
$$\mathbf{\Lambda}^* (\mathbf{\Lambda}^*)' = \mathbf{\Lambda T T'\Lambda}' = \mathbf{\Lambda\Lambda}'$$
This means the covariance structure—the only thing observable from data—is preserved under rotation. The rotation matrix T has k(k-1)/2 free parameters (rotation angles), all of which can be set arbitrarily without changing model fit.
Rotation exploits this freedom to find a solution that is interpretable.
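This invariance is easy to verify numerically. A minimal sketch in NumPy (the loading values are random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary p x k loading matrix (illustrative values)
Lam = rng.normal(size=(6, 2))

# Any orthogonal T (here the Q factor of a QR decomposition)
T, _ = np.linalg.qr(rng.normal(size=(2, 2)))
assert np.allclose(T.T @ T, np.eye(2))  # T is orthogonal

Lam_rot = Lam @ T

# The reproduced common-variance matrix Lambda Lambda' is unchanged
assert np.allclose(Lam @ Lam.T, Lam_rot @ Lam_rot.T)

# Communalities (row sums of squared loadings) are unchanged too
assert np.allclose(np.sum(Lam**2, axis=1), np.sum(Lam_rot**2, axis=1))
print("fit and communalities preserved under rotation")
```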
Preserved by rotation: • Model fit (χ², RMSEA, etc.) • Reproduced covariance matrix ΛΛ' • Total variance explained (communalities unchanged) • Factor scores' relationships with observed variables
Changed by rotation: • Individual loading values (pattern matrix) • Factor interpretations • Apparent "importance" of factors • Factor correlations (if oblique rotation)
Consider a 2-factor solution plotted in 2D, where each variable is a point based on its loadings on Factor 1 (x-axis) and Factor 2 (y-axis):
Before rotation: Variables cluster in the space but not along the axes. Both factors correlate with all variables to varying degrees.
After rotation: Axes are repositioned so that variable clusters fall along the axes. Each factor now clearly corresponds to a distinct group of variables.
The data points (variables in factor space) don't move—only the axes rotate. But by aligning axes with clusters, factor interpretation becomes natural:
This is the essence of simple structure.
The concept of simple structure was formalized by L.L. Thurstone in the 1930s and 1940s. It provides the theoretical justification for rotation and the criteria by which rotations are evaluated.
Thurstone proposed that an ideal factor solution should exhibit:
Each row should have at least one near-zero loading: Every variable should be unrelated to at least one factor
Each column should have at least k-1 near-zero loadings: For k factors, each factor should have some variables with near-zero loadings (others don't load on it)
For every pair of columns, several variables should have near-zero loadings in both: Ensures factors are distinct
For orthogonal factors: several variables have near-zero loadings in one column and substantial loadings in another: Clear differentiation
For every pair of columns, few variables should have substantial loadings in both: Minimizes cross-loadings
These principles translate to a cleaner goal: each variable should load highly on exactly one factor and have near-zero loadings on all others.
A perfect simple structure loading matrix would look like:
$$\mathbf{\Lambda} = \begin{pmatrix} \bullet & 0 & 0 \\ \bullet & 0 & 0 \\ 0 & \bullet & 0 \\ 0 & \bullet & 0 \\ 0 & 0 & \bullet \\ 0 & 0 & \bullet \end{pmatrix}$$
where • represents a substantial loading and 0 represents near-zero.
This is called a cluster structure: variables cluster into groups, each defining one factor.
Simple structure isn't just aesthetic—it has scientific value: • Replicability: Simple structures replicate more consistently across samples • Interpretation: Factors can be named based on loading clusters • Parsimony: The simplest explanation consistent with data • External validity: Factors with simple structure tend to correlate more consistently with external criteria
Rotation algorithms optimize criteria that quantify how "simple" a loading matrix is. The general approach:
Simplicity of a loading matrix = a function that is maximized when squared loadings are either large or near zero—that is, when each variable loads strongly on few factors and negligibly on the rest.
Different rotation methods define this differently, but all attempt to achieve simple structure.
Not all data have simple structure. In some domains—personality, psychopathology, attitudes—variables are genuinely influenced by multiple factors, and substantial cross-loadings reflect reality rather than a poor rotation.
When simple structure doesn't fit, forcing it through rotation can distort reality. Recognition of genuine complexity is important.
| Characteristic | Simple Structure | Complex Structure |
|---|---|---|
| Cross-loadings | Few and small | Many and substantial |
| Variable clustering | Clear groups | Overlapping memberships |
| Factor interpretation | Straightforward | Nuanced, multi-faceted |
| Replication | Stable across samples | May vary |
| Example domain | Primary mental abilities | Personality structure |
| Rotation benefit | High—reveals structure | Moderate—may distort |
Orthogonal rotations preserve the uncorrelatedness of factors. The rotation matrix T satisfies T'T = I, meaning factors remain at 90° angles after rotation. This maintains interpretive simplicity: factors represent independent dimensions.
Varimax is the most widely used rotation method, introduced by Kaiser (1958). It maximizes the variance of squared loadings within each factor (column).
Objective: $$\text{Varimax: Maximize } V = \sum_{j=1}^{k} \left[ \frac{1}{p} \sum_{i=1}^{p} \tilde{\lambda}_{ij}^4 - \left( \frac{1}{p} \sum_{i=1}^{p} \tilde{\lambda}_{ij}^2 \right)^2 \right]$$
where $\tilde{\lambda}_{ij} = \lambda_{ij} / h_i$ are communality-normalized loadings.
Effect: Within each column, squared loadings are driven toward 0 or their maximum, so each factor ends up defined by a distinct subset of variables with near-zero loadings elsewhere.
Properties: The de facto standard orthogonal rotation; spreads explained variance fairly evenly across factors; tends to break up a general factor, so it is a poor choice when one is expected.
Before varimax rotation, loadings are typically normalized by dividing by the square root of communality: λ̃ᵢⱼ = λᵢⱼ / hᵢ. This gives equal weight to variables regardless of communality. After rotation, the normalization is reversed. This is called Kaiser normalization and is standard practice.
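Kaiser normalization is a one-liner in practice. A sketch (the loading matrix is illustrative):

```python
import numpy as np

Lam = np.array([[0.7, 0.4],
                [0.6, 0.5],
                [0.4, 0.7]])  # illustrative loadings

h = np.sqrt(np.sum(Lam**2, axis=1))   # square root of each communality
Lam_norm = Lam / h[:, np.newaxis]     # normalize rows

# Every row now has unit length, so all variables count equally
assert np.allclose(np.sum(Lam_norm**2, axis=1), 1.0)

# After rotating Lam_norm, multiply the rows by h again to undo the normalization
```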
Quartimax maximizes the variance of squared loadings across all loadings (not column-wise). It simplifies rows rather than columns.
Objective: $$\text{Quartimax: Maximize } Q = \sum_{i=1}^{p} \sum_{j=1}^{k} \lambda_{ij}^4$$
Effect: Simplifies each row: each variable's variance is concentrated on as few factors as possible, which often funnels many variables onto a single factor.
Properties: Tends to produce one large general factor plus smaller group factors; useful when a general factor is theoretically expected, less so when distinct factors are sought.
Equamax is a compromise between varimax and quartimax, weighting both column and row simplicity.
Objective: A weighted combination of varimax and quartimax criteria: $$\text{Equamax: } k V + Q$$
where V is the varimax criterion and Q is the quartimax criterion.
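These criteria can be computed directly from a loading matrix. A sketch following the formulas above (Kaiser normalization omitted for brevity; the example matrices are illustrative):

```python
import numpy as np

def varimax_criterion(Lam):
    """V: sum over factors of the variance of squared loadings."""
    sq = Lam**2
    return np.sum(np.mean(sq**2, axis=0) - np.mean(sq, axis=0)**2)

def quartimax_criterion(Lam):
    """Q: sum of fourth powers of all loadings."""
    return np.sum(Lam**4)

def equamax_criterion(Lam):
    """Weighted combination k*V + Q, as defined above."""
    k = Lam.shape[1]
    return k * varimax_criterion(Lam) + quartimax_criterion(Lam)

# A perfect simple-structure matrix vs. the same matrix rotated 45 degrees
simple = np.array([[0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8]])
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
complex_ = simple @ np.array([[c, -s], [s, c]])

# Both criteria score the simple structure higher
assert varimax_criterion(simple) > varimax_criterion(complex_)
assert quartimax_criterion(simple) > quartimax_criterion(complex_)
```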
Effect: Balances row and column simplification, spreading variance more evenly than quartimax while avoiding varimax's tendency to force equally sized factors.
Properties: Results typically fall between varimax and quartimax; rarely the best choice on its own, but a reasonable compromise when neither extreme fits.
| Method | Simplifies | Best When | Typical Result |
|---|---|---|---|
| Varimax | Columns (factors) | Distinct, independent factors expected | Even variance across factors |
| Quartimax | Rows (variables) | General factor + group factors | One large factor, others smaller |
| Equamax | Both (balanced) | Compromise needed | Between varimax and quartimax |
```python
import numpy as np


def varimax_rotation(loadings, gamma=1.0, max_iter=500, tol=1e-6):
    """
    Apply varimax (gamma=1) or quartimax (gamma=0) rotation.

    Parameters
    ----------
    loadings : ndarray of shape (p, k)
        Initial loading matrix.
    gamma : float
        Weight between quartimax (0) and varimax (1).

    Returns
    -------
    rotated_loadings : ndarray of shape (p, k)
        Rotated loading matrix.
    rotation_matrix : ndarray of shape (k, k)
        The orthogonal rotation matrix T.
    """
    p, k = loadings.shape

    # Kaiser normalization: scale each row to unit length
    h = np.sqrt(np.sum(loadings**2, axis=1))
    h[h == 0] = 1  # avoid division by zero
    A = loadings / h[:, np.newaxis]

    T = np.eye(k)
    criterion_old = 0.0
    for _ in range(max_iter):
        L = A @ T
        # Gradient of the orthomax criterion (gamma=1: varimax, gamma=0: quartimax)
        G = A.T @ (L**3 - (gamma / p) * L * np.sum(L**2, axis=0))
        # Project the gradient onto the orthogonal group via SVD
        U, S, Vt = np.linalg.svd(G)
        T = U @ Vt
        criterion = np.sum(S)
        if criterion < criterion_old * (1 + tol):
            break  # criterion has stopped improving
        criterion_old = criterion

    # Undo Kaiser normalization
    rotated = (A @ T) * h[:, np.newaxis]
    return rotated, T


# Example usage
unrotated = np.array([
    [0.70, 0.40],
    [0.60, 0.50],
    [0.65, 0.45],
    [0.40, 0.70],
    [0.50, 0.60],
    [0.45, 0.65],
])

print("Unrotated Loadings:")
print(np.round(unrotated, 3))

rotated_varimax, T_varimax = varimax_rotation(unrotated, gamma=1.0)
print("Varimax Rotated Loadings:")
print(np.round(rotated_varimax, 3))

rotated_quartimax, T_quartimax = varimax_rotation(unrotated, gamma=0.0)
print("Quartimax Rotated Loadings:")
print(np.round(rotated_quartimax, 3))

print("Varimax spreads variance; quartimax produces a general factor.")
```

Orthogonal rotations constrain factors to be uncorrelated. But in many domains—related cognitive abilities, personality facets, overlapping symptom dimensions—the underlying factors are expected to correlate.
Oblique rotations allow factors to correlate, providing greater flexibility to achieve simple structure at the cost of interpretive complexity.
With oblique rotation, we must distinguish two matrices:
Pattern Matrix (P): Partial regression weights—each loading represents the unique relationship between a variable and a factor, controlling for other factors.
Structure Matrix (S): Correlations between variables and factors—includes both direct and indirect effects (through factor correlations).
The relationship: $$\mathbf{S} = \mathbf{P} \boldsymbol{\Phi}$$
where Φ is the factor correlation matrix.
For orthogonal rotation: P = S (no distinction needed). For oblique rotation: P ≠ S; both are interpretively relevant.

This is a common source of confusion: • Pattern matrix shows unique effects—use this for determining which variables "belong to" which factor • Structure matrix shows total correlations—use this for understanding variable-factor relationships • When factors correlate highly, pattern and structure can differ substantially • Always examine the factor correlation matrix alongside pattern loadings
Direct Oblimin is the most common oblique rotation. It minimizes cross-products of loadings, pushing them toward simple structure while allowing factor correlation.
Objective: Minimize a function that penalizes cross-loadings: $$\text{Oblimin: Minimize } \sum_{j \neq l} \left[ \sum_{i} \tilde{\lambda}_{ij}^2 \tilde{\lambda}_{il}^2 - \frac{\delta}{p} \left( \sum_i \tilde{\lambda}_{ij}^2 \right) \left( \sum_i \tilde{\lambda}_{il}^2 \right) \right]$$
The delta (δ) parameter controls obliquity: δ = 0 (direct quartimin) is the usual default and allows moderate factor correlations; more negative values push the solution toward orthogonality; positive values permit higher correlations and are rarely recommended.
Properties:
Promax is a two-step procedure:
Procedure: (1) obtain an orthogonal varimax solution; (2) build an idealized target by raising the varimax loadings to the power κ (preserving sign), then rotate obliquely toward that target by least squares.
The kappa (κ) parameter: typically set to 4; larger values produce a more extreme target and allow higher factor correlations.
Properties: Fast and essentially non-iterative; widely available; usually close to oblimin results, though factor correlations can become large with high κ.
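The second step can be sketched as follows, starting from an already-varimax-rotated matrix. This mirrors the classic Hendrickson–White procedure (as implemented in, e.g., R's `promax`); the input loadings are hypothetical:

```python
import numpy as np

def promax_from_varimax(V, kappa=4):
    """Oblique promax step applied to a varimax-rotated matrix V.

    Returns the pattern matrix and the factor correlation matrix Phi.
    """
    # Idealized target: power the loadings, preserving sign
    target = V * np.abs(V)**(kappa - 1)

    # Least-squares rotation toward the target
    U = np.linalg.solve(V.T @ V, V.T @ target)

    # Rescale columns so the factors have unit variance (unit diagonal of Phi)
    d = np.diag(np.linalg.inv(U.T @ U))
    U = U * np.sqrt(d)

    pattern = V @ U
    Phi = np.linalg.inv(U.T @ U)  # factor correlation matrix
    return pattern, Phi

# Hypothetical varimax-rotated loadings with small cross-loadings
V = np.array([[0.75, 0.10], [0.70, 0.15], [0.68, 0.12],
              [0.12, 0.72], [0.15, 0.70], [0.10, 0.74]])
pattern, Phi = promax_from_varimax(V, kappa=4)

assert np.allclose(np.diag(Phi), 1.0)  # unit factor "variances"
print(np.round(Phi, 2))                # off-diagonal = factor correlation
```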
| Method | Control Parameter | Factor Correlations | Computational |
|---|---|---|---|
| Direct Oblimin (δ=0) | Delta (δ) | Moderate correlation allowed | Iterative |
| Oblimin (δ=-0.5) | Delta (δ) | Pushed toward smaller values | Iterative |
| Promax (κ=4) | Kappa (κ) | Can become large | Non-iterative |
| Geomin | Epsilon (ε) | Typically moderate | Iterative |
Use oblique when: theory predicts correlated constructs; the factors are facets of a broader domain; or an orthogonal solution leaves many substantial cross-loadings.
Stick with orthogonal when: factors are theoretically independent; you need uncorrelated factor scores for downstream analyses; or simplicity of reporting matters and factor correlations are demonstrably near zero.
Practical advice: Many methodologists now recommend starting with oblique rotation as the default. If factors are truly uncorrelated, oblique rotation will show near-zero correlations. If they're correlated, oblique rotation reveals this; orthogonal rotation hides it.
Modern practice: Use oblique rotation (e.g., oblimin with δ=0) as your default. If the resulting factor correlation matrix shows |r| < 0.10 for all pairs, the orthogonal assumption is supported and you can report varimax for simplicity. If factor correlations are substantial, report the oblique solution.
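This decision rule is easy to automate. A minimal sketch (the 0.10 threshold is the rule of thumb from above, not a universal constant):

```python
import numpy as np

def orthogonality_supported(Phi, threshold=0.10):
    """True if every off-diagonal factor correlation is below the threshold."""
    off_diag = Phi[~np.eye(Phi.shape[0], dtype=bool)]
    return bool(np.max(np.abs(off_diag)) < threshold)

Phi_small = np.array([[1.00, 0.05], [0.05, 1.00]])
Phi_large = np.array([[1.00, 0.35], [0.35, 1.00]])

assert orthogonality_supported(Phi_small)      # varimax is defensible
assert not orthogonality_supported(Phi_large)  # report the oblique solution
```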
Oblique solutions require more careful interpretation than orthogonal solutions. Here's a systematic approach.
Before interpreting loadings, examine how factors relate:
$$\boldsymbol{\Phi} = \begin{pmatrix} 1.00 & 0.35 & 0.28 \\ 0.35 & 1.00 & 0.42 \\ 0.28 & 0.42 & 1.00 \end{pmatrix}$$
Interpretation: correlations of .28–.42 indicate moderately related but distinct factors—low enough to warrant separate interpretations, high enough that an orthogonal rotation would misrepresent the structure.
The pattern matrix shows unique contributions:
| Variable | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| Var1 | 0.82 | 0.05 | -0.03 |
| Var2 | 0.75 | -0.08 | 0.10 |
| Var3 | 0.02 | 0.79 | 0.04 |
| Var4 | 0.06 | 0.71 | 0.08 |
| Var5 | -0.05 | 0.03 | 0.85 |
| Var6 | 0.08 | 0.09 | 0.73 |
This shows a clean simple structure in the pattern matrix, despite factor correlations.
Structure matrix entries will be larger due to indirect effects:
| Variable | Pattern (F1) | Structure (F1) |
|---|---|---|
| Var1 | 0.82 | 0.83 |
| Var3 | 0.02 | 0.31 |
Var3's structure loading on F1 (0.31) far exceeds its pattern loading (0.02) because F1 correlates with F2 (.35) and Var3 loads strongly on F2 (.79).
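The relationship S = PΦ makes this easy to check numerically, using the pattern matrix and Φ from the tables above:

```python
import numpy as np

# Factor correlation matrix (from above)
Phi = np.array([[1.00, 0.35, 0.28],
                [0.35, 1.00, 0.42],
                [0.28, 0.42, 1.00]])

# Pattern matrix (from above)
P = np.array([[ 0.82,  0.05, -0.03],
              [ 0.75, -0.08,  0.10],
              [ 0.02,  0.79,  0.04],
              [ 0.06,  0.71,  0.08],
              [-0.05,  0.03,  0.85],
              [ 0.08,  0.09,  0.73]])

S = P @ Phi  # structure matrix: total variable-factor correlations

# Var3 barely loads on F1 in the pattern matrix (0.02), yet its structure
# loading on F1 is much larger via F2: 0.02 + 0.79*0.35 + 0.04*0.28 ≈ 0.31
assert abs(S[2, 0] - 0.3077) < 1e-6
assert S[2, 0] > P[2, 0]
```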
Factor correlations > 0.85 suggest the factors may not be distinct. Possible explanations: • Too many factors extracted • Factors are facets of the same construct • Sample-specific artifact
If factor correlations are very high, consider extracting fewer factors or re-examining your conceptual model.
A complete report of an oblique solution includes: the rotation method and its tuning parameter (e.g., oblimin δ), the pattern matrix, the factor correlation matrix, and communalities; the structure matrix should be reported or made available on request.
Table format suggestion:
| Variable | F1 | F2 | F3 | h² |
|---|---|---|---|---|
| Item 1 | .78 | .05 | .02 | .65 |
| ... | ... | ... | ... | ... |
Factor Correlations: F1-F2: .35; F1-F3: .28; F2-F3: .42
With multiple rotation options available, how do you choose? Here's a practical decision framework.
Question 1: Are factors expected to be correlated based on theory? If yes (or if you are unsure), use an oblique rotation; if factors are firmly theorized as independent, orthogonal is defensible.
Question 2: Is a general factor expected? If yes, consider quartimax or a bifactor approach rather than varimax, which tends to break a general factor apart.
Question 3: What is your software's default? Know it—and do not accept it uncritically; defaults differ across packages and should be overridden when theory dictates.
Best practice: Try both orthogonal (varimax) and oblique (oblimin) rotations. Compare: the loading patterns, the size of the factor correlations, and which solution is easier to interpret and name.
If different rotations give substantially different stories, the factor structure may be unclear, requiring: a different number of factors, a larger sample, or a re-examination of which variables to include.
| Software | Default Rotation | Oblique Available | Notes |
|---|---|---|---|
| SPSS | Varimax | Direct Oblimin, Promax | Reports pattern and structure |
| SAS (PROC FACTOR) | Varimax | Promax, HK, others | Extensive options |
| R (psych) | Varimax | Oblimin, Promax, others | Very flexible |
| Python (sklearn) | None | Varimax, quartimax (`rotation` parameter) | `factor_analyzer` package offers more options |
| Mplus | Geomin | Multiple | CFA-focused but EFA available |
When you have a hypothesized or prior loading pattern, target rotation rotates toward a specified target matrix:
$$\text{Minimize } ||\mathbf{\Lambda T} - \mathbf{P}||^2$$
where P is your target.
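For an orthogonal T, this is the classic orthogonal Procrustes problem, solved in closed form via an SVD. A sketch (the matrices are synthetic):

```python
import numpy as np

def target_rotation(Lam, target):
    """Orthogonal T minimizing ||Lam @ T - target||_F (Procrustes solution)."""
    U, _, Vt = np.linalg.svd(Lam.T @ target)
    T = U @ Vt
    return Lam @ T, T

rng = np.random.default_rng(1)
Lam = rng.normal(size=(6, 2))

# Build a target by rotating Lam with a known orthogonal matrix...
T_true, _ = np.linalg.qr(rng.normal(size=(2, 2)))
target = Lam @ T_true

# ...and confirm the Procrustes solution recovers it exactly
rotated, T_hat = target_rotation(Lam, target)
assert np.allclose(rotated, target)
```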
Uses: testing a hypothesized loading pattern, comparing factor solutions across samples or studies, and rotating toward a structure found in prior research.
Partially specified targets: Specify only some target loadings (e.g., zeros for cross-loadings) and let others be free.
Popular in structural equation modeling software (Mplus), geomin minimizes a complexity function:
$$\text{Geomin: Minimize } \sum_i \left( \prod_j (\lambda_{ij}^2 + \epsilon) \right)^{1/k}$$
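The geomin complexity is straightforward to compute. A sketch following the formula above (example matrices are illustrative):

```python
import numpy as np

def geomin_complexity(Lam, eps=0.01):
    """Sum over variables of the geometric mean of (squared loading + eps)."""
    k = Lam.shape[1]
    return np.sum(np.prod(Lam**2 + eps, axis=1) ** (1.0 / k))

simple = np.array([[0.8, 0.0], [0.0, 0.8]])
complex_ = np.array([[0.57, 0.57], [0.57, 0.57]])  # similar communality, spread out

# Geomin assigns lower complexity to the simpler matrix
assert geomin_complexity(simple) < geomin_complexity(complex_)
```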
Geomin tends to produce: moderate factor correlations, solutions that tolerate genuine cross-loadings, and results that can be sensitive to the choice of ε and to local minima (multiple random starts are advisable).
When a general factor plus group factors are hypothesized: bifactor rotations let every variable load on a general factor while also loading on one group factor, separating general from specific variance.
Bifactor models are increasingly popular in psychometrics for understanding general vs. specific variance.
In Confirmatory Factor Analysis (CFA), the loading pattern is specified a priori: each variable is allowed to load only on designated factors, cross-loadings are typically fixed to zero, and the hypothesized structure is tested via model fit.
EFA with rotation is exploratory; CFA is confirmatory. But target rotation in EFA can bridge the two approaches.
A common workflow: (1) run EFA with rotation on one half of the sample to discover the structure; (2) specify that structure as a CFA model and test it on the other half.
This uses EFA's flexibility for discovery and CFA's rigor for confirmation.
Iterative rotation algorithms (oblimin, varimax) can converge to local minima, especially with: many factors, weakly defined factors, or substantial cross-loadings in the data.
Solutions: run the rotation from multiple random starting positions, keep the solution with the best criterion value, and check that the competing solutions tell the same substantive story.
Rarely, rotation fails to converge. Causes include: Heywood cases (communalities near 1), too many factors for the data, or variables with near-zero communalities.
Try: different rotation method, different number of factors, or examine data for anomalies.
Factor rotation is a critical step in making factor analysis results interpretable. Let's consolidate our understanding: rotation changes loadings but not model fit; simple structure is the guiding ideal; varimax is the standard orthogonal choice; oblique rotation (oblimin, promax) is the sensible default when factor correlations are plausible; and for oblique solutions, the pattern matrix, structure matrix, and factor correlation matrix must all be examined and reported.
With our understanding of factor rotation complete, the next page covers Maximum Likelihood Estimation for factor analysis—how parameters are estimated from data, the log-likelihood function, fit assessment, and standard errors for loadings.
You now understand both the theory and practice of factor rotation. Remember: rotation is not a statistical afterthought—it's fundamental to making factor analysis results meaningful. Choose rotation methods deliberately and report them transparently.