You've extracted factors from your data using maximum likelihood or another estimation method. The solution converges, the fit statistics look acceptable, and you have a loading matrix. But when you examine the loadings, every variable seems to load moderately on every factor—there's no clear pattern to interpret.
This is precisely where factor rotation becomes essential. Rotation is the process of transforming an initial factor solution into one that is more interpretable while preserving the statistical properties of the solution. It exploits the rotational indeterminacy of factor models: infinitely many loading matrices produce identical fits to the data, so we're free to choose one that makes substantive sense.
Rotation is not a statistical nicety—it is fundamental to factor analysis practice. Unrotated solutions are almost never directly interpretable. The art of factor analysis lies substantially in choosing and applying rotations that reveal the underlying structure most clearly.
This page covers the theory and practice of factor rotation: what it means mathematically, why we need it, and how to choose among the many rotation methods available.
By the end of this page, you will understand: • Why rotation is necessary and what it mathematically accomplishes • The concept of "simple structure" that guides rotation criteria • Orthogonal rotations: varimax, quartimax, equamax, and their properties • Oblique rotations: promax, oblimin, and when correlated factors are appropriate • How to interpret factor correlation matrices in oblique solutions • Practical guidance for choosing rotation methods • Common pitfalls and best practices in rotation
Maximum likelihood (or other) estimation produces a unique factor solution—but this uniqueness comes from mathematical convenience, not interpretability. The standard ML solution constrains Λ'Ψ⁻¹Λ to be diagonal, which yields a unique solution but produces loadings that typically concentrate variance on the first factor and spread moderate values across the rest, with no clear pattern to interpret.
This happens because the estimation algorithm optimizes statistical criteria, not psychological or substantive clarity.
Recall that if T is any orthogonal matrix (T'T = I), then:
$$\mathbf{\Lambda}^* = \mathbf{\Lambda T}$$
produces identical fit:
$$\mathbf{\Lambda}^* (\mathbf{\Lambda}^*)' = \mathbf{\Lambda T T'\Lambda}' = \mathbf{\Lambda\Lambda}'$$
This means the covariance structure—the only thing observable from data—is preserved under rotation. The rotation matrix T has k(k-1)/2 free parameters (rotation angles), all of which can be set arbitrarily without changing model fit.
Rotation exploits this freedom to find a solution that is interpretable.
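This invariance is easy to verify numerically. A minimal sketch in NumPy (the loading values are random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary p x k loading matrix (illustrative values)
Lam = rng.normal(size=(6, 2))

# Any orthogonal T (here the Q factor of a QR decomposition)
T, _ = np.linalg.qr(rng.normal(size=(2, 2)))
assert np.allclose(T.T @ T, np.eye(2))  # T is orthogonal

Lam_rot = Lam @ T

# The reproduced common-variance matrix Lambda Lambda' is unchanged
assert np.allclose(Lam @ Lam.T, Lam_rot @ Lam_rot.T)

# Communalities (row sums of squared loadings) are unchanged too
assert np.allclose(np.sum(Lam**2, axis=1), np.sum(Lam_rot**2, axis=1))
print("fit and communalities preserved under rotation")
```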
Preserved by rotation: • Model fit (χ², RMSEA, etc.) • Reproduced covariance matrix ΛΛ' • Total variance explained (communalities unchanged) • Factor scores' relationships with observed variables
Changed by rotation: • Individual loading values (pattern matrix) • Factor interpretations • Apparent "importance" of factors • Factor correlations (if oblique rotation)
Consider a 2-factor solution plotted in 2D, where each variable is a point based on its loadings on Factor 1 (x-axis) and Factor 2 (y-axis):
Before rotation: Variables cluster in the space but not along the axes. Both factors correlate with all variables to varying degrees.
After rotation: Axes are repositioned so that variable clusters fall along the axes. Each factor now clearly corresponds to a distinct group of variables.
The data points (variables in factor space) don't move—only the axes rotate. But by aligning axes with clusters, factor interpretation becomes natural:
This is the essence of simple structure.
The concept of simple structure was formalized by L.L. Thurstone in the 1930s and 1940s. It provides the theoretical justification for rotation and the criteria by which rotations are evaluated.
Thurstone proposed that an ideal factor solution should exhibit:
Each row should have at least one near-zero loading: Every variable should be unrelated to at least one factor
Each column should have at least k-1 near-zero loadings: For k factors, each factor should have some variables with near-zero loadings (others don't load on it)
For every pair of columns, several variables should have near-zero loadings in both: Ensures factors are distinct
For orthogonal factors: several variables have near-zero loadings in one column and substantial loadings in another: Clear differentiation
For every pair of columns, few variables should have substantial loadings in both: Minimizes cross-loadings
These principles translate to a cleaner goal: each variable should load highly on exactly one factor and have near-zero loadings on all others.
A perfect simple structure loading matrix would look like:
$$\mathbf{\Lambda} = \begin{pmatrix} \bullet & 0 & 0 \\ \bullet & 0 & 0 \\ 0 & \bullet & 0 \\ 0 & \bullet & 0 \\ 0 & 0 & \bullet \\ 0 & 0 & \bullet \end{pmatrix}$$
where • represents a substantial loading and 0 represents near-zero.
This is called a cluster structure: variables cluster into groups, each defining one factor.
Simple structure isn't just aesthetic—it has scientific value: • Replicability: Simple structures replicate more consistently across samples • Interpretation: Factors can be named based on loading clusters • Parsimony: The simplest explanation consistent with data • External validity: Factors with simple structure tend to correlate more consistently with external criteria
Rotation algorithms optimize criteria that quantify how "simple" a loading matrix is. The general approach:
Simplicity of a loading matrix = a function that is maximized when squared loadings are either large or near zero—that is, when each variable loads strongly on few factors and negligibly on the rest.
Different rotation methods define this differently, but all attempt to achieve simple structure.
Not all data have simple structure. In some domains—personality, psychopathology, attitudes—variables are genuinely influenced by multiple factors, and substantial cross-loadings reflect reality rather than a poor rotation.
When simple structure doesn't fit, forcing it through rotation can distort reality. Recognition of genuine complexity is important.
| Characteristic | Simple Structure | Complex Structure |
|---|---|---|
| Cross-loadings | Few and small | Many and substantial |
| Variable clustering | Clear groups | Overlapping memberships |
| Factor interpretation | Straightforward | Nuanced, multi-faceted |
| Replication | Stable across samples | May vary |
| Example domain | Primary mental abilities | Personality structure |
| Rotation benefit | High—reveals structure | Moderate—may distort |
Orthogonal rotations preserve the uncorrelatedness of factors. The rotation matrix T satisfies T'T = I, meaning factors remain at 90° angles after rotation. This maintains interpretive simplicity: factors represent independent dimensions.
Varimax is the most widely used rotation method, introduced by Kaiser (1958). It maximizes the variance of squared loadings within each factor (column).
Objective: $$\text{Varimax: Maximize } V = \sum_{j=1}^{k} \left[ \frac{1}{p} \sum_{i=1}^{p} \tilde{\lambda}_{ij}^4 - \left( \frac{1}{p} \sum_{i=1}^{p} \tilde{\lambda}_{ij}^2 \right)^2 \right]$$
where $\tilde{\lambda}_{ij} = \lambda_{ij} / h_i$ are communality-normalized loadings.
Effect: Within each column, squared loadings are driven toward 0 or their maximum, so each factor ends up defined by a distinct subset of variables with near-zero loadings elsewhere.
Properties: The de facto standard orthogonal rotation; spreads explained variance fairly evenly across factors; tends to break up a general factor, so it is a poor choice when one is expected.
Before varimax rotation, loadings are typically normalized by dividing by the square root of communality: λ̃ᵢⱼ = λᵢⱼ / hᵢ. This gives equal weight to variables regardless of communality. After rotation, the normalization is reversed. This is called Kaiser normalization and is standard practice.
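Kaiser normalization is a one-liner in practice. A sketch (the loading matrix is illustrative):

```python
import numpy as np

Lam = np.array([[0.7, 0.4],
                [0.6, 0.5],
                [0.4, 0.7]])  # illustrative loadings

h = np.sqrt(np.sum(Lam**2, axis=1))   # square root of each communality
Lam_norm = Lam / h[:, np.newaxis]     # normalize rows

# Every row now has unit length, so all variables count equally
assert np.allclose(np.sum(Lam_norm**2, axis=1), 1.0)

# After rotating Lam_norm, multiply the rows by h again to undo the normalization
```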
Quartimax maximizes the variance of squared loadings across all loadings (not column-wise). It simplifies rows rather than columns.
Objective: $$\text{Quartimax: Maximize } Q = \sum_{i=1}^{p} \sum_{j=1}^{k} \lambda_{ij}^4$$
Effect: Simplifies each row: each variable's variance is concentrated on as few factors as possible, which often funnels many variables onto a single factor.
Properties: Tends to produce one large general factor plus smaller group factors; useful when a general factor is theoretically expected, less so when distinct factors are sought.
Equamax is a compromise between varimax and quartimax, weighting both column and row simplicity.
Objective: A weighted combination of varimax and quartimax criteria: $$\text{Equamax: } k V + Q$$
where V is the varimax criterion and Q is the quartimax criterion.
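These criteria can be computed directly from a loading matrix. A sketch following the formulas above (Kaiser normalization omitted for brevity; the example matrices are illustrative):

```python
import numpy as np

def varimax_criterion(Lam):
    """V: sum over factors of the variance of squared loadings."""
    sq = Lam**2
    return np.sum(np.mean(sq**2, axis=0) - np.mean(sq, axis=0)**2)

def quartimax_criterion(Lam):
    """Q: sum of fourth powers of all loadings."""
    return np.sum(Lam**4)

def equamax_criterion(Lam):
    """Weighted combination k*V + Q, as defined above."""
    k = Lam.shape[1]
    return k * varimax_criterion(Lam) + quartimax_criterion(Lam)

# A perfect simple-structure matrix vs. the same matrix rotated 45 degrees
simple = np.array([[0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8]])
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
complex_ = simple @ np.array([[c, -s], [s, c]])

# Both criteria score the simple structure higher
assert varimax_criterion(simple) > varimax_criterion(complex_)
assert quartimax_criterion(simple) > quartimax_criterion(complex_)
```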
Effect: Balances row and column simplification, spreading variance more evenly than quartimax while avoiding varimax's tendency to force equally sized factors.
Properties: Results typically fall between varimax and quartimax; rarely the best choice on its own, but a reasonable compromise when neither extreme fits.
| Method | Simplifies | Best When | Typical Result |
|---|---|---|---|
| Varimax | Columns (factors) | Distinct, independent factors expected | Even variance across factors |
| Quartimax | Rows (variables) | General factor + group factors | One large factor, others smaller |
| Equamax | Both (balanced) | Compromise needed | Between varimax and quartimax |
```python
import numpy as np


def varimax_rotation(loadings, gamma=1.0, max_iter=500, tol=1e-6):
    """
    Apply varimax (gamma=1) or quartimax (gamma=0) rotation.

    Parameters
    ----------
    loadings : ndarray of shape (p, k)
        Initial loading matrix.
    gamma : float
        Weight between quartimax (0) and varimax (1).

    Returns
    -------
    rotated_loadings : ndarray of shape (p, k)
        Rotated loading matrix.
    rotation_matrix : ndarray of shape (k, k)
        The orthogonal rotation matrix T.
    """
    p, k = loadings.shape

    # Kaiser normalization: scale each row to unit length
    h = np.sqrt(np.sum(loadings**2, axis=1))
    h[h == 0] = 1  # avoid division by zero
    A = loadings / h[:, np.newaxis]

    T = np.eye(k)
    criterion_old = 0.0
    for _ in range(max_iter):
        L = A @ T
        # Gradient of the orthomax criterion (gamma=1: varimax, gamma=0: quartimax)
        G = A.T @ (L**3 - (gamma / p) * L * np.sum(L**2, axis=0))
        # Project the gradient onto the orthogonal group via SVD
        U, S, Vt = np.linalg.svd(G)
        T = U @ Vt
        criterion = np.sum(S)
        if criterion < criterion_old * (1 + tol):
            break  # criterion has stopped improving
        criterion_old = criterion

    # Undo Kaiser normalization
    rotated = (A @ T) * h[:, np.newaxis]
    return rotated, T


# Example usage
unrotated = np.array([
    [0.70, 0.40],
    [0.60, 0.50],
    [0.65, 0.45],
    [0.40, 0.70],
    [0.50, 0.60],
    [0.45, 0.65],
])

print("Unrotated Loadings:")
print(np.round(unrotated, 3))

rotated_varimax, T_varimax = varimax_rotation(unrotated, gamma=1.0)
print("Varimax Rotated Loadings:")
print(np.round(rotated_varimax, 3))

rotated_quartimax, T_quartimax = varimax_rotation(unrotated, gamma=0.0)
print("Quartimax Rotated Loadings:")
print(np.round(rotated_quartimax, 3))

print("Varimax spreads variance; quartimax produces a general factor.")
```

Orthogonal rotations constrain factors to be uncorrelated. But in many domains—related cognitive abilities, personality facets, overlapping symptom dimensions—the underlying factors are expected to correlate.
Oblique rotations allow factors to correlate, providing greater flexibility to achieve simple structure at the cost of interpretive complexity.
With oblique rotation, we must distinguish two matrices:
Pattern Matrix (P): Partial regression weights—each loading represents the unique relationship between a variable and a factor, controlling for other factors.
Structure Matrix (S): Correlations between variables and factors—includes both direct and indirect effects (through factor correlations).
The relationship: $$\mathbf{S} = \mathbf{P} \boldsymbol{\Phi}$$
where Φ is the factor correlation matrix.
For orthogonal rotation: P = S (no distinction needed). For oblique rotation: P ≠ S; both are interpretively relevant.

This is a common source of confusion: • Pattern matrix shows unique effects—use this for determining which variables "belong to" which factor • Structure matrix shows total correlations—use this for understanding variable-factor relationships • When factors correlate highly, pattern and structure can differ substantially • Always examine the factor correlation matrix alongside pattern loadings
Direct Oblimin is the most common oblique rotation. It minimizes cross-products of loadings, pushing them toward simple structure while allowing factor correlation.
Objective: Minimize a function that penalizes cross-loadings: $$\text{Oblimin: Minimize } \sum_{j \neq l} \left[ \sum_{i} \tilde{\lambda}_{ij}^2 \tilde{\lambda}_{il}^2 - \frac{\delta}{p} \left( \sum_i \tilde{\lambda}_{ij}^2 \right) \left( \sum_i \tilde{\lambda}_{il}^2 \right) \right]$$
The delta (δ) parameter controls obliquity: δ = 0 (direct quartimin) is the usual default and allows moderate factor correlations; more negative values push the solution toward orthogonality; positive values permit higher correlations and are rarely recommended.
Properties:
Promax is a two-step procedure:
Procedure: (1) obtain an orthogonal varimax solution; (2) build an idealized target by raising the varimax loadings to the power κ (preserving sign), then rotate obliquely toward that target by least squares.
The kappa (κ) parameter: typically set to 4; larger values produce a more extreme target and allow higher factor correlations.
Properties: Fast and essentially non-iterative; widely available; usually close to oblimin results, though factor correlations can become large with high κ.
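The second step can be sketched as follows, starting from an already-varimax-rotated matrix. This mirrors the classic Hendrickson–White procedure (as implemented in, e.g., R's `promax`); the input loadings are hypothetical:

```python
import numpy as np

def promax_from_varimax(V, kappa=4):
    """Oblique promax step applied to a varimax-rotated matrix V.

    Returns the pattern matrix and the factor correlation matrix Phi.
    """
    # Idealized target: power the loadings, preserving sign
    target = V * np.abs(V)**(kappa - 1)

    # Least-squares rotation toward the target
    U = np.linalg.solve(V.T @ V, V.T @ target)

    # Rescale columns so the factors have unit variance (unit diagonal of Phi)
    d = np.diag(np.linalg.inv(U.T @ U))
    U = U * np.sqrt(d)

    pattern = V @ U
    Phi = np.linalg.inv(U.T @ U)  # factor correlation matrix
    return pattern, Phi

# Hypothetical varimax-rotated loadings with small cross-loadings
V = np.array([[0.75, 0.10], [0.70, 0.15], [0.68, 0.12],
              [0.12, 0.72], [0.15, 0.70], [0.10, 0.74]])
pattern, Phi = promax_from_varimax(V, kappa=4)

assert np.allclose(np.diag(Phi), 1.0)  # unit factor "variances"
print(np.round(Phi, 2))                # off-diagonal = factor correlation
```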
| Method | Control Parameter | Factor Correlations | Computational |
|---|---|---|---|
| Direct Oblimin (δ=0) | Delta (δ) | Moderate correlation allowed | Iterative |
| Oblimin (δ=-0.5) | Delta (δ) | Pushed toward smaller values | Iterative |
| Promax (κ=4) | Kappa (κ) | Can become large | Non-iterative |
| Geomin | Epsilon (ε) | Typically moderate | Iterative |
Use oblique when: theory predicts correlated constructs; the factors are facets of a broader domain; or an orthogonal solution leaves many substantial cross-loadings.
Stick with orthogonal when: factors are theoretically independent; you need uncorrelated factor scores for downstream analyses; or simplicity of reporting matters and factor correlations are demonstrably near zero.
Practical advice: Many methodologists now recommend starting with oblique rotation as the default. If factors are truly uncorrelated, oblique rotation will show near-zero correlations. If they're correlated, oblique rotation reveals this; orthogonal rotation hides it.
Modern practice: Use oblique rotation (e.g., oblimin with δ=0) as your default. If the resulting factor correlation matrix shows |r| < 0.10 for all pairs, the orthogonal assumption is supported and you can report varimax for simplicity. If factor correlations are substantial, report the oblique solution.
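This decision rule is easy to automate. A minimal sketch (the 0.10 threshold is the rule of thumb from above, not a universal constant):

```python
import numpy as np

def orthogonality_supported(Phi, threshold=0.10):
    """True if every off-diagonal factor correlation is below the threshold."""
    off_diag = Phi[~np.eye(Phi.shape[0], dtype=bool)]
    return bool(np.max(np.abs(off_diag)) < threshold)

Phi_small = np.array([[1.00, 0.05], [0.05, 1.00]])
Phi_large = np.array([[1.00, 0.35], [0.35, 1.00]])

assert orthogonality_supported(Phi_small)      # varimax is defensible
assert not orthogonality_supported(Phi_large)  # report the oblique solution
```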
Oblique solutions require more careful interpretation than orthogonal solutions. Here's a systematic approach.
Before interpreting loadings, examine how factors relate:
$$\boldsymbol{\Phi} = \begin{pmatrix} 1.00 & 0.35 & 0.28 \\ 0.35 & 1.00 & 0.42 \\ 0.28 & 0.42 & 1.00 \end{pmatrix}$$
Interpretation: correlations of .28–.42 indicate moderately related but distinct factors—low enough to warrant separate interpretations, high enough that an orthogonal rotation would misrepresent the structure.
The pattern matrix shows unique contributions:
| Variable | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| Var1 | 0.82 | 0.05 | -0.03 |
| Var2 | 0.75 | -0.08 | 0.10 |
| Var3 | 0.02 | 0.79 | 0.04 |
| Var4 | 0.06 | 0.71 | 0.08 |
| Var5 | -0.05 | 0.03 | 0.85 |
| Var6 | 0.08 | 0.09 | 0.73 |
This shows a clean simple structure in the pattern matrix, despite factor correlations.
Structure matrix entries will be larger due to indirect effects:
| Variable | Pattern (F1) | Structure (F1) |
|---|---|---|
| Var1 | 0.82 | 0.83 |
| Var3 | 0.02 | 0.31 |
Var3's structure loading on F1 (0.31) far exceeds its pattern loading (0.02) because F1 correlates with F2 (.35) and Var3 loads strongly on F2 (.79).
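The relationship S = PΦ makes this easy to check numerically, using the pattern matrix and Φ from the tables above:

```python
import numpy as np

# Factor correlation matrix (from above)
Phi = np.array([[1.00, 0.35, 0.28],
                [0.35, 1.00, 0.42],
                [0.28, 0.42, 1.00]])

# Pattern matrix (from above)
P = np.array([[ 0.82,  0.05, -0.03],
              [ 0.75, -0.08,  0.10],
              [ 0.02,  0.79,  0.04],
              [ 0.06,  0.71,  0.08],
              [-0.05,  0.03,  0.85],
              [ 0.08,  0.09,  0.73]])

S = P @ Phi  # structure matrix: total variable-factor correlations

# Var3 barely loads on F1 in the pattern matrix (0.02), yet its structure
# loading on F1 is much larger via F2: 0.02 + 0.79*0.35 + 0.04*0.28 ≈ 0.31
assert abs(S[2, 0] - 0.3077) < 1e-6
assert S[2, 0] > P[2, 0]
```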
Factor correlations > 0.85 suggest the factors may not be distinct. Possible explanations: • Too many factors extracted • Factors are facets of the same construct • Sample-specific artifact
If factor correlations are very high, consider extracting fewer factors or re-examining your conceptual model.
A complete report of an oblique solution includes: the rotation method and its tuning parameter (e.g., oblimin δ), the pattern matrix, the factor correlation matrix, and communalities; the structure matrix should be reported or made available on request.
Table format suggestion:
| Variable | F1 | F2 | F3 | h² |
|---|---|---|---|---|
| Item 1 | .78 | .05 | .02 | .65 |
| ... | ... | ... | ... | ... |
Factor Correlations: F1-F2: .35; F1-F3: .28; F2-F3: .42
With multiple rotation options available, how do you choose? Here's a practical decision framework.
Question 1: Are factors expected to be correlated based on theory? If yes (or if you are unsure), use an oblique rotation; if factors are firmly theorized as independent, orthogonal is defensible.
Question 2: Is a general factor expected? If yes, consider quartimax or a bifactor approach rather than varimax, which tends to break a general factor apart.
Question 3: What is your software's default? Know it—and do not accept it uncritically; defaults differ across packages and should be overridden when theory dictates.
Best practice: Try both orthogonal (varimax) and oblique (oblimin) rotations. Compare: the loading patterns, the size of the factor correlations, and which solution is easier to interpret and name.
If different rotations give substantially different stories, the factor structure may be unclear, requiring: a different number of factors, a larger sample, or a re-examination of which variables to include.
| Software | Default Rotation | Oblique Available | Notes |
|---|---|---|---|
| SPSS | Varimax | Direct Oblimin, Promax | Reports pattern and structure |
| SAS (PROC FACTOR) | Varimax | Promax, HK, others | Extensive options |
| R (psych) | Varimax | Oblimin, Promax, others | Very flexible |
| Python (sklearn) | None | Varimax, quartimax (`rotation` parameter) | `factor_analyzer` package offers more options |
| Mplus | Geomin | Multiple | CFA-focused but EFA available |
When you have a hypothesized or prior loading pattern, target rotation rotates toward a specified target matrix:
$$\text{Minimize } ||\mathbf{\Lambda T} - \mathbf{P}||^2$$
where P is your target.
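For an orthogonal T, this is the classic orthogonal Procrustes problem, solved in closed form via an SVD. A sketch (the matrices are synthetic):

```python
import numpy as np

def target_rotation(Lam, target):
    """Orthogonal T minimizing ||Lam @ T - target||_F (Procrustes solution)."""
    U, _, Vt = np.linalg.svd(Lam.T @ target)
    T = U @ Vt
    return Lam @ T, T

rng = np.random.default_rng(1)
Lam = rng.normal(size=(6, 2))

# Build a target by rotating Lam with a known orthogonal matrix...
T_true, _ = np.linalg.qr(rng.normal(size=(2, 2)))
target = Lam @ T_true

# ...and confirm the Procrustes solution recovers it exactly
rotated, T_hat = target_rotation(Lam, target)
assert np.allclose(rotated, target)
```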
Uses: testing a hypothesized loading pattern, comparing factor solutions across samples or studies, and rotating toward a structure found in prior research.
Partially specified targets: Specify only some target loadings (e.g., zeros for cross-loadings) and let others be free.
Popular in structural equation modeling software (Mplus), geomin minimizes a complexity function:
$$\text{Geomin: Minimize } \sum_i \left( \prod_j (\lambda_{ij}^2 + \epsilon) \right)^{1/k}$$
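The geomin complexity is straightforward to compute. A sketch following the formula above (example matrices are illustrative):

```python
import numpy as np

def geomin_complexity(Lam, eps=0.01):
    """Sum over variables of the geometric mean of (squared loading + eps)."""
    k = Lam.shape[1]
    return np.sum(np.prod(Lam**2 + eps, axis=1) ** (1.0 / k))

simple = np.array([[0.8, 0.0], [0.0, 0.8]])
complex_ = np.array([[0.57, 0.57], [0.57, 0.57]])  # similar communality, spread out

# Geomin assigns lower complexity to the simpler matrix
assert geomin_complexity(simple) < geomin_complexity(complex_)
```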
Geomin tends to produce: moderate factor correlations, solutions that tolerate genuine cross-loadings, and results that can be sensitive to the choice of ε and to local minima (multiple random starts are advisable).
When a general factor plus group factors are hypothesized: bifactor rotations let every variable load on a general factor while also loading on one group factor, separating general from specific variance.
Bifactor models are increasingly popular in psychometrics for understanding general vs. specific variance.
In Confirmatory Factor Analysis (CFA), the loading pattern is specified a priori: each variable is allowed to load only on designated factors, cross-loadings are typically fixed to zero, and the hypothesized structure is tested via model fit.
EFA with rotation is exploratory; CFA is confirmatory. But target rotation in EFA can bridge the two approaches.
A common workflow: (1) run EFA with rotation on one half of the sample to discover the structure; (2) specify that structure as a CFA model and test it on the other half.
This uses EFA's flexibility for discovery and CFA's rigor for confirmation.
Iterative rotation algorithms (oblimin, varimax) can converge to local minima, especially with: many factors, weakly defined factors, or substantial cross-loadings in the data.
Solutions: run the rotation from multiple random starting positions, keep the solution with the best criterion value, and check that the competing solutions tell the same substantive story.
Rarely, rotation fails to converge. Causes include: Heywood cases (communalities near 1), too many factors for the data, or variables with near-zero communalities.
Try: different rotation method, different number of factors, or examine data for anomalies.
Factor rotation is a critical step in making factor analysis results interpretable. Let's consolidate our understanding: rotation changes loadings but not model fit; simple structure is the guiding ideal; varimax is the standard orthogonal choice; oblique rotation (oblimin, promax) is the sensible default when factor correlations are plausible; and for oblique solutions, the pattern matrix, structure matrix, and factor correlation matrix must all be examined and reported.
With our understanding of factor rotation complete, the next page covers Maximum Likelihood Estimation for factor analysis—how parameters are estimated from data, the log-likelihood function, fit assessment, and standard errors for loadings.
You now understand both the theory and practice of factor rotation. Remember: rotation is not a statistical afterthought—it's fundamental to making factor analysis results meaningful. Choose rotation methods deliberately and report them transparently.