t-SNE produces visually stunning plots that can reveal beautiful cluster structure in high-dimensional data. These visualizations are so compelling that they've become ubiquitous in scientific papers, presentations, and data exploration workflows. But this visual power comes with a significant danger: t-SNE plots are extraordinarily easy to misinterpret.
The very properties that make t-SNE effective at revealing local structure—the non-linear mapping, the probability-based objective, the heavy-tailed Student-t distribution—also mean that many visual features of t-SNE plots do not correspond to meaningful properties of the data.
This page will transform how you read t-SNE visualizations. We will systematically examine what can be safely concluded from a t-SNE plot, what cannot, and how to avoid the common interpretive traps that even experienced researchers fall into.
Many intuitions from reading scatter plots DO NOT apply to t-SNE. Cluster sizes, cluster positions, distances between clusters, and even the presence of "clear" clusters can all be misleading. This page will teach you what you CAN safely interpret.
By the end of this page, you will:

- Understand what t-SNE is designed to preserve (local neighborhoods)
- Know which visual features are meaningful and which are artifacts
- Develop a rigorous interpretation framework for t-SNE plots
- Learn best practices for presenting and communicating t-SNE results
- Understand how to validate t-SNE findings with additional analysis
To interpret t-SNE correctly, we must understand precisely what the algorithm optimizes for—and therefore what properties of the data are preserved in the embedding.
t-SNE's Objective: Local Neighborhood Preservation
Recall the t-SNE cost function:
C = KL(P || Q) = Σᵢⱼ pᵢⱼ log(pᵢⱼ / qᵢⱼ)
This objective primarily penalizes cases where:

- pᵢⱼ is high but qᵢⱼ is low: points that are neighbors in the high-dimensional space are placed far apart in the embedding.

It does NOT strongly penalize:

- pᵢⱼ is low but qᵢⱼ is high: points that are far apart in the high-dimensional space are placed near each other in the embedding.
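To see this asymmetry numerically, consider two hypothetical probability values for a single pair of points (the numbers below are illustrative, not taken from a real embedding):

```python
import numpy as np

# One term of the KL sum: p_ij * log(p_ij / q_ij)

# Case 1: true neighbors placed far apart (high p_ij, low q_ij) -> large penalty
p, q = 0.10, 0.001
print(f"Neighbors torn apart:      {p * np.log(p / q):+.4f}")  # ~ +0.46

# Case 2: distant points placed close together (low p_ij, high q_ij) -> tiny penalty
p, q = 0.001, 0.10
print(f"Strangers pushed together: {p * np.log(p / q):+.4f}")  # ~ -0.005

# Individual terms can be slightly negative; the full KL sum is non-negative.
# The point is the ~100x difference in magnitude between the two cases.
```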
The Preservation Hierarchy:
| Property | Preservation Level | Interpretation Safety |
|---|---|---|
| Points in same local neighborhood | ✓✓✓ High | Safe to interpret |
| Relative positions within tight clusters | ✓✓ Moderate | Generally safe |
| Existence of distinct groups | ✓✓ Moderate | Validate with other methods |
| Cluster density (point spacing) | ✓ Low | Not reliable |
| Cluster sizes (visual area) | ✗ Very Low | Do NOT interpret |
| Distances between clusters | ✗ Very Low | Do NOT interpret |
| Global arrangement of clusters | ✗ Very Low | Do NOT interpret |
The Key Insight:
t-SNE is designed to answer the question: "If I pick a point, which other points are its neighbors in high-dimensional space?" The embedding is constructed so that neighborhood relationships are preserved—nothing more, nothing less.
This means:

- Points that appear together in a tight group were very likely neighbors in the original space.
- Points that appear far apart may or may not be far apart in the original space.
- Everything beyond local neighborhoods (cluster sizes, densities, inter-cluster distances) is essentially unconstrained by the objective.
Think of t-SNE as a one-way filter: "Nearby in embedding → probably nearby in original space." But the reverse is NOT reliable: "Far in embedding → ???" (could be nearby or far in original space). This asymmetry is fundamental to correct interpretation.
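This one-way guarantee can be quantified. Below is a minimal sketch using scikit-learn's `trustworthiness` score, which measures how many of each embedded point's nearest neighbors were also neighbors in the original space (the synthetic data and parameter values are illustrative):

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))  # synthetic high-dimensional data

Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Score in [0, 1]: 1.0 means every embedded neighbor was also
# a neighbor in the original space (local structure preserved).
score = trustworthiness(X, Y, n_neighbors=10)
print(f"Trustworthiness (k=10): {score:.3f}")
```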
One of the most common and dangerous misinterpretations is assigning meaning to the size (visual area or spread) of clusters in a t-SNE plot. This interpretation is almost never valid.
Why Cluster Sizes Don't Reflect Data Properties:
Heavy-tailed Student-t distribution: The Student-t kernel allows distant points to spread out more than a Gaussian would. This spreading is a property of the embedding algorithm, not the data.
Perplexity normalization: Each point's neighborhood probability is normalized to have the same effective number of neighbors (perplexity). A dense 100-point cluster and a sparse 100-point cluster can end up with the same visual size.
Non-linear stretching: t-SNE preserves local neighborhoods, not distances. The algorithm can arbitrarily stretch or compress different regions to fit the 2D space.
Example: Equal-Size Clusters with Different Spread
```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Create two clusters with VERY different variance
np.random.seed(42)

# Cluster 1: Tight (low variance)
cluster1 = np.random.randn(200, 50) * 0.1  # std = 0.1

# Cluster 2: Spread out (high variance)
cluster2 = np.random.randn(200, 50) * 2.0 + 10  # std = 2.0, shifted

# Combine
X = np.vstack([cluster1, cluster2])
labels = np.array([0]*200 + [1]*200)

# Run t-SNE
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
Y = tsne.fit_transform(X)

# Measure visual cluster sizes
size_cluster1 = np.std(Y[labels == 0])
size_cluster2 = np.std(Y[labels == 1])

print("Original data std - Cluster 1: 0.1, Cluster 2: 2.0")
print("Std ratio in original space: 20:1")
print("")
print(f"t-SNE embedding std - Cluster 1: {size_cluster1:.2f}, Cluster 2: {size_cluster2:.2f}")
print(f"Std ratio in t-SNE: {size_cluster2/size_cluster1:.1f}:1")

# TYPICAL RESULT:
# t-SNE embedding std - Cluster 1: 8.5, Cluster 2: 9.2
# Std ratio in t-SNE: 1.1:1
#
# The 20:1 spread ratio in original space becomes ~1:1 in t-SNE!
# Cluster sizes tell you NOTHING about original spread.
```

"Cluster A is more spread out in the t-SNE plot, so it must have higher variance in the original feature space." This is WRONG. Cluster sizes in t-SNE are artifacts of the algorithm, not properties of the data. A tight cluster in original space might appear large in t-SNE, and vice versa.
What Causes Visual Size Differences?
Several algorithm-related factors affect cluster appearance:

- The perplexity setting (how many effective neighbors each point is given)
- Early exaggeration and the optimization schedule
- The number of points in the cluster
- Random initialization and where the cluster happens to land relative to its neighbors
None of these reflect meaningful properties of the original data distribution.
Perhaps the most pernicious misinterpretation is reading meaning into the distances between clusters in a t-SNE plot. If cluster A is closer to cluster B than to cluster C in the visualization, it is tempting to conclude that A and B are more similar. This conclusion is usually invalid.
Why Inter-Cluster Distances Fail:
KL divergence asymmetry: t-SNE doesn't strongly penalize placing distant points near each other (low pᵢⱼ, high qᵢⱼ is cheap). This means clusters can be positioned arbitrarily relative to each other.
Global structure is sacrificed: By design, t-SNE preserves local neighborhoods at the expense of global relationships. The algorithm has no objective term that tries to maintain inter-cluster distances.
Heavy-tailed distribution effect: The Student-t kernel makes all moderately-large distances "look similar" in terms of probability. There's little gradient signal to correctly position far-apart clusters.
Arbitrary rotation and reflection: t-SNE solutions are invariant to rotation and reflection. The absolute positions of clusters are meaningless.
```python
import numpy as np
from sklearn.manifold import TSNE
from scipy.spatial.distance import pdist, squareform

# Create 4 clusters in a known geometric arrangement
# In original space: A-B close, C-D close, A-C far
np.random.seed(42)

cluster_A = np.random.randn(100, 50) * 0.5 + np.array([0]*50)
cluster_B = np.random.randn(100, 50) * 0.5 + np.array([1]*50)   # Close to A
cluster_C = np.random.randn(100, 50) * 0.5 + np.array([10]*50)  # Far from A, B
cluster_D = np.random.randn(100, 50) * 0.5 + np.array([11]*50)  # Close to C

X = np.vstack([cluster_A, cluster_B, cluster_C, cluster_D])

# Compute true inter-cluster distances (using centroids)
centroids_original = np.array([
    cluster_A.mean(axis=0),
    cluster_B.mean(axis=0),
    cluster_C.mean(axis=0),
    cluster_D.mean(axis=0)
])
true_distances = squareform(pdist(centroids_original))

print("True inter-cluster distances:")
print("A-B:", round(true_distances[0, 1], 1))  # Should be ~7 (small)
print("A-C:", round(true_distances[0, 2], 1))  # Should be ~70 (large)
print("B-D:", round(true_distances[1, 3], 1))  # Should be ~70 (large)

# Run t-SNE multiple times with different seeds
for seed in [42, 123, 456]:
    tsne = TSNE(n_components=2, perplexity=30, random_state=seed)
    Y = tsne.fit_transform(X)

    centroids_tsne = np.array([
        Y[:100].mean(axis=0),
        Y[100:200].mean(axis=0),
        Y[200:300].mean(axis=0),
        Y[300:400].mean(axis=0)
    ])
    tsne_distances = squareform(pdist(centroids_tsne))

    print(f"\nt-SNE distances (seed={seed}):")
    print("A-B:", round(tsne_distances[0, 1], 1))
    print("A-C:", round(tsne_distances[0, 2], 1))
    print("B-D:", round(tsne_distances[1, 3], 1))

# TYPICAL RESULT:
# The relative distances between clusters change dramatically
# across seeds. The 10:1 ratio of A-C to A-B in original space
# might become 2:1, 5:1, or even inverted in t-SNE.
```

"In the t-SNE plot, cluster A is positioned between clusters B and C, suggesting A shares properties with both." This interpretation is INVALID. Cluster positions are not meaningful—they're artifacts of the non-convex optimization, not reflections of true relationships in the data.
What CAN You Conclude About Clusters?
Despite these limitations, some qualitative conclusions are reasonable:

- Points that consistently fall in the same tight group are very likely neighbors in the original space.
- The existence of clearly separated groups suggests (but does not prove) genuine structure, especially if it persists across perplexities and seeds.
- Relative positions within a tight cluster are moderately reliable (see the preservation hierarchy above).
For claims about inter-cluster relationships, use alternative methods (a sketch follows this list):

- Compute distances between cluster centroids in the original feature space.
- Apply hierarchical clustering or MDS to the cluster centroids.
- Cross-check with a method that better preserves global structure, such as PCA or MDS.
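Here is one way such a check could look, assuming cluster assignments (`labels`) are already available from a clustering algorithm or metadata; all distances are computed in the original feature space, not the embedding:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, dendrogram

def intercluster_report(X, labels):
    """Summarize between-cluster relationships in the ORIGINAL feature space."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])

    # Pairwise centroid distances: the quantity a t-SNE plot cannot give you
    D = squareform(pdist(centroids))
    print("Original-space centroid distances:")
    print(np.round(D, 2))

    # Hierarchical clustering of centroids: which clusters are truly similar
    Z = linkage(centroids, method='average')
    dendrogram(Z, labels=[str(c) for c in ids])
    plt.title('Cluster relationships in original feature space')
    plt.show()
    return D, Z
```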
Different perplexity values can produce dramatically different visualizations of the same data. Understanding this parameter-dependence is essential for correct interpretation.
The Perplexity-Structure Relationship:
Low perplexity (5-10): May create artificial fragmentation
Medium perplexity (30-50): Usually appropriate for general structure
High perplexity (100+): May merge distinct structures
For rigorous interpretation, ALWAYS run t-SNE at multiple perplexity values. Structure that persists across perplexities is more likely genuine. Structure that appears only at specific perplexities may be an artifact of the parameter choice. This is akin to examining data at multiple resolutions.
Perplexity-Dependent Features:
Some visual features are strongly perplexity-dependent:
| Visual Feature | Low Perplexity | High Perplexity | Interpretation |
|---|---|---|---|
| Number of clusters | Many, possibly artificial | Few, possibly merged | True count is unclear |
| Cluster tightness | Tight, fragmented | Diffuse, overlapping | Neither reflects true variance |
| Outliers | Many isolated points | Outliers absorbed | Neither is definitive |
| Cluster bridges | Separated | May appear connected | Check if continuous at multiple perplexities |
| Sub-clusters | Visible | Merged into parent | Run hierarchical analysis to confirm |
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def robust_tsne_analysis(X, perplexities=[5, 20, 50, 100], n_runs=3):
    """
    Rigorous t-SNE analysis with multiple perplexities and runs.
    Returns embeddings for inspection and comparison.
    """
    results = {}
    fig, axes = plt.subplots(len(perplexities), n_runs,
                             figsize=(4*n_runs, 4*len(perplexities)))

    for p_idx, perp in enumerate(perplexities):
        results[perp] = []
        for run in range(n_runs):
            tsne = TSNE(
                n_components=2,
                perplexity=perp,
                random_state=run * 100,  # Different seed each run
                n_iter=1000,
                init='pca'
            )
            Y = tsne.fit_transform(X)
            results[perp].append(Y)

            ax = axes[p_idx, run] if n_runs > 1 else axes[p_idx]
            ax.scatter(Y[:, 0], Y[:, 1], s=1, alpha=0.5)
            ax.set_title(f'Perp={perp}, Run={run+1}')
            ax.set_xticks([])
            ax.set_yticks([])

    plt.tight_layout()
    plt.suptitle('Multi-Perplexity, Multi-Run t-SNE Analysis', y=1.02, fontsize=14)
    return results, fig

# Interpretation guide:
# - Consistent structures across perplexities → likely genuine
# - Structures only at low perplexity → may be noise/fragmentation
# - Structures only at high perplexity → may be over-smoothed
# - Consistent across runs → reproducible finding
# - Varies across runs → sensitive to initialization, be cautious
```

Given the interpretive challenges with t-SNE, how can you validate that apparent structures are genuine? The key is to use t-SNE for hypothesis generation, then validate with additional analyses.
Strategy 1: Color by Known Labels
If you have ground truth labels or metadata, color points accordingly:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize_with_labels(X, labels, label_names=None, perplexity=30):
    """
    Visualize t-SNE embedding colored by known labels.
    This helps validate if visual clusters correspond
    to meaningful categories.
    """
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    Y = tsne.fit_transform(X)

    unique_labels = np.unique(labels)
    colors = plt.cm.tab10(np.linspace(0, 1, len(unique_labels)))

    fig, ax = plt.subplots(figsize=(10, 8))
    for idx, label in enumerate(unique_labels):
        mask = labels == label
        name = label_names[label] if label_names else str(label)
        ax.scatter(Y[mask, 0], Y[mask, 1], c=[colors[idx]],
                   label=name, s=5, alpha=0.7)

    ax.legend(markerscale=3)
    ax.set_title('t-SNE Colored by Ground Truth Labels')
    ax.set_xticks([])
    ax.set_yticks([])
    return Y, fig

# INTERPRETATION:
# - Labels cleanly separate → t-SNE structure reflects known categories
# - Labels mixed → clusters may not correspond to your labels
# - Unexpected groupings → opportunity for discovery (but validate!)
```

Strategy 2: Validate Clusters with Original Features
If t-SNE reveals clusters, validate them in the original feature space (a sketch follows this list):

- Compare per-feature distributions (means, medians) between the candidate clusters.
- Compute a silhouette score or similar separation measure on the original features.
- Check whether a simple classifier can recover the cluster assignment from the raw features.
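One possible implementation of these checks, assuming `cluster_labels` is a hypothetical array of assignments derived from the t-SNE plot; all scores are computed on the original features, never on the embedding:

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def validate_clusters_in_original_space(X, cluster_labels):
    """Check whether clusters seen in t-SNE are separable in the original features."""
    # Silhouette computed in the ORIGINAL space (not the 2D embedding)
    sil = silhouette_score(X, cluster_labels)
    print(f"Silhouette (original space): {sil:.3f}")

    # Can a simple linear model recover the cluster assignment from raw features?
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X, cluster_labels, cv=5).mean()
    print(f"Cross-validated cluster recovery accuracy: {acc:.3f}")
    return sil, acc
```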
Strategy 3: Multiple Dimensionality Reduction Methods
Compare t-SNE with other methods:
| Method | What It Preserves | Use for Validation |
|---|---|---|
| PCA | Linear variance structure | Check if clusters visible in linear projection |
| UMAP | Local + some global structure | Compare cluster structure with different algorithm |
| MDS | Pairwise distances | Verify distance relationships |
| Spectral Embedding | Graph connectivity | Confirm neighborhood structure |
| Hierarchical Clustering | Nested cluster structure | Validate cluster hierarchy |
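A minimal cross-check sketch using only methods shipped with scikit-learn (UMAP requires the separate `umap-learn` package and is omitted here); the function name and parameter defaults are illustrative:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, MDS, SpectralEmbedding

def compare_embeddings(X, labels, perplexity=30):
    """Embed the same data with several methods; structure visible in all is more credible."""
    methods = {
        'PCA': PCA(n_components=2),
        't-SNE': TSNE(n_components=2, perplexity=perplexity, random_state=0),
        'MDS': MDS(n_components=2, random_state=0),
        'Spectral': SpectralEmbedding(n_components=2, random_state=0),
    }
    fig, axes = plt.subplots(1, len(methods), figsize=(4 * len(methods), 4))
    for ax, (name, model) in zip(axes, methods.items()):
        Y = model.fit_transform(X)
        ax.scatter(Y[:, 0], Y[:, 1], c=labels, s=5, cmap='tab10')
        ax.set_title(name)
        ax.set_xticks([])
        ax.set_yticks([])
    plt.tight_layout()
    return fig
```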
Treat t-SNE as a hypothesis generator, not a conclusion provider. When t-SNE reveals structure, ask: "What would I need to see in the original data to confirm this is real?" Then conduct that analysis. t-SNE findings unsupported by original-space evidence should be reported with appropriate caveats.
When communicating t-SNE results in papers, presentations, or reports, follow these guidelines to avoid misleading your audience.
Essential Information to Include:

- Perplexity, number of iterations, learning rate, and initialization (e.g., PCA vs. random)
- Random seed(s), and whether results were stable across seeds
- Any preprocessing (scaling, prior PCA, feature selection)
- The software and version used
Language to Use and Avoid:

- Use: "points in this group tend to be neighbors in the original space", "this separation persisted across perplexities and seeds".
- Avoid: "cluster A is closer to cluster B, so they are more similar", "cluster A is larger or more spread out than cluster B", or any claim based on distances between clusters.
Visualization Guidelines:

- Omit axis ticks and units; t-SNE axes have no meaningful scale.
- Show results at several perplexities, or state that you checked them.
- Do not draw conclusions from the empty space between clusters.
- State the parameters in the figure caption (see the example below).
Good figure caption: "t-SNE visualization of [N] data points in [D] dimensions. Perplexity = [X], using [Y] iterations with PCA initialization. Colors indicate [Z]. Similar results were observed across perplexity values [range] and multiple random seeds. Cluster separation was validated by [method] in the original feature space."
Correct interpretation of t-SNE requires understanding both what the algorithm preserves and what visual features are artifacts. Let's consolidate these guidelines.
The Interpretation Checklist:
Before drawing conclusions from a t-SNE plot, ask:

- Am I interpreting only local neighborhood structure, or am I reading cluster sizes, densities, or inter-cluster distances?
- Does the structure persist across multiple perplexity values and random seeds?
- Do known labels or metadata support the apparent grouping?
- Have I validated the finding in the original feature space or with a second method?
What's Next:
Armed with interpretation guidelines, we now turn to the common pitfalls that trip up t-SNE practitioners. The next page catalogs frequent mistakes and how to avoid them.
You now have a rigorous framework for interpreting t-SNE visualizations. Remember: t-SNE is for revealing local structure, not measuring distances or comparing cluster properties. With this understanding, you can use t-SNE responsibly and avoid the common interpretive traps.