// HAI — GLOBAL DEVELOPMENT INDICATORS · 2024 ESTIMATES

Education Correlates With Longevity —
But Population Scale Varies Dramatically

Education and Longevity Bubble Chart: 26 countries plotted by Education Index on the x-axis (0 to 1 scale, HDI component) and Life Expectancy on the y-axis (years). Bubble area encodes population in millions. Color and stroke dash pattern indicate world region.

// About this chart type — Bubble Chart

A bubble chart is a trivariate scatter plot: it places data points on a Cartesian grid where X and Y encode two quantitative variables using position (the most accurate perceptual channel available) and then encodes a third quantitative variable through circle area. A fourth variable (here, world region) is layered in via color and stroke dash pattern, providing redundant encoding that works in monochrome and for color-blind viewers. Bubbles are sized so that area, not radius, is proportional to the data value. Scaling radius directly would exaggerate differences quadratically (doubling the value would quadruple the area) and is a common implementation error.
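The area rule can be illustrated with a minimal, dependency-free sketch. The helper name `makeAreaScale` and the numbers are hypothetical and are not taken from the chart; the idea is the same square-root mapping that D3 provides as `d3.scaleSqrt()`.

```javascript
// Hypothetical helper: returns a function mapping a data value to a radius
// such that circle AREA (pi * r^2) is proportional to the value.
function makeAreaScale(maxValue, maxRadius) {
  if (!(maxValue > 0) || !(maxRadius > 0)) {
    throw new RangeError("maxValue and maxRadius must be positive");
  }
  // r = maxRadius * sqrt(value / maxValue), so area is linear in value.
  return (value) => maxRadius * Math.sqrt(Math.max(0, value) / maxValue);
}

// Illustrative numbers only, not the chart's data.
const radius = makeAreaScale(1000, 40);
const rSmall = radius(250); // a quarter of the maximum value
const rLarge = radius(1000); // the maximum value

// The area ratio tracks the value ratio, while the radius ratio is smaller.
const areaRatio = (Math.PI * rLarge ** 2) / (Math.PI * rSmall ** 2);
console.log(rSmall, rLarge, areaRatio);
```

A value four times larger gets a radius only twice as large, which is what keeps the visible area honest.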

The critical rule: bubble charts are appropriate only when three or more variables need to be shown simultaneously and when the correlation structure between the primary two is part of the message. When the message is a simple ranking, a bar chart is more readable. When the message is a single bivariate relationship, a scatter plot suffices. The bubble chart earns its complexity only when the third variable (here, population) is itself part of the story.

About this example: Each bubble represents one country. The X axis shows the Education Index (0–1 scale, an HDI component based on mean and expected years of schooling). The Y axis shows life expectancy in years. Bubble area encodes population in millions, so China and India visibly dominate. Color and stroke pattern encode world region. The positive correlation between education and longevity is immediately visible as an upward-right sweep of the point cloud, while the vast size differences between countries add a third layer of meaning. Click any bubble or legend item to isolate a region and examine its sub-pattern.

// LEARN — BUBBLE CHART · FT VISUAL VOCABULARY: CORRELATION · ABELA QUADRANT: RELATIONSHIP

What Makes a Bubble Chart Work — And Exactly When It Breaks

What this chart is

A bubble chart is a trivariate scatter plot: it places data points on a Cartesian grid using position (x, y) to encode two quantitative variables, then encodes a third variable through circle area. The perceptual mechanism exploited is position along a common axis — the most accurate channel in Cleveland and McGill's encoding hierarchy — supplemented by area, which is less accurate but allows a third quantitative dimension without adding a spatial axis. Color and stroke pattern layer in a fourth variable: categorical region membership.

Why it was chosen

The data contains three quantitative variables (education, life expectancy, population) and one categorical variable (region) — a structure that cannot be represented by any two-variable chart without discarding information. A scatter plot loses population. A bar chart loses the correlation structure entirely. A bubble chart is the minimum-distortion solution for this data shape. The message — that education and longevity correlate, but that population scale differs enormously across regions — requires simultaneously visible x/y correlation and visible size variation. This chart delivers both.

What the alternative would break

The nearest alternative, a grouped bar chart, could show either education or life expectancy by country but cannot encode population size except through supplementary labeling. More critically, it destroys the correlation structure: the viewer sees regional averages or individual bars — not the relationship between the two variables. Countries where education is high but life expectancy lags (or vice versa) become invisible. A stacked area chart fails for different reasons: stacking requires values that sum to a meaningful total — this data has no such property.

The hard limits

Bubble charts fail at scale: beyond 30–40 bubbles, occlusion from overlapping circles degrades legibility. This implementation addresses overlap with semi-transparent fills (fill-opacity: 0.75) and renders the largest bubbles first so smaller ones remain visible above them. The second hard limit, and the most common implementation error, is encoding size by radius rather than area. Encoding by radius causes quadratic visual distortion: a circle with twice the radius has four times the area. This chart uses d3.scaleSqrt(), mapping population values to radius so that area scales linearly with the data.
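The overlap handling can be sketched in a few lines. This is an assumption about how such ordering is typically done, not the chart's actual source: the records, field names, and the `inDrawOrder` helper are hypothetical, and only the 0.75 opacity comes from the text.

```javascript
// Fill opacity quoted in the text; applied to every bubble when rendering.
const FILL_OPACITY = 0.75;

// Hypothetical records; ids and population values are placeholders.
const bubbles = [
  { id: "A", population: 5 },
  { id: "B", population: 1400 },
  { id: "C", population: 60 },
];

// SVG paints elements in document order, so later elements sit on top.
// Sorting descending by population draws the largest bubble first
// and the smallest last. The input array is copied, not mutated.
function inDrawOrder(data) {
  return [...data].sort((a, b) => b.population - a.population);
}

const drawOrder = inDrawOrder(bubbles).map((d) => d.id);
console.log(drawOrder, FILL_OPACITY);
```

Sorting a copy keeps the original data order available for anything else that depends on it, such as a legend or a table view.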

// FRAMEWORK REFERENCE

The FT Visual Vocabulary classifies bubble charts under Correlation: "Show the relationship between two or more variables — be careful that the chart does not imply causation." The bubble chart extends the scatter plot's correlation function into three dimensions. Abela's chart selection framework places it in the Relationship quadrant when the primary question is correlation, or the Comparison quadrant when the question is magnitude across categories — this implementation serves both simultaneously, which is the bubble chart's distinctive capability and its primary interpretive risk. Tufte's principle of maximum data-ink ratio is honored: every visual element — position, area, color, stroke pattern — encodes data. No decorative chrome.

// The one design decision worth knowing

Stroke dash patterns serve as redundant encoding alongside color. This satisfies WCAG 2.1 Success Criterion 1.4.1 (Use of Color), which requires that color not be the only means of conveying information, and it benefits every user, not only those with color vision deficiencies. Each region uses a distinct stroke-dasharray: Africa solid, Americas 4-2 dash, Asia 1.5-3 dot, Europe 8-2 long-dash, Oceania 3-2 short-dash. A monochrome printout or screenshot retains full categorical legibility. The decision costs no screen space and adds little cognitive load: the patterns are subtle enough not to compete with size and position as the primary encodings, but distinct enough to survive any rendering environment.
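The region-to-pattern mapping listed above could be expressed as a simple lookup. This is a sketch under stated assumptions: the names `REGION_DASH` and `dashFor` are hypothetical, "4-2" is read as a dash length followed by a gap length, and the solid outline is written as `none`, the default SVG value for `stroke-dasharray`.

```javascript
// Region -> SVG stroke-dasharray value, following the patterns named in the text.
// Reading "4-2" as "4 2" (dash length, gap length) is an assumption.
const REGION_DASH = {
  Africa: "none", // solid outline
  Americas: "4 2", // dash
  Asia: "1.5 3", // dot
  Europe: "8 2", // long dash
  Oceania: "3 2", // short dash
};

// Hypothetical helper: look up the pattern, failing loudly on unknown regions.
function dashFor(region) {
  if (!Object.prototype.hasOwnProperty.call(REGION_DASH, region)) {
    throw new Error(`Unknown region: ${region}`);
  }
  return REGION_DASH[region];
}

// Redundant encoding only works if no two regions share a pattern.
const distinctPatterns = new Set(Object.values(REGION_DASH)).size;
console.log(dashFor("Europe"), distinctPatterns);
```

Checking that the number of distinct patterns equals the number of regions is a cheap guard against accidentally assigning the same dash to two categories.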