// LEARN — HISTOGRAM / FREQUENCY DISTRIBUTION

What a histogram is — and why bin width is an editorial decision

What this chart is

A Histogram visualises the frequency distribution of a single continuous variable by dividing its range into equal-width intervals (bins) and drawing a bar for each bin whose height encodes how many observations fall within that interval. Unlike a Bar Chart, the x-axis is a continuous numeric scale — the bars are adjacent, with no gaps, because the underlying data has no categorical gaps either.

The perceptual mechanism is area: each bar's area (width × height) is proportional to the count or relative frequency it represents. Because bin widths are equal, height alone carries the frequency signal. The shape of the envelope — the outline formed across all bars — is what the viewer reads: symmetric, skewed left, skewed right, uniform, bimodal, or multimodal.
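As a minimal sketch of the binning step described above, the counts behind each bar could be computed as follows. It assumes half-open bins [edge, edge + width) anchored at a hypothetical start of 0, and the commute times are made-up values rather than the chart's data. The name bin_counts is illustrative only.

```python
import math

def bin_counts(values, bin_width, start=0.0):
    """Return (left_edge, count) pairs for equal-width bins [edge, edge + bin_width)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not values:
        return []
    counts = {}
    for v in values:
        k = math.floor((v - start) / bin_width)  # index of the bin containing v
        counts[k] = counts.get(k, 0) + 1
    lo, hi = min(counts), max(counts)
    # Empty bins are kept (count 0) so the bars stay adjacent on a continuous axis.
    return [(start + k * bin_width, counts.get(k, 0)) for k in range(lo, hi + 1)]

# Hypothetical commute times in minutes (illustrative, not the chart's data).
commutes = [12.0, 18.5, 23.4, 23.6, 27.0, 31.2, 35.9, 38.0, 44.1, 76.0]
bars = bin_counts(commutes, bin_width=10)
```

Note that 23.4 and 23.6 land in the same bin, and that the empty 50 and 60 bins are still emitted: with equal widths, a bar of height zero is itself part of the shape.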

Anatomy of the encoding

Why it was chosen here

The message requires knowing where values concentrate across a continuous range — not how categories compare, not how something changed over time. Commute time is continuous: 23.4 minutes and 23.6 minutes are neighbouring values on a scale, not separate categories. The histogram is the correct chart when the variable is continuous and the question is distributional.

The right skew in this data (mean pulled above median, long tail to the right) is the actual finding. The mean alone would hide this completely. Only the distributional view reveals that most commuters cluster in the 20–40 minute band while a meaningful minority commutes 75+ minutes, pulling the mean upward.
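The pull of a long right tail on the mean can be illustrated with a small, invented sample. The values below are assumptions chosen to mimic the pattern described (a cluster between 20 and 40 minutes plus a few commutes of 75+ minutes); they are not the chart's data.

```python
import statistics

# Hypothetical commute times in minutes: ten in the 20-40 band, three at 75+.
commutes = [22, 24, 25, 27, 28, 30, 31, 33, 35, 38, 78, 85, 95]

mean = statistics.mean(commutes)
median = statistics.median(commutes)

# Share of commuters inside the 20-40 minute band.
in_band = sum(1 for c in commutes if 20 <= c <= 40) / len(commutes)
```

In this sample the median is 31 minutes while the mean is about 42.4, a value that falls outside the band where ten of the thirteen commuters actually sit.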

Bin width is an editorial decision

Bin width is the single most consequential design choice in a histogram. Too narrow: every bar contains one or two observations; the chart looks like noise and the shape is unreadable. Too wide: genuine structure (bimodality, gaps, secondary peaks) gets smoothed away. The right bin width reveals the distribution shape without overfitting to sample noise.
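A rough sketch of this effect, using an invented sample with two clusters and a hypothetical histogram helper (bins are assumed to be half-open and aligned to multiples of the bin width):

```python
from collections import Counter

def histogram(values, bin_width):
    """Map each bin's left edge to the number of values in [edge, edge + bin_width)."""
    return Counter((v // bin_width) * bin_width for v in values)

# Hypothetical sample with two clusters, around 22-28 and 50-54 minutes.
sample = [22, 23, 25, 26, 27, 28, 50, 51, 52, 53, 54]

narrow = histogram(sample, 2)   # many bars holding one or two observations each
medium = histogram(sample, 5)   # two groups of bars separated by empty bins
wide = histogram(sample, 20)    # two adjacent bars; the gap between clusters is gone
```

At a width of 5 the empty bins between 30 and 50 make the two clusters visible. At a width of 20 the same values collapse into two neighbouring bars of nearly equal height, which reads as a flat distribution.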

This implementation exposes the bin-width slider deliberately: it is a teaching tool. Slide from 2 minutes to 20 minutes and watch the same data tell different stories. The Freedman-Diaconis rule (bin width ≈ 2 × IQR × n^(−1/3), where IQR is the interquartile range and n is the number of observations) is a principled starting point, but the analyst must verify that the result makes distributional sense for the domain. A 5-minute bin for commute times is defensible; a 5-minute bin for geological strata would be absurd.
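The Freedman-Diaconis starting point can be sketched in a few lines. This is an illustration under stated assumptions, not the slider's actual code: the function name is hypothetical, the sample is invented, and quartiles are computed with the standard library's "inclusive" method (other quartile conventions give slightly different widths).

```python
import statistics

def freedman_diaconis_width(values):
    """Suggested bin width: 2 * IQR * n^(-1/3)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the rule gives no usable width")
    return 2 * iqr * len(values) ** (-1 / 3)

# Hypothetical commute times in minutes (illustrative only).
commutes = [22, 24, 25, 27, 28, 30, 31, 33, 35, 38, 78, 85, 95]
suggested = freedman_diaconis_width(commutes)  # roughly 9.4 minutes for this sample
```

The result is a suggestion, not a verdict: an analyst would typically round it to a width that reads naturally in the domain (here, perhaps 10 minutes) and then check that the shape still makes sense.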

What the rejected alternatives break

A Bar Chart requires discrete, named categories. Applying it to continuous data requires first binning the values and assigning labels — at which point you have rebuilt a histogram with gaps inserted for no reason. The gap signals a categorical break that doesn't exist in the data.

A Box Plot summarises the same distribution in five numbers (min, Q1, median, Q3, max). It is more compact and better for comparing distributions across groups. But it hides shape: a symmetric unimodal distribution and a bimodal distribution with the same five-number summary produce identical box plots. When the shape itself is the message, the histogram is the honest choice.
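To make the box-plot limitation concrete, the sketch below builds two invented samples that share min, Q1, median, Q3 and max but differ in shape. Both data sets and both helper names are hypothetical, and quartiles again assume the standard library's "inclusive" method.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a basic box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def counts_by_bin(values, bin_width):
    """Map each bin's left edge to its count, sorted by edge."""
    counts = {}
    for v in values:
        left = (v // bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Two hypothetical samples of 17 values each.
unimodal = [10, 16, 19, 21, 22, 26, 28, 29, 30, 31, 32, 34, 38, 39, 41, 44, 50]
bimodal = [10, 20, 21, 22, 22, 22, 23, 24, 30, 36, 37, 38, 38, 38, 39, 40, 50]

same_box = five_number_summary(unimodal) == five_number_summary(bimodal)
unimodal_bars = counts_by_bin(unimodal, 5)
bimodal_bars = counts_by_bin(bimodal, 5)
```

Both samples summarise to (10, 22, 30, 38, 50), so their box plots coincide. Binned at a width of 5, the first rises to a single peak in the 30 bin, while the second piles up in the 20 and 35 bins with almost nothing in between.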

// FRAMEWORK REFERENCE

FT Visual Vocabulary — Distribution category. "Use a histogram when you want to show the shape of a continuous variable's distribution — where values are concentrated, where the tails are, and whether the distribution is symmetric or skewed." Abela quadrant: Distribution (single variable, continuous). Tufte principle applied: the axis starts at zero — every bar's height is readable as an absolute count, and no baseline distortion inflates or deflates the visual difference between bins. The density curve overlay (KDE) provides a smoothed envelope that is less sensitive to bin-width choices, serving as a cross-check on the shape the binned bars suggest.
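For readers curious how a density overlay of this kind can be produced, here is a minimal Gaussian kernel density sketch. It is an assumption about the general technique, not a description of this chart's implementation: the function name, the sample and the bandwidth of 5 minutes are all hypothetical. A KDE does not depend on bin width, but it has its own smoothing parameter (the bandwidth), which plays a comparable editorial role.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    if not values:
        raise ValueError("values must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density

# Hypothetical commute times in minutes (illustrative only).
commutes = [12.0, 18.5, 23.4, 23.6, 27.0, 31.2, 35.9, 38.0, 44.1, 76.0]
density = gaussian_kde(commutes, bandwidth=5.0)

# To draw the curve over count-scaled bars, rescale density to expected counts.
bin_width = 10
curve_height_at_25 = density(25.0) * len(commutes) * bin_width
```

The rescaling by n × bin width matters: a density integrates to 1, whereas the bars of a count histogram have a total area of n × bin width, so the two only line up on the same axis after this conversion.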