// LEARN — STEM & LEAF PLOT / STEMPLOT

What a stemplot encodes — and the losslessness that makes it different from every other distribution chart

What this chart is

A Stem and Leaf Plot organises a dataset by its place values. Each value is split into a stem (the leading digit or digits, typically representing the tens place) and a leaf (the trailing digit, representing the ones place). Stems are arranged in ascending order in a central column; leaves extend horizontally from their corresponding stem.

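The defining property is losslessness: every original data value is preserved in full and can be reconstructed exactly by combining its stem and leaf, provided the values carry no more precision than the leaf unit. With a tens/ones split, that means whole numbers; a value such as 74.6 must first be rounded or truncated. A score of 74 appears as leaf "4" next to stem "7". There is no aggregation, no binning and no smoothing: the raw data is the chart. This makes the stemplot the only common distribution chart that doubles as a complete data reference.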
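The split and its reconstruction can be sketched in a few lines. This sketch assumes non-negative whole numbers and a tens/ones split, and the helper names `split_value` and `join_value` are hypothetical, chosen for illustration. Python's built-in `divmod` returns the quotient and remainder together, which are exactly the stem and the leaf.

```python
def split_value(value: int) -> tuple[int, int]:
    """Split a non-negative whole number into (stem, leaf) for a tens/ones split.

    Assumes value >= 0; negative numbers need separate handling.
    """
    # divmod(74, 10) == (7, 4): the stem is every digit but the last,
    # the leaf is the ones digit.
    return divmod(value, 10)


def join_value(stem: int, leaf: int) -> int:
    """Rebuild the original value from its stem and leaf."""
    return stem * 10 + leaf


scores = [74, 62, 89, 101, 7]
pairs = [split_value(s) for s in scores]
print(pairs)  # [(7, 4), (6, 2), (8, 9), (10, 1), (0, 7)]

# Losslessness: every value survives the round trip unchanged.
assert [join_value(stem, leaf) for stem, leaf in pairs] == scores
```

In this split, 101 gets the two-digit stem 10 and 7 gets stem 0. Both values still round-trip exactly.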

// Example: dataset (2, 4, 11, 17, 20, 23)
0 | 2 4
1 | 1 7
2 | 0 3
stem = tens digit · leaf = ones digit
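The same layout can be generated from a list of values. This minimal sketch assumes a non-empty list of non-negative whole numbers and a tens/ones split, and `stemplot` is a hypothetical helper name. Leaves are sorted within each row. Following a common convention, stems with no leaves are still printed, so gaps in the data show up as empty rows.

```python
from collections import defaultdict


def stemplot(values: list[int]) -> str:
    """Render a tens/ones stemplot as text, leaves sorted ascending within each row.

    Assumes a non-empty list of non-negative whole numbers.
    """
    rows: dict[int, list[int]] = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)

    first, last = min(rows), max(rows)
    width = len(str(last))  # right-align stems of different widths, such as 9 and 10
    lines = []
    # Walk every stem from first to last so empty rows (gaps) stay visible.
    for stem in range(first, last + 1):
        leaves = " ".join(str(leaf) for leaf in sorted(rows[stem]))
        lines.append(f"{stem:>{width}} | {leaves}".rstrip())
    return "\n".join(lines)


print(stemplot([2, 4, 11, 17, 20, 23]))
# 0 | 2 4
# 1 | 1 7
# 2 | 0 3
```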

Back-to-back stemplots

When two datasets share the same stem values, they can be displayed back-to-back: one dataset's leaves extend to the left, the other's to the right. The stems occupy the shared central column. This allows direct visual comparison of two distributions at every stem level simultaneously — the viewer reads one dataset's shape from right to left and the other from left to right, and the asymmetry between them is immediately visible.

The back-to-back stemplot is the simplest multi-dataset comparison tool that preserves every data value. A side-by-side box plot compresses each dataset to five numbers; back-to-back histograms bin the data; the back-to-back stemplot shows everything.

// Back-to-back: Group A | Stem | Group B
  9 4 2 | 6 | 1 5 8
8 5 3 1 | 7 | 2 4 6
    7 2 | 8 | 0 3 3 9
Group A leaves read right-to-left
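The back-to-back layout can also be generated. This minimal sketch assumes non-negative whole numbers and a tens/ones split, and `back_to_back` and `leaves_by_stem` are hypothetical names. The only subtlety is the left side. Its leaves are sorted in descending order and right-aligned, so that read outward from the stem (right to left) they ascend, matching the right-hand side. The input lists are the values read off the back-to-back example above.

```python
from collections import defaultdict


def leaves_by_stem(values: list[int]) -> dict[int, list[int]]:
    """Group the ones digits (leaves) of each value under its tens stem."""
    rows: dict[int, list[int]] = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return rows


def back_to_back(left: list[int], right: list[int]) -> str:
    """Render two non-empty datasets around one shared stem column."""
    a, b = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(a), min(b)), max(max(a), max(b)) + 1)
    # Left leaves descend so they ascend when read right to left, away from the stem.
    left_cols = {s: " ".join(str(x) for x in sorted(a[s], reverse=True)) for s in stems}
    left_width = max(len(col) for col in left_cols.values())
    stem_width = len(str(stems[-1]))
    lines = []
    for s in stems:
        right_col = " ".join(str(x) for x in sorted(b[s]))
        line = f"{left_cols[s]:>{left_width}} | {s:>{stem_width}} | {right_col}"
        lines.append(line.rstrip())
    return "\n".join(lines)


group_a = [62, 64, 69, 71, 73, 75, 78, 82, 87]
group_b = [61, 65, 68, 72, 74, 76, 80, 83, 83, 89]
print(back_to_back(group_a, group_b))
#   9 4 2 | 6 | 1 5 8
# 8 5 3 1 | 7 | 2 4 6
#     7 2 | 8 | 0 3 3 9
```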

Why it was chosen here

The message compares two class sections' exam score distributions. The dataset is small enough (25–30 values per section) to display every data point without clutter, which is the stemplot's optimal range. Binning the data into a histogram would lose the exact scores; a box plot would reduce each section to five numbers and hide the clustering pattern in the 70s and 80s.

The back-to-back mode is specifically chosen because the two sections share the same stem structure (60s, 70s, 80s, 90s) and a direct row-by-row comparison — "how many students scored in the 70s in Section A versus Section B?" — is the core question. The shared stem column makes this comparison structurally explicit.
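The row-by-row question ("how many students scored in the 70s?") amounts to counting leaves per stem. This minimal sketch assumes whole-number scores and a tens/ones split, and `row_counts` is a hypothetical helper. The actual section scores are not listed here, so the input reuses the illustrative Group A and Group B values from the back-to-back example.

```python
from collections import Counter


def row_counts(values: list[int]) -> Counter:
    """Count how many values fall on each stem (tens/ones split)."""
    return Counter(v // 10 for v in values)


group_a = [62, 64, 69, 71, 73, 75, 78, 82, 87]
group_b = [61, 65, 68, 72, 74, 76, 80, 83, 83, 89]

counts_a, counts_b = row_counts(group_a), row_counts(group_b)
# Union of stems from both groups; a Counter returns 0 for a missing stem.
for stem in sorted(counts_a.keys() | counts_b.keys()):
    print(f"{stem}0s: A={counts_a[stem]}  B={counts_b[stem]}")
# 60s: A=3  B=3
# 70s: A=4  B=3
# 80s: A=2  B=4
```

In a back-to-back stemplot, these counts are simply the lengths of the leaf rows on either side of the shared stem.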

Size limits and when not to use it

The stemplot has hard practical limits in both directions. With fewer than 10–15 values, most stem rows contain 0–1 leaves; the "distribution" is too sparse to have a shape, and a simple sorted list is more useful. With more than 50–80 values per dataset, stem rows become crowded with 10+ leaves, the display wraps, and the visual distribution signal is lost in a wall of digits.

Within its range, the stemplot excels for exploratory data analysis, classroom settings, and any context where the analyst needs to see the actual values — not a smoothed representation of them. Outside that range, a histogram (more data) or a dot plot / table (less data) is the appropriate substitute.
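The size guidance can be turned into a rough pre-flight check. This is a heuristic sketch, not a rule from any library. The cut-offs are hypothetical picks from the fuzzy ranges quoted above (sparse below about 10–15 values, crowded above about 50–80), and values inside those ranges are reported as borderline. `stemplot_fit` is an illustrative name. The check also reports the widest row, since rows of 10+ leaves are where wrapping starts.

```python
from collections import Counter


def stemplot_fit(values: list[int]) -> tuple[str, int]:
    """Rough suitability check for a tens/ones stemplot of one dataset.

    Hypothetical thresholds derived from the ranges in the text:
    n < 10 sparse, 10-14 borderline, 15-50 fits, 51-80 borderline, n > 80 crowded.
    Returns (verdict, number of leaves on the widest row).
    """
    n = len(values)
    widest_row = max(Counter(v // 10 for v in values).values(), default=0)
    if n < 10:
        verdict = "too sparse: use a sorted list, dot plot or table"
    elif n > 80:
        verdict = "too crowded: use a histogram"
    elif 15 <= n <= 50:
        verdict = "fits a stemplot"
    else:
        verdict = "borderline"
    return verdict, widest_row


print(stemplot_fit(list(range(60, 100, 2))))  # ('fits a stemplot', 5)
```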

// FRAMEWORK REFERENCE

FT Visual Vocabulary: Distribution category (small-N, lossless variant). "Use a stem and leaf plot when the dataset is small enough to display every value and preserving the exact values matters — for reference, for outlier detection, or for teaching."

Abela quadrant: Distribution (single or two variables, small N).

Tufte principle: maximum data density per unit of display space. Each leaf character encodes exactly one data value, and no data-ink is spent on decoration. The data-ink ratio approaches 1.0: stems encode the structural grouping, leaves encode the actual values, and the horizontal length of each leaf row encodes that row's frequency. Three encoding channels, zero redundancy.

The one design decision worth knowing: leaves are sorted ascending by default, because sorted leaves make the within-row distribution readable and allow the median to be located by counting. Unsorted leaves (as they would appear in data-entry order) reveal the data-collection sequence but obscure the within-row distribution and make the median harder to find. Row lengths, and with them the overall shape, are unchanged.
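Sorted leaves turn finding the median into a counting exercise. Walk the rows top to bottom and each row left to right until you reach the middle position. This minimal sketch assumes non-empty whole-number data and a tens/ones split, and `median_from_stemplot` is a hypothetical name. The inputs reuse the illustrative Group A and Group B values from the back-to-back example.

```python
from collections import defaultdict


def median_from_stemplot(values: list[int]) -> float:
    """Locate the median by counting sorted leaves row by row, as a reader would."""
    rows: dict[int, list[int]] = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)

    n = len(values)
    # 1-based positions of the middle value(s): one position for odd n, two for even n.
    targets = {(n + 1) // 2, n // 2 + 1}
    found = []
    position = 0
    for stem in sorted(rows):
        for leaf in sorted(rows[stem]):  # sorted leaves are what make counting valid
            position += 1
            if position in targets:
                found.append(stem * 10 + leaf)
    return sum(found) / len(found)


group_a = [62, 64, 69, 71, 73, 75, 78, 82, 87]
group_b = [61, 65, 68, 72, 74, 76, 80, 83, 83, 89]
print(median_from_stemplot(group_a))  # 73.0 (5th of 9 leaves)
print(median_from_stemplot(group_b))  # 75.0 (mean of 5th and 6th of 10 leaves)
```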

Section B clusters in the 70s and 80s — Section A spreads wider with more extreme values on both ends
