Key points
Best Practices
As the figure below illustrates, this chart type, also known simply as the box plot, is a graphic representation of what statisticians commonly refer to as the five-number summary:
- The maximum and minimum data points (excluding outliers) are represented by the ends of the lines above and below the box (the two "whiskers").
- The line in the middle of the box represents the median.
- The upper boundary of the box represents the upper quartile (Q3). This is the median of the data points in the top half of the set.
- The lower boundary of the box represents the lower quartile (Q1). This is the median of the data points in the bottom half of the set.
In the example at the top of the page, the values in Group B are generally lower than those in Groups A and C. That's the first thing most readers will notice. But the chart tells us a good bit more. It's clear that within Groups A and C the data are skewed towards the higher range. It also tells us that the data for Group B are more tightly distributed than the data for Groups A and C.
The box plot can also show actual outlier and non-outlier values, as illustrated in the examples at the bottom of the page.
Consider these best practices when deciding when and how to use a box plot.
- Your audience may not be familiar with the conventions of the box plot. Consider including a brief explanation or a link to a resource. (The chart designer offers several options for supplemental text such as this. For example, you can place it as "subtext" between the chart and the data table or as footnotes below the data table.)
- Make certain the box plot aligns with the public health messages you're trying to convey. Bar charts are appropriate for counts and rates, while box plots communicate characteristics of the data distribution. How relevant are the skewness and symmetry of the data to your story?
- Consider the importance of outliers to your story and any applicable standards for outlier definition. Unlike bar charts, box plots can show outliers. (Even if you choose not to include them in the visualization, they are automatically included in the supporting data table.) The WCMS uses the standard "IQR 1.5" rule to calculate outliers, but this rule may not be appropriate for your purpose. The definition of an outlier may vary from one scenario to the next. It may be case-dependent or based on industry- or sector-specific guidelines.
- Be careful with colors, especially if you choose to plot non-outliers in the visualization. Generally, lighter box colors work best.
Quick Build Notes
- Select "Box Plot" as the Visualization Type.
- Upload your data.
- Type in the title and other text fields in the General tab.
- In the Data Series panel, use the Add Data Series dropdown to select a column from your data that represents the values to display.
- In the Date/Category Axis panel, use the Data Key dropdown to select a column from your data that represents the date or category information for the chart.
- Select the Data Scaling Type appropriate to your data.
- The Visual panel can be used to plot outlier and non-outlier data points.
Configuration Options
The example visualizations below highlight options available for box plots. Key configuration selections are in the build notes section under each example.
For in-depth configuration information, visit the Configuration Options section.
Example Box Plot with Outliers Plotted
The WCMS uses the "IQR 1.5" rule for calculating outliers. "IQR" stands for interquartile range, which is Q3 – Q1 (i.e., Q3 minus Q1). Any data points above Q3 + IQR * 1.5 are plotted as outliers above the top whisker; any data points below Q1 – IQR * 1.5 are plotted as outliers below the bottom whisker.
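As a rough illustration of the rule just described, the Python sketch below computes the two cutoffs and separates outliers from the remaining points. It is an assumption-laden example, not the WCMS code: the function name, the `k` parameter, and the sample values are hypothetical, and Q1 and Q3 are passed in so that any quartile convention can be used.

```python
def split_outliers(values, q1, q3, k=1.5):
    """Apply the "IQR 1.5" rule (k = 1.5 by default) to a list of numbers."""
    iqr = q3 - q1                      # interquartile range
    low_cutoff = q1 - k * iqr          # points below this are outliers
    high_cutoff = q3 + k * iqr         # points above this are outliers
    outliers = [v for v in values if v < low_cutoff or v > high_cutoff]
    inside = [v for v in values if low_cutoff <= v <= high_cutoff]
    return {
        "iqr": iqr,
        "low_cutoff": low_cutoff,
        "high_cutoff": high_cutoff,
        "outliers": outliers,
        # Whiskers end at the most extreme points that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


values = [1, 10, 11, 12, 13, 14, 15, 30]    # made-up values
result = split_outliers(values, q1=10.5, q3=14.5)
print(result["outliers"])                    # [1, 30]
print(result["whisker_low"], result["whisker_high"])  # 10 15
```

Here the IQR is 4, so the cutoffs are 4.5 and 20.5, and the values 1 and 30 fall outside them. Changing `k` shows how a different outlier definition would change which points are flagged.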
If your data have outliers, you can decide whether to include them in the visualization. In this example, Groups A and C both have outliers. Note that the data table also provides information on outliers, along with other information such as the standard deviation and value totals. (Currently, you cannot exclude any of the additional table rows, but you can relabel them.)
Sample Data: Box Plot with Outliers Plotted
- Vertical
- Multiple Series: No
- Data Series: Score
- Data Scaling Type: Categorical (Linear Scale)
- Data Key: Group
- Borders: True
- Plot Outliers: Checked
Example Box Plot with Non-Outliers Plotted
You can choose to include non-outliers in the visualization. Each dot between the whiskers represents a non-outlier value. As you can see, adding non-outliers to your visualization can result in clutter, so use this option with discretion. (Note that the same data are used for both chart examples.)
Sample Data: Box Plot with Outliers and Non-outliers Plotted
- Vertical
- Multiple Series: No
- Data Series: Score
- Data Scaling Type: Categorical (Linear Scale)
- Data Key: Group
- Borders: True
- Plot Outliers: Checked
- Plot Non-Outlier Values: Checked