Discarding boxplot outliers #5379

Merged on Aug 7, 2023 (5 commits)
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,5 +1,10 @@
# ggplot2 (development version)

* `geom_boxplot()` gains an `outliers` argument to switch outliers on or off,
in a manner that does affect the scale range. To hide outliers without
affecting the scale range, you can continue to use `outlier.shape = NA`
(@teunbrand, #4892).

* Binned scales now treat `NA`s in limits the same way continuous scales do
(#5355).

@@ -9,6 +14,7 @@
deprecated. The `hjust` setting of the `legend.text` and `legend.title`
elements continues to fulfil the role of text alignment (@teunbrand, #5347).


* Integers are once again valid input to theme arguments that expect numeric
input (@teunbrand, #5369)
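
As a usage illustration of the `outliers` entry above, the following sketch contrasts discarding outliers with hiding them. It assumes a ggplot2 version that includes this change; the data frame `dat` and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical data: one group with two extreme values
set.seed(1)
dat <- data.frame(x = "a", y = c(rnorm(50), 8, -9))

# Discard outliers: the y scale is trained on the box and whiskers only
p_discard <- ggplot(dat, aes(x, y)) +
  geom_boxplot(outliers = FALSE)

# Hide outliers: nothing is drawn for them, but the y scale still spans
# the full data range
p_hide <- ggplot(dat, aes(x, y)) +
  geom_boxplot(outlier.shape = NA)

# Overlay the raw points; hiding the outlier glyphs avoids drawing the
# extreme values twice
p_overlay <- p_hide + geom_jitter(width = 0.1)
```

Printing `p_discard` and `p_hide` side by side should show the same box and whiskers with different y-axis limits.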

21 changes: 14 additions & 7 deletions R/geom-boxplot.R
@@ -33,19 +33,19 @@
#' @inheritParams geom_bar
#' @param geom,stat Use to override the default connection between
#' `geom_boxplot()` and `stat_boxplot()`.
#' @param outliers Whether to display (`TRUE`) or discard (`FALSE`) outliers
#' from the plot. Hiding or discarding outliers can be useful when, for
#' example, raw data points need to be displayed on top of the boxplot.
#' By discarding outliers, the axis limits will adapt to the box and whiskers
#' only, not the full data range. If outliers need to be hidden and the axes
#' need to show the full data range, please use `outlier.shape = NA` instead.
Comment on lines +40 to +41 (Collaborator, Author):

It felt more appropriate to distinguish discarding vs hiding here than to keep the documentation over at the outlier.shape paragraph.

#' @param outlier.colour,outlier.color,outlier.fill,outlier.shape,outlier.size,outlier.stroke,outlier.alpha
#' Default aesthetics for outliers. Set to `NULL` to inherit from the
#' aesthetics used for the box.
#'
#' In the unlikely event you specify both US and UK spellings of colour, the
#' US spelling will take precedence.
#'
#' Sometimes it can be useful to hide the outliers, for example when overlaying
#' the raw data points on top of the boxplot. Hiding the outliers can be achieved
#' by setting `outlier.shape = NA`. Importantly, this does not remove the outliers,
#' it only hides them, so the range calculated for the y-axis will be the
#' same with outliers shown and outliers hidden.
#'
#' @param notch If `FALSE` (default) make a standard box plot. If
#' `TRUE`, make a notched box plot. Notches are used to compare groups;
#' if the notches of two boxes do not overlap, this suggests that the medians
@@ -109,6 +109,7 @@
geom_boxplot <- function(mapping = NULL, data = NULL,
stat = "boxplot", position = "dodge2",
...,
outliers = TRUE,
outlier.colour = NULL,
outlier.color = NULL,
outlier.fill = NULL,
@@ -133,6 +134,7 @@ geom_boxplot <- function(mapping = NULL, data = NULL,
position$preserve <- "single"
}
}
check_bool(outliers)

layer(
data = data,
@@ -143,6 +145,7 @@ geom_boxplot <- function(mapping = NULL, data = NULL,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list2(
outliers = outliers,
outlier.colour = outlier.color %||% outlier.colour,
outlier.fill = outlier.fill,
outlier.shape = outlier.shape,
@@ -167,7 +170,7 @@ GeomBoxplot <- ggproto("GeomBoxplot", Geom,

# need to declare `width` here in case this geom is used with a stat that
# doesn't have a `width` parameter (e.g., `stat_identity`).
- extra_params = c("na.rm", "width", "orientation"),
+ extra_params = c("na.rm", "width", "orientation", "outliers"),

setup_params = function(data, params) {
params$flipped_aes <- has_flipped_aes(data, params)
@@ -180,6 +183,10 @@ GeomBoxplot <- ggproto("GeomBoxplot", Geom,
data$width <- data$width %||%
params$width %||% (resolution(data$x, FALSE) * 0.9)

if (isFALSE(params$outliers)) {
data$outliers <- NULL
}

if (!is.null(data$outliers)) {
suppressWarnings({
out_min <- vapply(data$outliers, min, numeric(1))
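
The effect of dropping `data$outliers` before the range calculation can be sketched in base R. The hunk above is cut off after `out_min`, so the `out_max` and `pmin()`/`pmax()` steps below are an assumption about what follows, and `boxes` and `final_range()` are hypothetical names used only for illustration, not ggplot2 objects.

```r
# Hypothetical stand-in for the per-box data the geom receives: whisker
# ends plus a list-column of outliers (the second box has none)
boxes <- data.frame(ymin = c(1, 2), ymax = c(5, 6))
boxes$outliers <- list(c(-3, 9), numeric(0))

# Simplified range logic; an illustrative helper, not a ggplot2 function
final_range <- function(data, outliers = TRUE) {
  if (isFALSE(outliers)) {
    data$outliers <- NULL  # discard, as in the diff above
  }
  if (!is.null(data$outliers)) {
    # min()/max() of an empty vector warn and return Inf/-Inf
    suppressWarnings({
      out_min <- vapply(data$outliers, min, numeric(1))
      out_max <- vapply(data$outliers, max, numeric(1))
    })
    # Assumed step: extend the whisker range to cover the outliers
    data$ymin <- pmin(out_min, data$ymin)
    data$ymax <- pmax(out_max, data$ymax)
  }
  range(data$ymin, data$ymax)
}

final_range(boxes)                    # -3  9: spans the outliers
final_range(boxes, outliers = FALSE)  #  1  6: spans the whiskers only
```

Because the outliers column is removed before any range is computed, the discarded points can no longer influence the limits, which is what distinguishes `outliers = FALSE` from `outlier.shape = NA`.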
16 changes: 9 additions & 7 deletions man/geom_boxplot.Rd

Some generated files are not rendered by default.

9 changes: 9 additions & 0 deletions tests/testthat/test-geom-boxplot.R
@@ -8,6 +8,15 @@ test_that("geom_boxplot range includes all outliers", {

expect_true(miny <= min(dat$y))
expect_true(maxy >= max(dat$y))

# Unless specifically directed not to
p <- ggplot_build(ggplot(dat, aes(x, y)) + geom_boxplot(outliers = FALSE))

miny <- p$layout$panel_params[[1]]$y.range[1]
maxy <- p$layout$panel_params[[1]]$y.range[2]

expect_lte(maxy, max(dat$y))
expect_gte(miny, min(dat$y))
})

test_that("geom_boxplot works in both directions", {
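
Two complementary checks, which are not part of this PR, could be sketched as follows: one for the hiding behaviour of `outlier.shape = NA`, and one for the `check_bool(outliers)` validation. The sketch assumes a ggplot2 version that includes this change and an installed testthat; the data frame `dat` is made up for illustration.

```r
library(ggplot2)
library(testthat)

# Hypothetical data with one extreme value on each side
dat <- data.frame(x = "a", y = c(-20, 1:10, 30))

test_that("outlier.shape = NA hides outliers but keeps the full range", {
  p <- ggplot_build(ggplot(dat, aes(x, y)) + geom_boxplot(outlier.shape = NA))
  y_range <- p$layout$panel_params[[1]]$y.range

  expect_lte(y_range[1], min(dat$y))
  expect_gte(y_range[2], max(dat$y))
})

test_that("outliers must be a single TRUE or FALSE", {
  expect_error(geom_boxplot(outliers = "no"))
})
```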