Can ggplot2 have a Stat that simply summarises data by group? #3501

Open
yutannihilation opened this issue Aug 24, 2019 · 1 comment · May be fixed by #6103
Labels
feature a feature request or enhancement layers 📈

Comments

@yutannihilation
Member

Every time I encounter a question like #3497, I wonder why ggplot2 doesn't have a Stat that simply applies a function by group. In terms of computational efficiency, it's generally better to summarise the data before passing it to ggplot2, but it would be handy to be able to summarise inside ggplot2, especially when generating plots one after another with different groupings.

I believe StatSummary could have been implemented so that it can summarise data by groupings other than c("group", "x"), because the following code looks very general:

ggplot2/R/stat-summary.r

Lines 163 to 169 in b842024

summarise_by_x <- function(data, summary, ...) {
  summary <- dapply(data, c("group", "x"), summary, ...)
  unique <- dapply(data, c("group", "x"), uniquecols)
  unique$y <- NULL
  merge(summary, unique, by = c("x", "group"), sort = FALSE)
}
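To illustrate why the quoted helper looks general, here is a minimal base R sketch in which the grouping columns are an ordinary argument. `apply_by()` is a hypothetical stand-in written for this illustration; it is not ggplot2's internal `dapply()`, and it only approximates the split-apply-combine idea.

```r
# Hypothetical helper (not ggplot2's dapply): split `data` by the columns
# named in `by`, apply `fun` to each piece, and bind the results together
# with the grouping keys.
apply_by <- function(data, by, fun, ...) {
  pieces <- split(data, data[by], drop = TRUE)
  out <- lapply(pieces, function(piece) {
    keys <- piece[1, by, drop = FALSE]
    cbind(keys, fun(piece, ...), row.names = NULL)
  })
  do.call(rbind, c(unname(out), make.row.names = FALSE))
}

d <- data.frame(group = c(1, 1, 2, 2), x = c(1, 1, 1, 2), y = c(1, 3, 5, 7))
mean_y <- function(piece) data.frame(y = mean(piece$y))

# Same helper, two different groupings
by_group_x <- apply_by(d, c("group", "x"), mean_y)
by_group   <- apply_by(d, "group", mean_y)
```

Nothing in the helper depends on `x` being one of the keys, which is the sense in which a grouping other than c("group", "x") seems feasible.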

However, the current make_summary_fun() expects a function that takes a vector, not a data.frame, so it would be difficult to extend StatSummary to accept a function that summarises both x and y. To meet that need, I feel it might be nice to have a simple Stat like the one below.

I don't see reasons why we shouldn't implement such a Stat. Am I missing something...?

library(ggplot2)

stat_summary_by_group <- function(mapping = NULL, data = NULL,
                                  geom = "pointrange", position = "identity",
                                  ...,
                                  fun.data = NULL,
                                  na.rm = FALSE,
                                  show.legend = NA,
                                  inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatSummaryByGroup,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      fun.data = fun.data,
      na.rm = na.rm,
      ...
    )
  )
}

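StatSummaryByGroup <- ggproto("StatSummaryByGroup", Stat,
  compute_group = function(data, scales, fun.data = NULL, na.rm = FALSE) {
    # fun.data receives the whole data frame of one group
    summary <- fun.data(data)
    # keep the columns that are constant within the group
    unique <- ggplot2:::dapply(data, c("group"), ggplot2:::uniquecols)
    # add the summary columns, overwriting any with the same name
    unique[names(summary)] <- summary
    unique
  }
)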

d <- data.frame(x = c(1:5, 3:7), y = 1:10, g = rep(c("a", "b"), each = 5), stringsAsFactors = FALSE)
f <- function(d) {
  data.frame(x = min(d$x), xend = max(d$x), y = mean(d$y), yend = mean(d$y))
}

ggplot(d) +
  geom_point(aes(x, y, colour = g)) +
  stat_summary_by_group(fun.data = f, aes(x, y, xend = stat(xend), yend = stat(yend)), geom = "segment") +
  facet_grid(cols = vars(g))

Created on 2019-08-24 by the reprex package (v0.3.0)
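For comparison, here is a rough sketch of the "summarise before entering ggplot2" route mentioned at the top of this issue. It uses the same `d` and `f` as the reprex and only base R. The name `summarised` is introduced here for illustration.

```r
d <- data.frame(x = c(1:5, 3:7), y = 1:10, g = rep(c("a", "b"), each = 5), stringsAsFactors = FALSE)
f <- function(d) {
  data.frame(x = min(d$x), xend = max(d$x), y = mean(d$y), yend = mean(d$y))
}

# Apply f to each group's data frame and keep the grouping key as a column
pieces <- split(d, d$g)
summarised <- do.call(rbind, lapply(names(pieces), function(key) {
  cbind(g = key, f(pieces[[key]]))
}))
```

The result could then be handed to a layer directly, for example geom_segment(data = summarised, aes(x, y, xend = xend, yend = yend)). The drawback is that this summary step has to be redone by hand every time the grouping changes, which is what the proposed Stat would avoid.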

@yutannihilation yutannihilation added the feature a feature request or enhancement label Aug 24, 2019
@thomasp85
Member

I agree this makes some sense, and it would be a good fallback for situations where the provided stats don't do exactly what the user needs. One complication is documenting which columns should be returned. This depends heavily on the geom the stat is paired with, and will require some knowledge of how ggplot2 works.
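A small sketch of that documentation problem, in base R only. The two summary functions and the `has_columns()` check are hypothetical and written for this illustration. The required column sets reflect my understanding of the aesthetics that geom_segment() and geom_rect() need, so treat them as assumptions.

```r
# Summary shaped for a segment geom: needs x, y, xend, yend
segment_summary <- function(d) {
  data.frame(x = min(d$x), xend = max(d$x), y = mean(d$y), yend = mean(d$y))
}

# Summary shaped for a rect geom: needs xmin, xmax, ymin, ymax
rect_summary <- function(d) {
  data.frame(xmin = min(d$x), xmax = max(d$x), ymin = min(d$y), ymax = max(d$y))
}

# Hypothetical check: does a summary provide the columns a geom needs?
has_columns <- function(summary, required) all(required %in% names(summary))

d <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))
segment_cols <- c("x", "y", "xend", "yend")
rect_cols <- c("xmin", "xmax", "ymin", "ymax")

ok_segment <- has_columns(segment_summary(d), segment_cols)
ok_rect    <- has_columns(rect_summary(d), rect_cols)
mismatch   <- has_columns(segment_summary(d), rect_cols)
```

The same input data needs a differently shaped summary for each geom, so the documentation for such a Stat could not list one fixed set of output columns.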
