refine docs for 3 integrated survival scores (ISBS, ISS, ISLL)

mlr-org · Oct 8, 2024 · 8ad9eb1
1 parent b4b2b2b
commit 8ad9eb1
Showing 8 changed files with 291 additions and 101 deletions.
diff --git a/R/MeasureSurvGraf.R b/R/MeasureSurvGraf.R
@@ -20,10 +20,10 @@
 #'
 #' @details
 #' This measure has two dimensions: (test set) observations and time points.
-#' For a specific individual \eqn{i}, with observed survival outcome \eqn{(t_i, \delta_i)}
-#' (time and censoring indicator) and predicted survival function \eqn{S_i(t)}, the
-#' *observation-wise* loss integrated across the time dimension up to the
-#' time cutoff \eqn{\tau^*}, is:
+#' For a specific individual \eqn{i} from the test set, with observed survival
+#' outcome \eqn{(t_i, \delta_i)} (time and censoring indicator) and predicted
+#' survival function \eqn{S_i(t)}, the *observation-wise* loss integrated across
+#' the time dimension up to the time cutoff \eqn{\tau^*}, is:
 #'
 #' \deqn{L_{ISBS}(S_i, t_i, \delta_i) = \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0  \frac{S_i^2(\tau) \text{I}(t_i \leq \tau, \delta_i=1)}{G(t_i)} + \frac{(1-S_i(\tau))^2 \text{I}(t_i > \tau)}{G(\tau)} \ d\tau}
 #'
@@ -33,14 +33,16 @@
 #'
 #' \deqn{L_{RISBS}(S_i, t_i, \delta_i) = \delta_i \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0  \frac{S_i^2(\tau) \text{I}(t_i \leq \tau) + (1-S_i(\tau))^2 \text{I}(t_i > \tau)}{G(t_i)} \ d\tau}
 #'
-#' which is always weighted by \eqn{G(t_i)} and removes the censored observations.
+#' which is always weighted by \eqn{G(t_i)} and is equal to zero for a censored subject.
 #'
-#' RISBS is strictly proper when the censoring distribution is independent
-#' of the survival distribution and when \eqn{G(t)} is fit on a sufficiently large dataset.
-#' ISBS is never proper. Use `proper = FALSE` for ISBS and `proper = TRUE` for RISBS.
-#' Results may be very different if many observations are
-#' censored at the last observed time due to division by \eqn{1/eps} in `proper = TRUE`.
+#' To get a single score across all \eqn{N} observations of the test set, we
+#' return the average of the time-integrated observation-wise scores:
+#' \deqn{\sum_{i=1}^N L(S_i, t_i, \delta_i) / N}
 #'
+#' @template properness
+#' @templateVar improper_id ISBS
+#' @templateVar proper_id RISBS
+#' @template which_times
 #' @template details_method
 #' @template details_trainG
 #' @template details_tmax
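
Reviewer note: the ISBS and RISBS definitions documented above can be checked numerically. The following is a minimal sketch only, not the mlr3proba implementation. `obs_brier()` and its arguments are hypothetical names introduced here, `S` and `G` are assumed to be vectorised R functions of time, and the integral is approximated with a midpoint Riemann sum over an equally spaced grid on (0, tau_star).

```r
# Hypothetical helper (not part of mlr3proba): observation-wise ISBS / RISBS,
# following the documented formulas term by term.
# S, G: vectorised functions of time (predicted survival, censoring KM estimate).
obs_brier = function(S, G, t_i, delta_i, tau_star, proper = FALSE, n_grid = 1000L) {
  if (t_i > tau_star) return(0)            # leading factor I(t_i <= tau*)
  if (proper && delta_i == 0) return(0)    # RISBS is zero for a censored subject

  width = tau_star / n_grid
  tau = (seq_len(n_grid) - 0.5) * width    # midpoints of the grid cells
  dead = as.numeric(t_i <= tau)            # I(t_i <= tau)
  alive = 1 - dead                         # I(t_i > tau)

  if (proper) {
    # RISBS: both terms are weighted by G(t_i)
    integrand = (S(tau)^2 * dead + (1 - S(tau))^2 * alive) / G(t_i)
  } else {
    # ISBS: event term weighted by G(t_i), survival term by G(tau)
    integrand = S(tau)^2 * dead * delta_i / G(t_i) +
      (1 - S(tau))^2 * alive / G(tau)
  }
  sum(integrand) * width
}

# Assumed toy inputs: constant prediction S = 0.8 and no censoring (G = 1)
S_const = function(tau) rep(0.8, length(tau))
G_const = function(tau) rep(1, length(tau))

# Event at t_i = 2, cutoff tau* = 4: 2 * 0.2^2 + 2 * 0.8^2, approximately 1.36
obs_brier(S_const, G_const, t_i = 2, delta_i = 1, tau_star = 4)
```

Averaging `obs_brier()` over all test-set observations would give the single score described in the docs. With `G` identically 1 the two variants coincide for an observed event, so any difference between ISBS and RISBS comes from the censoring weights and from the censored subjects that RISBS sets to zero.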

diff --git a/R/MeasureSurvIntLogloss.R b/R/MeasureSurvIntLogloss.R
@@ -17,22 +17,31 @@
 #' Logarithmic (log) Loss, aka integrated cross entropy.
 #'
 #' @details
-#' For an individual who dies at time \eqn{t}, with predicted Survival function, \eqn{S}, the
-#' probabilistic log loss at time \eqn{t^*}{t*} is given by
-#' \deqn{L_{ISLL}(S,t|t^*) = - [log(1 - S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] - [log(S(t^*))I(t > t^*)(1/G(t^*))]}
+#' This measure has two dimensions: (test set) observations and time points.
+#' For a specific individual \eqn{i} from the test set, with observed survival
+#' outcome \eqn{(t_i, \delta_i)} (time and censoring indicator) and predicted
+#' survival function \eqn{S_i(t)}, the *observation-wise* loss integrated across
+#' the time dimension up to the time cutoff \eqn{\tau^*}, is:
+#'
+#' \deqn{L_{ISLL}(S_i, t_i, \delta_i) = -\text{I}(t_i \leq \tau^*) \int^{\tau^*}_0  \frac{\log[1-S_i(\tau)] \text{I}(t_i \leq \tau, \delta_i=1)}{G(t_i)} + \frac{\log[S_i(\tau)] \text{I}(t_i > \tau)}{G(\tau)} \ d\tau}
+#'
 #' where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution.
 #'
-#' The re-weighted ISLL, RISLL is given by
-#' \deqn{L_{RISLL}(S,t|t^*) = - [log(1 - S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] - [log(S(t^*))I(t > t^*)(1/G(t))]}
-#' where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution, i.e. always
-#' weighted by \eqn{G(t)}.
-#' RISLL is strictly proper when the censoring distribution is independent
-#' of the survival distribution and when G is fit on a sufficiently large dataset.
-#' ISLL is never proper.
-#' Use `proper = FALSE` for ISLL and `proper = TRUE` for RISLL.
-#' Results may be very different if many observations are censored at the last
-#' observed time due to division by 1/`eps` in `proper = TRUE`.
+#' The **re-weighted ISLL** (RISLL) is:
+#'
+#' \deqn{L_{RISLL}(S_i, t_i, \delta_i) = -\delta_i \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0  \frac{\log[1-S_i(\tau)] \text{I}(t_i \leq \tau) + \log[S_i(\tau)] \text{I}(t_i > \tau)}{G(t_i)} \ d\tau}
+#'
+#' which is always weighted by \eqn{G(t_i)} and is equal to zero for a censored subject.
+#'
+#' To get a single score across all \eqn{N} observations of the test set, we
+#' return the average of the time-integrated observation-wise scores:
+#' \deqn{\sum_{i=1}^N L(S_i, t_i, \delta_i) / N}
 #'
+#' @template properness
+#' @templateVar improper_id ISLL
+#' @templateVar proper_id RISLL
+#' @template which_times
+#' @template details_method
 #' @template details_trainG
 #' @template details_tmax
 #'
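
Reviewer note: a worked instance of the ISLL formula documented above, under assumed toy values that are not taken from the package: a constant prediction S_i(τ) = 0.8, an observed event at t_i = 2 (δ_i = 1), cutoff τ* = 4, and no censoring, so that G ≡ 1. Before the event only the log[S_i(τ)] term is active, and from the event time onwards only the log[1 - S_i(τ)] term is active.

```latex
\begin{aligned}
L_{ISLL}(S_i, t_i, \delta_i)
  &= -\left[ \int_0^{2} \log(0.8) \, d\tau + \int_{2}^{4} \log(1 - 0.8) \, d\tau \right] \\
  &= -2\log(0.8) - 2\log(0.2) \\
  &= -2\log(0.16) \approx 3.665
\end{aligned}
```

Because G ≡ 1 and δ_i = 1 in this toy setting, the RISLL takes the same value. The two scores differ only once censoring weights other than 1 or censored subjects enter.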

diff --git a/R/MeasureSurvLogloss.R b/R/MeasureSurvLogloss.R
@@ -16,7 +16,7 @@
 #' The Log Loss, in the context of probabilistic predictions, is defined as the
 #' negative log probability density function, \eqn{f}, evaluated at the
 #' observation time (event or censoring), \eqn{t},
-#' \deqn{L_{NLL}(f, t) = -log(f(t))}
+#' \deqn{L_{NLL}(f, t) = -\log[f(t)]}
 #'
 #' The standard error of the Log Loss, L, is approximated via,
 #' \deqn{se(L) = sd(L)/\sqrt{N}}{se(L) = sd(L)/\sqrt N}

diff --git a/R/MeasureSurvSchmid.R b/R/MeasureSurvSchmid.R
@@ -16,20 +16,38 @@
 #' Calculates the **Integrated Schmid Score** (ISS), aka integrated absolute loss.
 #'
 #' @details
-#' For an individual who dies at time \eqn{t}, with predicted Survival function, \eqn{S}, the
-#' Schmid Score at time \eqn{t^*}{t*} is given by
+#' This measure has two dimensions: (test set) observations and time points.
+#' For a specific individual \eqn{i} from the test set, with observed survival
+#' outcome \eqn{(t_i, \delta_i)} (time and censoring indicator) and predicted
+#' survival function \eqn{S_i(t)}, the *observation-wise* loss integrated across
+#' the time dimension up to the time cutoff \eqn{\tau^*}, is:
+#'
+#' \deqn{L_{ISS}(S_i, t_i, \delta_i) = \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0  \frac{S_i(\tau) \text{I}(t_i \leq \tau, \delta_i=1)}{G(t_i)} + \frac{(1-S_i(\tau)) \text{I}(t_i > \tau)}{G(\tau)} \ d\tau}
+#'
+#' where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution.
+#'
+#' The **re-weighted ISS** (RISS) is:
+#'
+#' \deqn{L_{RISS}(S_i, t_i, \delta_i) = \delta_i \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0  \frac{S_i(\tau) \text{I}(t_i \leq \tau) + (1-S_i(\tau)) \text{I}(t_i > \tau)}{G(t_i)} \ d\tau}
+#'
+#' which is always weighted by \eqn{G(t_i)} and is equal to zero for a censored subject.
+#'
+#' To get a single score across all \eqn{N} observations of the test set, we
+#' return the average of the time-integrated observation-wise scores:
+#' \deqn{\sum_{i=1}^N L(S_i, t_i, \delta_i) / N}
+#'
+#'
 #' \deqn{L_{ISS}(S,t|t^*) = [(S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] + [((1 - S(t^*)))I(t > t^*)(1/G(t^*))]}
 #' where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution.
 #'
 #' The re-weighted ISS, RISS is given by
 #' \deqn{L_{RISS}(S,t|t^*) = [(S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] + [((1 - S(t^*)))I(t > t^*)(1/G(t))]}
-#' where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution, i.e. always
-#' weighted by \eqn{G(t)}. RISS is strictly proper when the censoring distribution is independent
-#' of the survival distribution and when G is fit on a sufficiently large dataset. ISS is never
-#' proper. Use `proper = FALSE` for ISS and `proper = TRUE` for RISS.
-#' Results may be very different if many observations are censored at the last
-#' observed time due to division by 1/`eps` in `proper = TRUE`.
 #'
+#' @template properness
+#' @templateVar improper_id ISS
+#' @templateVar proper_id RISS
+#' @template which_times
+#' @template details_method
 #' @template details_trainG
 #' @template details_tmax
 #'

diff --git a/man/mlr_measures_surv.graf.Rd b/man/mlr_measures_surv.graf.Rd
diff --git a/man/mlr_measures_surv.intlogloss.Rd b/man/mlr_measures_surv.intlogloss.Rd