diff --git a/R/MeasureSurvGraf.R b/R/MeasureSurvGraf.R index 69b288f8..2ef60545 100644 --- a/R/MeasureSurvGraf.R +++ b/R/MeasureSurvGraf.R @@ -20,10 +20,10 @@ #' #' @details #' This measure has two dimensions: (test set) observations and time points. -#' For a specific individual \eqn{i}, with observed survival outcome \eqn{(t_i, \delta_i)} -#' (time and censoring indicator) and predicted survival function \eqn{S_i(t)}, the -#' *observation-wise* loss integrated across the time dimension up to the -#' time cutoff \eqn{\tau^*}, is: +#' For a specific individual \eqn{i} from the test set, with observed survival +#' outcome \eqn{(t_i, \delta_i)} (time and censoring indicator) and predicted +#' survival function \eqn{S_i(t)}, the *observation-wise* loss integrated across +#' the time dimension up to the time cutoff \eqn{\tau^*}, is: #' #' \deqn{L_{ISBS}(S_i, t_i, \delta_i) = \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{S_i^2(\tau) \text{I}(t_i \leq \tau, \delta=1)}{G(t_i)} + \frac{(1-S_i(\tau))^2 \text{I}(t_i > \tau)}{G(\tau)} \ d\tau} #' @@ -33,14 +33,16 @@ #' #' \deqn{L_{RISBS}(S_i, t_i, \delta_i) = \delta_i \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{S_i^2(\tau) \text{I}(t_i \leq \tau) + (1-S_i(\tau))^2 \text{I}(t_i > \tau)}{G(t_i)} \ d\tau} #' -#' which is always weighted by \eqn{G(t_i)} and removes the censored observations. +#' which is always weighted by \eqn{G(t_i)} and is equal to zero for a censored subject. #' -#' RISBS is strictly proper when the censoring distribution is independent -#' of the survival distribution and when \eqn{G(t)} is fit on a sufficiently large dataset. -#' ISBS is never proper. Use `proper = FALSE` for ISBS and `proper = TRUE` for RISBS. -#' Results may be very different if many observations are -#' censored at the last observed time due to division by \eqn{1/eps} in `proper = TRUE`. +#' To get a single score across all \eqn{N} observations of the test set, we +#' return the average of the time-integrated observation-wise scores: +#' \deqn{\sum_{i=1}^N L(S_i, t_i, \delta_i) / N} #' +#' @template properness +#' @templateVar improper_id ISBS +#' @templateVar proper_id RISBS +#' @template which_times +#' @template details_method #' @template details_trainG #' @template details_tmax diff --git a/R/MeasureSurvIntLogloss.R b/R/MeasureSurvIntLogloss.R index c47cca2b..0b5ce26f 100644 --- a/R/MeasureSurvIntLogloss.R +++ b/R/MeasureSurvIntLogloss.R @@ -17,22 +17,31 @@ #' Logarithmic (log) Loss, aka integrated cross entropy. #' #' @details -#' For an individual who dies at time \eqn{t}, with predicted Survival function, \eqn{S}, the -#' probabilistic log loss at time \eqn{t^*}{t*} is given by -#' \deqn{L_{ISLL}(S,t|t^*) = - [log(1 - S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] - [log(S(t^*))I(t > t^*)(1/G(t^*))]} +#' This measure has two dimensions: (test set) observations and time points. +#' For a specific individual \eqn{i} from the test set, with observed survival +#' outcome \eqn{(t_i, \delta_i)} (time and censoring indicator) and predicted +#' survival function \eqn{S_i(t)}, the *observation-wise* loss integrated across +#' the time dimension up to the time cutoff \eqn{\tau^*}, is: +#' +#' \deqn{L_{ISLL}(S_i, t_i, \delta_i) = -\text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{\log[1-S_i(\tau)] \text{I}(t_i \leq \tau, \delta=1)}{G(t_i)} + \frac{\log[S_i(\tau)] \text{I}(t_i > \tau)}{G(\tau)} \ d\tau} +#' #' where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution.
#' -#' The re-weighted ISLL, RISLL is given by -#' \deqn{L_{RISLL}(S,t|t^*) = - [log(1 - S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] - [log(S(t^*))I(t > t^*)(1/G(t))]} -#' where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution, i.e. always -#' weighted by \eqn{G(t)}. -#' RISLL is strictly proper when the censoring distribution is independent -#' of the survival distribution and when G is fit on a sufficiently large dataset. -#' ISLL is never proper. -#' Use `proper = FALSE` for ISLL and `proper = TRUE` for RISLL. -#' Results may be very different if many observations are censored at the last -#' observed time due to division by 1/`eps` in `proper = TRUE`. +#' The **re-weighted ISLL** (RISLL) is: +#' +#' \deqn{L_{RISLL}(S_i, t_i, \delta_i) = -\delta_i \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{\log[1-S_i(\tau)] \text{I}(t_i \leq \tau) + \log[S_i(\tau)] \text{I}(t_i > \tau)}{G(t_i)} \ d\tau} +#' +#' which is always weighted by \eqn{G(t_i)} and is equal to zero for a censored subject. +#' +#' To get a single score across all \eqn{N} observations of the test set, we +#' return the average of the time-integrated observation-wise scores: +#' \deqn{\sum_{i=1}^N L(S_i, t_i, \delta_i) / N} #' +#' @template properness +#' @templateVar improper_id ISLL +#' @templateVar proper_id RISLL +#' @template which_times +#' @template details_method #' @template details_trainG #' @template details_tmax #' diff --git a/R/MeasureSurvLogloss.R b/R/MeasureSurvLogloss.R index 69698c45..2e2d11ac 100644 --- a/R/MeasureSurvLogloss.R +++ b/R/MeasureSurvLogloss.R @@ -16,7 +16,7 @@ #' The Log Loss, in the context of probabilistic predictions, is defined as the #' negative log probability density function, \eqn{f}, evaluated at the #' observation time (event or censoring), \eqn{t}, -#' \deqn{L_{NLL}(f, t) = -log(f(t))} +#' \deqn{L_{NLL}(f, t) = -\log[f(t)]} #' #' The standard error of the Log Loss, L, is approximated via, #' \deqn{se(L) = sd(L)/\sqrt{N}}{se(L) = sd(L)/\sqrt N} diff --git a/R/MeasureSurvSchmid.R b/R/MeasureSurvSchmid.R index c108ce7b..1773dfd2 100644 --- a/R/MeasureSurvSchmid.R +++ b/R/MeasureSurvSchmid.R @@ -16,20 +16,38 @@ #' Calculates the **Integrated Schmid Score** (ISS), aka integrated absolute loss. #' #' @details -#' For an individual who dies at time \eqn{t}, with predicted Survival function, \eqn{S}, the -#' Schmid Score at time \eqn{t^*}{t*} is given by +#' This measure has two dimensions: (test set) observations and time points. +#' For a specific individual \eqn{i} from the test set, with observed survival +#' outcome \eqn{(t_i, \delta_i)} (time and censoring indicator) and predicted +#' survival function \eqn{S_i(t)}, the *observation-wise* loss integrated across +#' the time dimension up to the time cutoff \eqn{\tau^*}, is: +#' +#' \deqn{L_{ISS}(S_i, t_i, \delta_i) = \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{S_i(\tau) \text{I}(t_i \leq \tau, \delta=1)}{G(t_i)} + \frac{(1-S_i(\tau)) \text{I}(t_i > \tau)}{G(\tau)} \ d\tau} +#' +#' where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution. +#' +#' The **re-weighted ISS** (RISS) is: +#' +#' \deqn{L_{RISS}(S_i, t_i, \delta_i) = \delta_i \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{S_i(\tau) \text{I}(t_i \leq \tau) + (1-S_i(\tau)) \text{I}(t_i > \tau)}{G(t_i)} \ d\tau} +#' +#' which is always weighted by \eqn{G(t_i)} and is equal to zero for a censored subject.
+#' +#' To get a single score across all \eqn{N} observations of the test set, we +#' return the average of the time-integrated observation-wise scores: +#' \deqn{\sum_{i=1}^N L(S_i, t_i, \delta_i) / N} -#' \deqn{L_{ISS}(S,t|t^*) = [(S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] + [((1 - S(t^*)))I(t > t^*)(1/G(t^*))]} -#' where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution. -#' -#' The re-weighted ISS, RISS is given by -#' \deqn{L_{RISS}(S,t|t^*) = [(S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] + [((1 - S(t^*)))I(t > t^*)(1/G(t))]} -#' where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution, i.e. always -#' weighted by \eqn{G(t)}. RISS is strictly proper when the censoring distribution is independent -#' of the survival distribution and when G is fit on a sufficiently large dataset. ISS is never -#' proper. Use `proper = FALSE` for ISS and `proper = TRUE` for RISS. -#' Results may be very different if many observations are censored at the last -#' observed time due to division by 1/`eps` in `proper = TRUE`. #' +#' @template properness +#' @templateVar improper_id ISS +#' @templateVar proper_id RISS +#' @template which_times +#' @template details_method #' @template details_trainG #' @template details_tmax #' diff --git a/man/mlr_measures_surv.graf.Rd b/man/mlr_measures_surv.graf.Rd index e03d8182..4d2924df 100644 --- a/man/mlr_measures_surv.graf.Rd +++ b/man/mlr_measures_surv.graf.Rd @@ -12,10 +12,10 @@ or squared survival loss. } \details{ This measure has two dimensions: (test set) observations and time points. -For a specific individual \eqn{i}, with observed survival outcome \eqn{(t_i, \delta_i)} -(time and censoring indicator) and predicted survival function \eqn{S_i(t)}, the -\emph{observation-wise} loss integrated across the time dimension up to the -time cutoff \eqn{\tau^*}, is: +For a specific individual \eqn{i} from the test set, with observed survival +outcome \eqn{(t_i, \delta_i)} (time and censoring indicator) and predicted +survival function \eqn{S_i(t)}, the \emph{observation-wise} loss integrated across +the time dimension up to the time cutoff \eqn{\tau^*}, is: \deqn{L_{ISBS}(S_i, t_i, \delta_i) = \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{S_i^2(\tau) \text{I}(t_i \leq \tau, \delta=1)}{G(t_i)} + \frac{(1-S_i(\tau))^2 \text{I}(t_i > \tau)}{G(\tau)} \ d\tau} @@ -25,13 +25,11 @@ The \strong{re-weighted ISBS} (RISBS) is: \deqn{L_{RISBS}(S_i, t_i, \delta_i) = \delta_i \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{S_i^2(\tau) \text{I}(t_i \leq \tau) + (1-S_i(\tau))^2 \text{I}(t_i > \tau)}{G(t_i)} \ d\tau} -which is always weighted by \eqn{G(t_i)} and removes the censored observations. +which is always weighted by \eqn{G(t_i)} and is equal to zero for a censored subject. -RISBS is strictly proper when the censoring distribution is independent -of the survival distribution and when \eqn{G(t)} is fit on a sufficiently large dataset. -ISBS is never proper. Use \code{proper = FALSE} for ISBS and \code{proper = TRUE} for RISBS. -Results may be very different if many observations are -censored at the last observed time due to division by \eqn{1/eps} in \code{proper = TRUE}.
+To get a single score across all \eqn{N} observations of the test set, we +return the average of the time-integrated observation-wise scores: +\deqn{\sum_{i=1}^N L(S_i, t_i, \delta_i) / N} } \section{Dictionary}{ @@ -87,11 +85,12 @@ If \code{integrated == FALSE} then a single time point at which to return the sc \itemize{ \item \code{t_max} (\code{numeric(1)})\cr -Cutoff time (i.e. time horizon) to evaluate the measure up to. +Cutoff time \eqn{\tau^*} (i.e. time horizon) to evaluate the measure up to. Mutually exclusive with \code{p_max} or \code{times}. This will effectively remove test observations for which the observed time (event or censoring) is strictly more than \code{t_max}. It's recommended to set \code{t_max} to avoid division by \code{eps}, see Details. +If \code{t_max} is not specified, an \code{Inf} time horizon is assumed. } @@ -149,7 +148,41 @@ Default is \code{FALSE}. } } -\section{Implementation differences (time-integration)}{ +\section{Properness}{ + + +RISBS is strictly proper when the censoring distribution is independent +of the survival distribution and when \eqn{G(t)} is fit on a sufficiently large dataset. +ISBS is never proper. Use \code{proper = FALSE} for ISBS and +\code{proper = TRUE} for RISBS. +Results may be very different if many observations are censored at the last +observed time due to division by \eqn{1/eps} in \code{proper = TRUE}. +} + +\section{Time points used for evaluation}{ + +If the \code{times} argument is not specified (\code{NULL}), then the unique (and +sorted) time points from the \strong{test set} are used for evaluation of the +time-integrated score. +This was a design decision due to the fact that different predicted survival +distributions \eqn{S(t)} usually have a \strong{discretized time domain} which may +differ, i.e. in the case the survival predictions come from different survival +learners. +Essentially, using the same set of time points for the calculation of the score +minimizes the bias that would come from using different time points. +We note that \eqn{S(t)} is by default constantly interpolated for time points that fall +outside its discretized time domain. + +Naturally, if the \code{times} argument is specified, then exactly these time +points are used for evaluation. +A warning is given to the user in case some of the specified \code{times} fall outside +of the time point range of the test set. +The assumption here is that if the test set is large enough, it should have a +time domain/range similar to the one from the train set, and therefore time +points outside that domain might lead to interpolation or extrapolation of \eqn{S(t)}. +} + +\section{Implementation differences}{ If comparing the integrated graf score to other packages, e.g. diff --git a/man/mlr_measures_surv.intlogloss.Rd b/man/mlr_measures_surv.intlogloss.Rd index f827c343..5a6d7af8 100644 --- a/man/mlr_measures_surv.intlogloss.Rd +++ b/man/mlr_measures_surv.intlogloss.Rd @@ -9,34 +9,25 @@ Calculates the \strong{Integrated Survival Log-Likelihood} (ISLL) or Integrated Logarithmic (log) Loss, aka integrated cross entropy. } \details{ -For an individual who dies at time \eqn{t}, with predicted Survival function, \eqn{S}, the -probabilistic log loss at time \eqn{t^*}{t*} is given by -\deqn{L_{ISLL}(S,t|t^*) = - [log(1 - S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] - [log(S(t^*))I(t > t^*)(1/G(t^*))]} +This measure has two dimensions: (test set) observations and time points. 
+For a specific individual \eqn{i} from the test set, with observed survival +outcome \eqn{(t_i, \delta_i)} (time and censoring indicator) and predicted +survival function \eqn{S_i(t)}, the \emph{observation-wise} loss integrated across +the time dimension up to the time cutoff \eqn{\tau^*}, is: + +\deqn{L_{ISLL}(S_i, t_i, \delta_i) = -\text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{\log[1-S_i(\tau)] \text{I}(t_i \leq \tau, \delta=1)}{G(t_i)} + \frac{\log[S_i(\tau)] \text{I}(t_i > \tau)}{G(\tau)} \ d\tau} + where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution. -The re-weighted ISLL, RISLL is given by -\deqn{L_{RISLL}(S,t|t^*) = - [log(1 - S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] - [log(S(t^*))I(t > t^*)(1/G(t))]} -where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution, i.e. always -weighted by \eqn{G(t)}. -RISLL is strictly proper when the censoring distribution is independent -of the survival distribution and when G is fit on a sufficiently large dataset. -ISLL is never proper. -Use \code{proper = FALSE} for ISLL and \code{proper = TRUE} for RISLL. -Results may be very different if many observations are censored at the last -observed time due to division by 1/\code{eps} in \code{proper = TRUE}. +The \strong{re-weighted ISLL} (RISLL) is: -If \code{task} and \code{train_set} are passed to \verb{$score} then \eqn{G(t)} is fit on training data, -otherwise testing data. The first is likely to reduce any bias caused by calculating -parts of the measure on the test data it is evaluating. The training data is automatically -used in scoring resamplings. +\deqn{L_{RISLL}(S_i, t_i, \delta_i) = -\delta_i \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{\log[1-S_i(\tau)] \text{I}(t_i \leq \tau) + \log[S_i(\tau)] \text{I}(t_i > \tau)}{G(t_i)} \ d\tau} -If \code{t_max} or \code{p_max} is given, then \eqn{G(t)} will be fitted using \strong{all observations} from the -train set (or test set) and only then the cutoff time will be applied. -This is to ensure that more data is used for fitting the censoring distribution via the -Kaplan-Meier. -Setting the \code{t_max} can help alleviate inflation of the score when \code{proper} is \code{TRUE}, -in cases where an observation is censored at the last observed time point. -This results in \eqn{G(t_{max}) = 0} and the use of \code{eps} instead (when \code{t_max} is \code{NULL}). +which is always weighted by \eqn{G(t_i)} and is equal to zero for a censored subject. + +To get a single score across all \eqn{N} observations of the test set, we +return the average of the time-integrated observation-wise scores: +\deqn{\sum_{i=1}^N L(S_i, t_i, \delta_i) / N} } \section{Dictionary}{ @@ -92,11 +83,12 @@ If \code{integrated == FALSE} then a single time point at which to return the sc \itemize{ \item \code{t_max} (\code{numeric(1)})\cr -Cutoff time (i.e. time horizon) to evaluate the measure up to. +Cutoff time \eqn{\tau^*} (i.e. time horizon) to evaluate the measure up to. Mutually exclusive with \code{p_max} or \code{times}. This will effectively remove test observations for which the observed time (event or censoring) is strictly more than \code{t_max}. It's recommended to set \code{t_max} to avoid division by \code{eps}, see Details. +If \code{t_max} is not specified, an \code{Inf} time horizon is assumed. } @@ -154,6 +146,72 @@ Default is \code{FALSE}.
} } +\section{Properness}{ + + +RISLL is strictly proper when the censoring distribution is independent +of the survival distribution and when \eqn{G(t)} is fit on a sufficiently large dataset. +ISLL is never proper. Use \code{proper = FALSE} for ISLL and +\code{proper = TRUE} for RISLL. +Results may be very different if many observations are censored at the last +observed time due to division by \eqn{1/eps} in \code{proper = TRUE}. +} + +\section{Time points used for evaluation}{ + +If the \code{times} argument is not specified (\code{NULL}), then the unique (and +sorted) time points from the \strong{test set} are used for evaluation of the +time-integrated score. +This was a design decision due to the fact that different predicted survival +distributions \eqn{S(t)} usually have a \strong{discretized time domain} which may +differ, i.e. in the case the survival predictions come from different survival +learners. +Essentially, using the same set of time points for the calculation of the score +minimizes the bias that would come from using different time points. +We note that \eqn{S(t)} is by default constantly interpolated for time points that fall +outside its discretized time domain. + +Naturally, if the \code{times} argument is specified, then exactly these time +points are used for evaluation. +A warning is given to the user in case some of the specified \code{times} fall outside +of the time point range of the test set. +The assumption here is that if the test set is large enough, it should have a +time domain/range similar to the one from the train set, and therefore time +points outside that domain might lead to interpolation or extrapolation of \eqn{S(t)}. +} + +\section{Implementation differences}{ + + +If comparing the integrated graf score to other packages, e.g. +\CRANpkg{pec}, then \code{method = 2} should be used. However the results may +still be very slightly different as this package uses \code{survfit} to estimate +the censoring distribution, in line with the Graf 1999 paper; whereas some +other packages use \code{prodlim} with \code{reverse = TRUE} (meaning Kaplan-Meier is +not used). +} + +\section{Data used for Estimating Censoring Distribution}{ + + +If \code{task} and \code{train_set} are passed to \verb{$score} then \eqn{G(t)} is fit on training data, +otherwise testing data. The first is likely to reduce any bias caused by calculating +parts of the measure on the test data it is evaluating. The training data is automatically +used in scoring resamplings. +} + +\section{Time Cutoff Details}{ + + +If \code{t_max} or \code{p_max} is given, then \eqn{G(t)} will be fitted using \strong{all observations} from the +train set (or test set) and only then the cutoff time will be applied. +This is to ensure that more data is used for fitting the censoring distribution via the +Kaplan-Meier. +Setting the \code{t_max} can help alleviate inflation of the score when \code{proper} is \code{TRUE}, +in cases where an observation is censored at the last observed time point. +This results in \eqn{G(t_{max}) = 0} and the use of \code{eps} instead (when \code{t_max} is \code{NULL}). +} + \references{ Graf E, Schmoor C, Sauerbrei W, Schumacher M (1999). 
\dQuote{Assessment and comparison of prognostic classification schemes for survival data.} diff --git a/man/mlr_measures_surv.logloss.Rd b/man/mlr_measures_surv.logloss.Rd index 29eb2061..24536475 100644 --- a/man/mlr_measures_surv.logloss.Rd +++ b/man/mlr_measures_surv.logloss.Rd @@ -11,25 +11,20 @@ Calculates the cross-entropy, or negative log-likelihood (NLL) or logarithmic (l The Log Loss, in the context of probabilistic predictions, is defined as the negative log probability density function, \eqn{f}, evaluated at the observation time (event or censoring), \eqn{t}, -\deqn{L_{NLL}(f, t) = -log(f(t))} +\deqn{L_{NLL}(f, t) = -\log[f(t)]} The standard error of the Log Loss, L, is approximated via, \deqn{se(L) = sd(L)/\sqrt{N}}{se(L) = sd(L)/\sqrt N} where \eqn{N} are the number of observations in the test set, and \eqn{sd} is the standard deviation. -The \strong{Re-weighted Negative Log-Likelihood} (RNLL) or IPCW Log Loss is defined by -\deqn{L_{RNLL}(f, t, \Delta) = -\Delta log(f(t))/G(t)} -where \eqn{\Delta} is the censoring indicator and G is the Kaplan-Meier estimator of the +The \strong{Re-weighted Negative Log-Likelihood} (RNLL) or IPCW (Inverse Probability Censoring Weighted) Log Loss is defined by +\deqn{L_{RNLL}(f, t, \delta) = - \frac{\delta \log[f(t)]}{G(t)}} +where \eqn{\delta} is the censoring indicator and \eqn{G(t)} is the Kaplan-Meier estimator of the censoring distribution. So only observations that have experienced the event are taking into account -for RNLL and both \eqn{f(t), G(t)} are calculated only at the event times. +for RNLL (i.e. \eqn{\delta = 1}) and both \eqn{f(t), G(t)} are calculated only at the event times. If only censored observations exist in the test set, \code{NaN} is returned. - -If \code{task} and \code{train_set} are passed to \verb{$score} then \eqn{G(t)} is fit on training data, -otherwise testing data. The first is likely to reduce any bias caused by calculating -parts of the measure on the test data it is evaluating. The training data is automatically -used in scoring resamplings. } \section{Dictionary}{ @@ -95,6 +90,15 @@ If \code{TRUE} (default) then returns the \eqn{L_{RNLL}} score (which is proper) } } +\section{Data used for Estimating Censoring Distribution}{ + + +If \code{task} and \code{train_set} are passed to \verb{$score} then \eqn{G(t)} is fit on training data, +otherwise testing data. The first is likely to reduce any bias caused by calculating +parts of the measure on the test data it is evaluating. The training data is automatically +used in scoring resamplings. +} + \seealso{ Other survival measures: \code{\link{mlr_measures_surv.calib_alpha}}, diff --git a/man/mlr_measures_surv.schmid.Rd b/man/mlr_measures_surv.schmid.Rd index 811c2c08..d01cd26e 100644 --- a/man/mlr_measures_surv.schmid.Rd +++ b/man/mlr_measures_surv.schmid.Rd @@ -8,32 +8,31 @@ Calculates the \strong{Integrated Schmid Score} (ISS), aka integrated absolute loss. } \details{ -For an individual who dies at time \eqn{t}, with predicted Survival function, \eqn{S}, the -Schmid Score at time \eqn{t^*}{t*} is given by +This measure has two dimensions: (test set) observations and time points. 
+For a specific individual \eqn{i} from the test set, with observed survival +outcome \eqn{(t_i, \delta_i)} (time and censoring indicator) and predicted +survival function \eqn{S_i(t)}, the \emph{observation-wise} loss integrated across +the time dimension up to the time cutoff \eqn{\tau^*}, is: + +\deqn{L_{ISS}(S_i, t_i, \delta_i) = \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{S_i(\tau) \text{I}(t_i \leq \tau, \delta=1)}{G(t_i)} + \frac{(1-S_i(\tau)) \text{I}(t_i > \tau)}{G(\tau)} \ d\tau} + +where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution. + +The \strong{re-weighted ISS} (RISS) is: + +\deqn{L_{RISS}(S_i, t_i, \delta_i) = \delta_i \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{S_i(\tau) \text{I}(t_i \leq \tau) + (1-S_i(\tau)) \text{I}(t_i > \tau)}{G(t_i)} \ d\tau} + +which is always weighted by \eqn{G(t_i)} and is equal to zero for a censored subject. + +To get a single score across all \eqn{N} observations of the test set, we +return the average of the time-integrated observation-wise scores: +\deqn{\sum_{i=1}^N L(S_i, t_i, \delta_i) / N} -\deqn{L_{ISS}(S,t|t^*) = [(S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] + [((1 - S(t^*)))I(t > t^*)(1/G(t^*))]} -where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution. - -The re-weighted ISS, RISS is given by -\deqn{L_{RISS}(S,t|t^*) = [(S(t^*))I(t \le t^*, \delta = 1)(1/G(t))] + [((1 - S(t^*)))I(t > t^*)(1/G(t))]} -where \eqn{G} is the Kaplan-Meier estimate of the censoring distribution, i.e. always -weighted by \eqn{G(t)}. RISS is strictly proper when the censoring distribution is independent -of the survival distribution and when G is fit on a sufficiently large dataset. ISS is never -proper. Use \code{proper = FALSE} for ISS and \code{proper = TRUE} for RISS. -Results may be very different if many observations are censored at the last -observed time due to division by 1/\code{eps} in \code{proper = TRUE}. - -If \code{task} and \code{train_set} are passed to \verb{$score} then \eqn{G(t)} is fit on training data, -otherwise testing data. The first is likely to reduce any bias caused by calculating -parts of the measure on the test data it is evaluating. The training data is automatically -used in scoring resamplings. - -If \code{t_max} or \code{p_max} is given, then \eqn{G(t)} will be fitted using \strong{all observations} from the -train set (or test set) and only then the cutoff time will be applied. -This is to ensure that more data is used for fitting the censoring distribution via the -Kaplan-Meier. -Setting the \code{t_max} can help alleviate inflation of the score when \code{proper} is \code{TRUE}, -in cases where an observation is censored at the last observed time point. -This results in \eqn{G(t_{max}) = 0} and the use of \code{eps} instead (when \code{t_max} is \code{NULL}). } \section{Dictionary}{ @@ -89,11 +88,12 @@ If \code{integrated == FALSE} then a single time point at which to return the sc \itemize{ \item \code{t_max} (\code{numeric(1)})\cr -Cutoff time (i.e. time horizon) to evaluate the measure up to. +Cutoff time \eqn{\tau^*} (i.e. time horizon) to evaluate the measure up to. Mutually exclusive with \code{p_max} or \code{times}. This will effectively remove test observations for which the observed time (event or censoring) is strictly more than \code{t_max}. It's recommended to set \code{t_max} to avoid division by \code{eps}, see Details. +If \code{t_max} is not specified, an \code{Inf} time horizon is assumed. } @@ -151,6 +151,72 @@ Default is \code{FALSE}.
} } +\section{Properness}{ + + +RISS is strictly proper when the censoring distribution is independent +of the survival distribution and when \eqn{G(t)} is fit on a sufficiently large dataset. +ISS is never proper. Use \code{proper = FALSE} for ISS and +\code{proper = TRUE} for RISS. +Results may be very different if many observations are censored at the last +observed time due to division by \eqn{1/eps} in \code{proper = TRUE}. +} + +\section{Time points used for evaluation}{ + +If the \code{times} argument is not specified (\code{NULL}), then the unique (and +sorted) time points from the \strong{test set} are used for evaluation of the +time-integrated score. +This was a design decision due to the fact that different predicted survival +distributions \eqn{S(t)} usually have a \strong{discretized time domain} which may +differ, i.e. in the case the survival predictions come from different survival +learners. +Essentially, using the same set of time points for the calculation of the score +minimizes the bias that would come from using different time points. +We note that \eqn{S(t)} is by default constantly interpolated for time points that fall +outside its discretized time domain. + +Naturally, if the \code{times} argument is specified, then exactly these time +points are used for evaluation. +A warning is given to the user in case some of the specified \code{times} fall outside +of the time point range of the test set. +The assumption here is that if the test set is large enough, it should have a +time domain/range similar to the one from the train set, and therefore time +points outside that domain might lead to interpolation or extrapolation of \eqn{S(t)}. +} + +\section{Implementation differences}{ + + +If comparing the integrated graf score to other packages, e.g. +\CRANpkg{pec}, then \code{method = 2} should be used. However the results may +still be very slightly different as this package uses \code{survfit} to estimate +the censoring distribution, in line with the Graf 1999 paper; whereas some +other packages use \code{prodlim} with \code{reverse = TRUE} (meaning Kaplan-Meier is +not used). +} + +\section{Data used for Estimating Censoring Distribution}{ + + +If \code{task} and \code{train_set} are passed to \verb{$score} then \eqn{G(t)} is fit on training data, +otherwise testing data. The first is likely to reduce any bias caused by calculating +parts of the measure on the test data it is evaluating. The training data is automatically +used in scoring resamplings. +} + +\section{Time Cutoff Details}{ + + +If \code{t_max} or \code{p_max} is given, then \eqn{G(t)} will be fitted using \strong{all observations} from the +train set (or test set) and only then the cutoff time will be applied. +This is to ensure that more data is used for fitting the censoring distribution via the +Kaplan-Meier. +Setting the \code{t_max} can help alleviate inflation of the score when \code{proper} is \code{TRUE}, +in cases where an observation is censored at the last observed time point. +This results in \eqn{G(t_{max}) = 0} and the use of \code{eps} instead (when \code{t_max} is \code{NULL}). +} + \references{ Schemper, Michael, Henderson, Robin (2000). \dQuote{Predictive Accuracy and Explained Variation in Cox Regression.}
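The documentation changes above repeatedly reference the `proper`, `t_max`, and `times` options and the role of the censoring estimate \eqn{G(t)}. As orientation for reviewers, a minimal usage sketch follows; it assumes the standard mlr3/mlr3proba interface (`tsk()`, `lrn()`, `msr()`, `$score()`) together with the built-in `rats` task and `surv.coxph` learner, none of which are modified by this diff.

```r
# Illustrative sketch only (assumes the mlr3/mlr3proba API; not part of the patch).
library(mlr3)
library(mlr3proba)

set.seed(1)
task = tsk("rats")                    # built-in survival task
part = partition(task, ratio = 0.8)   # train/test split

learner = lrn("surv.coxph")
learner$train(task, row_ids = part$train)
pred = learner$predict(task, row_ids = part$test)

# RISBS: proper = TRUE with a finite time horizon (tau*); passing `task` and
# `train_set` fits G(t) on the training data, as described in the docs above.
pred$score(msr("surv.graf", proper = TRUE, t_max = 90),
           task = task, train_set = part$train)

# ISLL on a fixed evaluation grid via `times`; ISS/RISS follow the same pattern.
pred$score(msr("surv.intlogloss", proper = FALSE, times = c(30, 60, 90)))
pred$score(msr("surv.schmid", proper = TRUE))
```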