Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: alternative approach for censored data #1657

Open
hansvancalster opened this issue May 19, 2024 · 28 comments
Open

feature request: alternative approach for censored data #1657

hansvancalster opened this issue May 19, 2024 · 28 comments
Labels

Comments

@hansvancalster
Copy link

Context:

I often work with data that are censored. The approach for censored data implemented in brms is really useful, but I have found it to be slow - especially when a large part of the data are censored and with distributions where it is costly to integrate out censored data.
The approach taken by brms is explained in the stan user guide: https://mc-stan.org/docs/stan-users-guide/truncation-censoring.html#integrating-out-censored-values

This involves integrating out the PDF between the censoring bounds and therefore requires the CDF and/or CCDF functions depending on left, right or interval censoring.

Alternative:

The stan user guide also mentions an alternative approach: https://mc-stan.org/docs/stan-users-guide/truncation-censoring.html#estimating-censored-values

I have found this approach to be much faster (about 30 times faster in below reprex) and it recovers the same parameter estimates. In this approach, the censored data are declared as parameters to be estimated. I think this approach is called a data augmentation (imputation) approach and is discussed in https://academic.oup.com/jrsssb/article-abstract/55/1/3/7035892 (section 6.3). It only requires the PDF function.

Notes:

The below reprex is for a beta distribution. I have extracted the brms generated stan code and changed this according to the User's guide, but without changing how brms requires the data to be formatted (so standata() is the same in both implementations). I am not very familiar with programming in stan, so some further efficiency gains are probably possible.

I have also tested this with a Gamma distribution and this worked too, but the speed-up is not that spectacular. I can imagine that for a Gaussian distribution there would be no speed-up.

Feature request:

Would you be willing to consider implementing this approach in brms? Maybe not as a replacement to the current implementation, but as an alternative. Perhaps an extra argument in the cens() function to switch between both alternatives?

I think this approach might not work for discrete distributions because stan cannot sample discrete parameters? Although I have a feeling this might be related: https://users.aalto.fi/~johnsoa2/notebooks/CopulaIntro.html#bayesian-estimation-data-augmentation. (haven't thought this through)

Reproducible example:

  library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.20.4). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
  library(tidyverse)
  
  ## generate data from beta distribution
  set.seed(2154)
  mu <- 0.5
  phi <- 2
  true_data <- tibble(
    y = rbeta(100, shape1 = mu * phi, shape2 = (1 - mu) * phi)
  )
  
  ## Censoring
  censored_data <- true_data %>%
    mutate(
      y_left = floor(y * 10) / 10,
      y_right = ceiling(y * 10) / 10,
      y_left = ifelse(y_left == 0, 1e-7, y_left),
      y_right = ifelse(y_right == 1, 1 - 1e-7, y_right),
      censoring = case_when(
        y_right == 0.1 ~ "left",
        y_left == 0.9 ~ "right",
        TRUE ~ "interval"
      ),
      y_cens = case_when(
        censoring == "left" ~ y_right,
        censoring == "right" ~ y_left,
        TRUE ~ y_left
      )
    )
  
  ## plot censored data
  censored_data %>%
    arrange(y) %>%
    mutate(
      row_number = row_number()
    ) %>%
    ggplot(
      aes(x = row_number, colour = censoring)) +
    geom_point(
      aes(y = y),
      position = position_jitter(width = 0.5, height = 0)) +
    geom_linerange(
      aes(y = y, ymin = y_left, ymax = y_right), 
      alpha = 0.3,
      position = position_jitter(width = 0.5, height = 0)
    )

  
  # fit brms model without sampling
  beta_censored <- brm(
    y_cens | cens(censoring, y_right) ~ 1,
    family = brms::Beta(),
    data = censored_data,
    iter = 0
  )
#> Compiling Stan program...
#> Start sampling
#> Error in config_argss(chains = chains, iter = iter, warmup = warmup, thin = thin,  : 
#>   parameter 'iter' should be a positive integer
#> error in specifying arguments; sampling not done
  # extract stan code and data
  beta_stan_code <- brms::stancode(beta_censored)
  beta_stan_data <- brms::standata(beta_censored)

  # fit using cmdstanr (for easier comparison of timing for sampling)
  writeLines(beta_stan_code, "beta_censored_brms.stan")
  beta_censored_mod <- cmdstanr::cmdstan_model("beta_censored_brms.stan")
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:356,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:14,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp4yUrzl/model-34a81bb744e7.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       |
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
#> In file included from C:/rtools43/ucrt64/include/c++/13.2.0/algorithm:61,
#>                  from stan/lib/stan_math/lib/eigen_3.4.0/Eigen/Core:95,
#>                  from stan/lib/stan_math/lib/eigen_3.4.0/Eigen/Dense:1,
#>                  from stan/lib/stan_math/stan/math/prim/fun/Eigen.hpp:22,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:4:
#> In function 'void std::__final_insertion_sort(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = unsigned int*; _Compare = __gnu_cxx::__ops::_Iter_comp_iter<less<long double> >]',
#>     inlined from 'void std::__sort(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = unsigned int*; _Compare = __gnu_cxx::__ops::_Iter_comp_iter<less<long double> >]' at C:/rtools43/ucrt64/include/c++/13.2.0/bits/stl_algo.h:1950:31,
#>     inlined from 'void std::sort(_RAIter, _RAIter, _Compare) [with _RAIter = unsigned int*; _Compare = less<long double>]' at C:/rtools43/ucrt64/include/c++/13.2.0/bits/stl_algo.h:4894:18,
#>     inlined from 'unsigned int boost::math::detail::set_crossover_locations(const Seq&, const Seq&, const Real&, unsigned int*) [with Seq = std::vector<double, std::allocator<double> >; Real = long double]' at stan/lib/stan_math/lib/boost_1.78.0/boost/math/special_functions/detail/hypergeometric_pFq_checked_series.hpp:113:21,
#>     inlined from 'std::pair<T, T> boost::math::detail::hypergeometric_pFq_checked_series_impl(const Seq&, const Seq&, const Real&, const Policy&, const Terminal&, long long int&) [with Seq = std::vector<double, std::allocator<double> >; Real = long double; Policy = boost::math::policies::policy<boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy
#> , boost::math::policies::default_policy, boost::math::policies::default_policy>; Terminal = iteration_terminator]' at stan/lib/stan_math/lib/boost_1.78.0/boost/math/special_functions/detail/hypergeometric_pFq_checked_series.hpp:260:56:
#> C:/rtools43/ucrt64/include/c++/13.2.0/bits/stl_algo.h:1859:32: warning: array subscript 16 is outside array bounds of 'unsigned int [5]' [-Warray-bounds=]
#>  1859 |           std::__insertion_sort(__first, __first + int(_S_threshold), __comp);
#>       |           ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#> In file included from stan/lib/stan_math/lib/boost_1.78.0/boost/math/special_functions/hypergeometric_pFq.hpp:11,
#>                  from stan/lib/stan_math/stan/math/prim/fun/hypergeometric_pFq.hpp:7,
#>                  from stan/lib/stan_math/stan/math/prim/fun/hypergeometric_2F1.hpp:18,
#>                  from stan/lib/stan_math/stan/math/prim/fun/grad_2F1.hpp:14,
#>                  from stan/lib/stan_math/stan/math/prim/fun.hpp:129,
#>                  from stan/lib/stan_math/stan/math/rev/fun/multiply.hpp:7,
#>                  from stan/lib/stan_math/stan/math/rev/fun/elt_multiply.hpp:9,
#>                  from stan/lib/stan_math/stan/math/rev/fun.hpp:56,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:10:
#> stan/lib/stan_math/lib/boost_1.78.0/boost/math/special_functions/detail/hypergeometric_pFq_checked_series.hpp: In function 'std::pair<T, T> boost::math::detail::hypergeometric_pFq_checked_series_impl(const Seq&, const Seq&, const Real&, const Policy&, const Terminal&, long long int&) [with Seq = std::vector<double, std::allocator<double> >; Real = long double; Policy = boost::math::policies::policy<boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy, boost::math::policies::default_policy>; Terminal = iteration_terminator]':
#> stan/lib/stan_math/lib/boost_1.78.0/boost/math/special_functions/detail/hypergeometric_pFq_checked_series.hpp:258:18: note: at offset 64 into object 'crossover_locations' of size 20
#>   258 |         unsigned crossover_locations[BOOST_MATH_PFQ_MAX_B_TERMS];
#>       |                  ^~~~~~~~~~~~~~~~~~~
  beta_integrated_out <- beta_censored_mod$sample(
    data = beta_stan_data,
    seed = 123,
    chains = 4,
    parallel_chains = 1
  )
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: beta_lccdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp4yUrzl/model-34a81bb744e7.stan', line 34, column 8 to column 68)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 64.4 seconds.
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp4yUrzl/model-34a81bb744e7.stan', line 20, column 2 to column 41)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 2 finished in 73.5 seconds.
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 3 Exception: gamma_lpdf: Random variable is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp4yUrzl/model-34a81bb744e7.stan', line 20, column 2 to column 41)
#> Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 3
#> Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 3 Exception: beta_lccdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp4yUrzl/model-34a81bb744e7.stan', line 34, column 8 to column 68)
#> Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 3
#> Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 finished in 56.5 seconds.
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp4yUrzl/model-34a81bb744e7.stan', line 20, column 2 to column 41)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 finished in 67.5 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 65.5 seconds.
#> Total execution time: 262.7 seconds.
  
  # changed brms generated stan code to use data augmentation approach
  # rather than integrating out (i.e. avoid use of CDF or CCDF)
  stanlines <- "
  // generated with brms 2.20.4
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  array[N] int<lower=-1,upper=2> cens;  // indicates censoring
  vector[N] rcens;  // right censor points for interval censoring
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  // count uncensored data points
  int Nobs = 0;
  for (i in 1:N) {
    if (cens[i] == 0) {
      Nobs += 1;
    }
  }
  // extract uncensored response values
  array[Nobs] real Yobs;
  int index = 1;  // index to keep track of the position
  // Populate with the values from Y
  for (i in 1:N) {
    if (cens[i] == 0) {
      Yobs[index] = Y[i];
      index += 1;
    }
  }
  
  // count right censored data points
  int Nright = 0;
  for (i in 1:N) {
    if (cens[i] == 1) {
      Nright += 1;
    }
  }
  // extract right censored lower bounds
  array[Nright] real LBright;
  int index2 = 1;  // index to keep track of the position
  // Populate with the values from Y
  for (i in 1:N) {
    if (cens[i] == 1) {
      LBright[index2] = Y[i];
      index2 += 1;
    }
  }

  // count left censored data points
  int Nleft = 0;
  for (i in 1:N) {
    if (cens[i] == -1) {
      Nleft += 1;
    }
  }
  // extract left censored upper bounds
  array[Nleft] real UBleft;
  int index3 = 1;  // index to keep track of the position
  // Populate with the values from Y
  for (i in 1:N) {
    if (cens[i] == -1) {
      UBleft[index3] = Y[i];
      index3 += 1;
    }
  }

  // count interval censored data points
  int Nint = 0;
  for (i in 1:N) {
    if (cens[i] == 2) {
      Nint += 1;
    }
  }
  // extract interval censored lower and upper bounds
  array[Nint] real LBint;
  array[Nint] real UBint;
  int index4 = 1;  // index to keep track of the position
  // Populate with the values from Y
  for (i in 1:N) {
    if (cens[i] == 2) {
      LBint[index4] = Y[i];
      UBint[index4] = rcens[i];
      index4 += 1;
    }
  }
}
parameters {
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> phi;  // precision parameter
  array[Nright] real<lower=LBright, upper=1> Yright;
  array[Nleft] real<lower=0, upper=UBleft> Yleft;
  array[Nint] real<lower=LBint, upper=UBint> Yint;
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
  lprior += gamma_lpdf(phi | 0.01, 0.01);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[Nobs] muobs = rep_vector(0.0, Nobs);
    muobs += Intercept;
    muobs = inv_logit(muobs);
    for (n in 1:Nobs) {
      target += beta_lpdf(Yobs[n] | muobs[n] * phi, (1 - muobs[n]) * phi);
    }
    vector[Nleft] muleft = rep_vector(0.0, Nleft);
    muleft += Intercept;
    muleft = inv_logit(muleft);
    for (n in 1:Nleft) {
      target += beta_lpdf(Yleft[n] |  muleft[n] * phi, (1 - muleft[n]) * phi);
    }
    vector[Nright] muright = rep_vector(0.0, Nright);
    muright += Intercept;
    muright = inv_logit(muright);
    for (n in 1:Nright) {
      target += beta_lpdf(Yright[n] |  muright[n] * phi, (1 - muright[n]) * phi);
    }
    vector[Nint] muint = rep_vector(0.0, Nint);
    muint += Intercept;
    muint = inv_logit(muint);
    for (n in 1:Nint) {
      target += beta_lpdf(Yint[n] |  muint[n] * phi, (1 - muint[n]) * phi);
    }
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
}
"
  writeLines(stanlines, "bda.stan")
  
  # fit data augmentation model with cmdstanr
  bda <- cmdstanr::cmdstan_model("bda.stan")
#> In file included from stan/lib/stan_math/stan/math/prim/prob/von_mises_lccdf.hpp:5,
#>                  from stan/lib/stan_math/stan/math/prim/prob/von_mises_ccdf_log.hpp:4,
#>                  from stan/lib/stan_math/stan/math/prim/prob.hpp:356,
#>                  from stan/lib/stan_math/stan/math/prim.hpp:16,
#>                  from stan/lib/stan_math/stan/math/rev.hpp:14,
#>                  from stan/lib/stan_math/stan/math.hpp:19,
#>                  from stan/src/stan/model/model_header.hpp:4,
#>                  from C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp4yUrzl/model-34a854246415.hpp:2:
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp: In function 'stan::return_type_t<T_x, T_sigma, T_l> stan::math::von_mises_cdf(const T_x&, const T_mu&, const T_k&)':
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: '-Wmisleading-indentation' is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
#>   194 |       if (cdf_n < 0.0)
#>       |
#> stan/lib/stan_math/stan/math/prim/prob/von_mises_cdf.hpp:194: note: adding '-flarge-source-files' will allow for more column-tracking support, at the expense of compilation time and memory
  beta_data_augmentation <- bda$sample(
    data = beta_stan_data,
    seed = 123,
    chains = 4,
    parallel_chains = 1
  )
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: beta_lpdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp4yUrzl/model-34a854246415.stan', line 113, column 6 to column 78)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 2.0 seconds.
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 2 finished in 2.0 seconds.
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 3 Exception: beta_lpdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp4yUrzl/model-34a854246415.stan', line 113, column 6 to column 78)
#> Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 3
#> Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 finished in 2.0 seconds.
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: beta_lpdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp4yUrzl/model-34a854246415.stan', line 113, column 6 to column 78)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 finished in 2.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 2.0 seconds.
#> Total execution time: 8.6 seconds.
  
  # compare timings
  beta_integrated_out$time()
#> $total
#> [1] 262.7254
#> 
#> $chains
#>   chain_id warmup sampling  total
#> 1        1 28.933   35.458 64.391
#> 2        2 30.083   43.453 73.536
#> 3        3 28.928   27.539 56.467
#> 4        4 30.996   36.493 67.489
  beta_data_augmentation$time()
#> $total
#> [1] 8.63974
#> 
#> $chains
#>   chain_id warmup sampling total
#> 1        1  1.091    0.907 1.998
#> 2        2  1.126    0.914 2.040
#> 3        3  1.102    0.909 2.011
#> 4        4  1.098    0.919 2.017
  beta_data_augmentation$time()$total / beta_integrated_out$time()$total
#> [1] 0.03288506
  
  # compare estimated parameters mu and phi
  beta_integrated_out$summary(variables = c("b_Intercept", "phi"))
#> # A tibble: 2 × 10
#>   variable     mean median    sd   mad      q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 b_Intercept 0.190  0.186 0.112 0.110 0.00636 0.378  1.00    2829.    2563.
#> 2 phi         2.42   2.40  0.335 0.342 1.91    2.98   1.00    3389.    2448.
  beta_data_augmentation$summary(variables = c("b_Intercept", "phi"))
#> # A tibble: 2 × 10
#>   variable     mean median    sd   mad      q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 b_Intercept 0.187  0.186 0.107 0.105 0.00940 0.363  1.00    6905.    2961.
#> 2 phi         2.42   2.40  0.328 0.334 1.92    2.99   1.00    6211.    3134.
  
  # graphical comparison between both approaches
  drw_bio <- posterior::as_draws_rvars(beta_integrated_out)
  drw_bda <- posterior::as_draws_rvars(beta_data_augmentation)
  bda_mu_phi <- drw_bda %>%
    as_draws_df() %>%
    select(Intercept, phi) %>%
    mutate(
      mu = plogis(Intercept),
      model = "data augmentation"
    )
#> Warning: Dropping 'draws_df' class as required metadata was removed.
  bio_mu_phi <- drw_bio %>%
    as_draws_df() %>%
    select(Intercept, phi) %>%
    mutate(
      mu = plogis(Intercept),
      model = "integrate out (brms implementation)"
    )
#> Warning: Dropping 'draws_df' class as required metadata was removed.
  bind_rows(bio_mu_phi, bda_mu_phi) %>%
    pivot_longer(cols = c("phi", "mu")) %>%
    ggplot() +
    geom_density(aes(x = value, colour = model)) +
    facet_wrap(~name, scales = "free")

  
  # visualise estimated (imputed / data augmented) parameters
  imputed_left <- drw_bda$Yleft %>%
    as_draws_df() %>%
    pivot_longer(cols = starts_with("x")) %>%
    group_by(name) %>%
    mutate(
      censoring = "left",
      interval_mean = round(mean(value), 2)
    )
#> Warning: Dropping 'draws_df' class as required metadata was removed.
  
  imputed_right <- drw_bda$Yright %>%
    as_draws_df() %>%
    pivot_longer(cols = starts_with("x")) %>%
    group_by(name) %>%
    mutate(
      censoring = "right",
      interval_mean = round(mean(value), 2)
    )
#> Warning: Dropping 'draws_df' class as required metadata was removed.
  
  imputed_interval <- drw_bda$Yint %>%
    as_draws_df() %>%
    pivot_longer(cols = starts_with("x")) %>%
    group_by(name) %>%
    mutate(
      censoring = "interval",
      interval_mean = round(mean(value), 2)
    )
#> Warning: Dropping 'draws_df' class as required metadata was removed.
  
  bind_rows(
    imputed_right, imputed_left, imputed_interval
  ) %>%
    ggplot() +
    geom_density(aes(x = value, colour = name)) +
    facet_wrap(interval_mean ~ censoring, scales = "free_x") +
    theme(legend.position = "none") +
    labs(title = "data augmentation / imputed values")

Created on 2024-05-19 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Belgium.utf8
#>  ctype    Dutch_Belgium.utf8
#>  tz       Europe/Brussels
#>  date     2024-05-19
#>  pandoc   3.1.12.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package        * version    date (UTC) lib source
#>    abind            1.4-5      2016-07-21 [1] CRAN (R 4.3.0)
#>    backports        1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
#>    base64enc        0.1-3      2015-07-28 [1] CRAN (R 4.3.0)
#>    bayesplot        1.11.1     2024-02-15 [1] CRAN (R 4.3.2)
#>    bridgesampling   1.1-2      2021-04-16 [1] CRAN (R 4.3.0)
#>    brms           * 2.20.4     2023-09-25 [1] CRAN (R 4.3.1)
#>    Brobdingnag      1.2-9      2022-10-19 [1] CRAN (R 4.3.0)
#>    checkmate        2.3.1      2023-12-04 [1] CRAN (R 4.3.2)
#>    cli              3.6.2      2023-12-11 [1] CRAN (R 4.3.2)
#>    cmdstanr         0.6.1      2023-09-12 [1] local
#>    coda             0.19-4.1   2024-01-31 [1] CRAN (R 4.3.2)
#>    codetools        0.2-19     2023-02-01 [2] CRAN (R 4.3.2)
#>    colorspace       2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
#>    colourpicker     1.3.0      2023-08-21 [1] CRAN (R 4.3.1)
#>    crosstalk        1.2.1      2023-11-23 [1] CRAN (R 4.3.2)
#>    curl             5.2.0      2023-12-08 [1] CRAN (R 4.3.2)
#>    data.table       1.15.0     2024-01-30 [1] CRAN (R 4.3.2)
#>    digest           0.6.35     2024-03-11 [1] CRAN (R 4.3.3)
#>    distributional   0.4.0      2024-02-07 [1] CRAN (R 4.3.2)
#>    dplyr          * 1.1.4      2023-11-17 [1] CRAN (R 4.3.2)
#>    DT               0.32       2024-02-19 [1] CRAN (R 4.3.2)
#>    dygraphs         1.1.1.6    2018-07-11 [1] CRAN (R 4.3.0)
#>    ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
#>    emmeans          1.10.0     2024-01-23 [1] CRAN (R 4.3.2)
#>    estimability     1.5        2024-02-20 [1] CRAN (R 4.3.2)
#>    evaluate         0.23       2023-11-01 [1] CRAN (R 4.3.2)
#>    fansi            1.0.6      2023-12-08 [1] CRAN (R 4.3.2)
#>    farver           2.1.1      2022-07-06 [1] CRAN (R 4.3.0)
#>    fastmap          1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
#>    forcats        * 1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
#>    fs               1.6.3      2023-07-20 [1] CRAN (R 4.3.1)
#>    generics         0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
#>    ggplot2        * 3.5.0      2024-02-23 [1] CRAN (R 4.3.2)
#>    glue             1.7.0      2024-01-09 [1] CRAN (R 4.3.2)
#>    gridExtra        2.3        2017-09-09 [1] CRAN (R 4.3.0)
#>    gtable           0.3.4      2023-08-21 [1] CRAN (R 4.3.1)
#>    gtools           3.9.5      2023-11-20 [1] CRAN (R 4.3.2)
#>    highr            0.10       2022-12-22 [1] CRAN (R 4.3.0)
#>    hms              1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
#>    htmltools        0.5.7      2023-11-03 [1] CRAN (R 4.3.2)
#>    htmlwidgets      1.6.4      2023-12-06 [1] CRAN (R 4.3.2)
#>    httpuv           1.6.14     2024-01-26 [1] CRAN (R 4.3.2)
#>    igraph           2.0.2      2024-02-17 [1] CRAN (R 4.3.3)
#>    inline           0.3.19     2021-05-31 [1] CRAN (R 4.3.0)
#>    jsonlite         1.8.8      2023-12-04 [1] CRAN (R 4.3.2)
#>    knitr            1.45       2023-10-30 [1] CRAN (R 4.3.2)
#>    labeling         0.4.3      2023-08-29 [1] CRAN (R 4.3.1)
#>    later            1.3.2      2023-12-06 [1] CRAN (R 4.3.2)
#>    lattice          0.22-5     2023-10-24 [1] CRAN (R 4.3.2)
#>    lifecycle        1.0.4      2023-11-07 [1] CRAN (R 4.3.2)
#>    loo              2.7.0      2024-02-24 [1] CRAN (R 4.3.2)
#>    lubridate      * 1.9.3.9000 2024-01-29 [1] https://inbo.r-universe.dev (R 4.3.2)
#>    magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
#>    markdown         1.12       2023-12-06 [1] CRAN (R 4.3.2)
#>    MASS             7.3-60.0.1 2024-01-13 [2] CRAN (R 4.3.2)
#>    Matrix           1.6-5      2024-01-11 [1] CRAN (R 4.3.2)
#>    matrixStats      1.2.0      2023-12-11 [1] CRAN (R 4.3.2)
#>    mime             0.12       2021-09-28 [1] CRAN (R 4.3.0)
#>    miniUI           0.1.1.1    2018-05-18 [1] CRAN (R 4.3.0)
#>    multcomp         1.4-25     2023-06-20 [1] CRAN (R 4.3.1)
#>    munsell          0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
#>    mvtnorm          1.2-4      2023-11-27 [1] CRAN (R 4.3.2)
#>    nlme             3.1-164    2023-11-27 [2] CRAN (R 4.3.2)
#>    pillar           1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
#>    pkgbuild         1.4.4      2024-03-17 [1] CRAN (R 4.3.3)
#>    pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
#>    plyr             1.8.9      2023-10-02 [1] CRAN (R 4.3.1)
#>    posterior        1.5.0      2023-10-31 [1] CRAN (R 4.3.2)
#>    processx         3.8.4      2024-03-16 [1] CRAN (R 4.3.3)
#>    promises         1.2.1      2023-08-10 [1] CRAN (R 4.3.1)
#>    ps               1.7.6      2024-01-18 [1] CRAN (R 4.3.2)
#>    purrr          * 1.0.2      2023-08-10 [1] CRAN (R 4.3.1)
#>    QuickJSR         1.1.3      2024-01-31 [1] CRAN (R 4.3.2)
#>    R.cache          0.16.0     2022-07-21 [1] CRAN (R 4.3.0)
#>    R.methodsS3      1.8.2      2022-06-13 [1] CRAN (R 4.3.0)
#>    R.oo             1.26.0     2024-01-24 [1] CRAN (R 4.3.2)
#>    R.utils          2.12.3     2023-11-18 [1] CRAN (R 4.3.2)
#>    R6               2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
#>    Rcpp           * 1.0.12     2024-01-09 [1] CRAN (R 4.3.2)
#>  D RcppParallel     5.1.7      2023-02-27 [1] CRAN (R 4.3.0)
#>    readr          * 2.1.5      2024-01-10 [1] CRAN (R 4.3.2)
#>    reprex           2.1.0      2024-01-11 [1] CRAN (R 4.3.2)
#>    reshape2         1.4.4      2020-04-09 [1] CRAN (R 4.3.0)
#>    rlang            1.1.3      2024-01-10 [1] CRAN (R 4.3.2)
#>    rmarkdown        2.25       2023-09-18 [1] CRAN (R 4.3.1)
#>    rstan            2.32.6     2024-03-05 [1] CRAN (R 4.3.3)
#>    rstantools       2.4.0      2024-01-31 [1] CRAN (R 4.3.2)
#>    rstudioapi       0.15.0     2023-07-07 [1] CRAN (R 4.3.1)
#>    sandwich         3.1-0      2023-12-11 [1] CRAN (R 4.3.2)
#>    scales           1.3.0      2023-11-28 [1] CRAN (R 4.3.2)
#>    sessioninfo      1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
#>    shiny            1.8.0      2023-11-17 [1] CRAN (R 4.3.2)
#>    shinyjs          2.1.0      2021-12-23 [1] CRAN (R 4.3.0)
#>    shinystan        2.6.0      2022-03-03 [1] CRAN (R 4.3.0)
#>    shinythemes      1.2.0      2021-01-25 [1] CRAN (R 4.3.0)
#>    StanHeaders      2.32.6     2024-03-01 [1] CRAN (R 4.3.3)
#>    stringi          1.8.3      2023-12-11 [1] CRAN (R 4.3.2)
#>    stringr        * 1.5.1      2023-11-14 [1] CRAN (R 4.3.2)
#>    styler           1.10.2     2023-08-29 [1] CRAN (R 4.3.1)
#>    survival         3.5-8      2024-02-14 [2] CRAN (R 4.3.2)
#>    tensorA          0.36.2.1   2023-12-13 [1] CRAN (R 4.3.2)
#>    TH.data          1.1-2      2023-04-17 [1] CRAN (R 4.3.1)
#>    threejs          0.3.3      2020-01-21 [1] CRAN (R 4.3.0)
#>    tibble         * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
#>    tidyr          * 1.3.1      2024-01-24 [1] CRAN (R 4.3.2)
#>    tidyselect       1.2.1      2024-03-11 [1] CRAN (R 4.3.3)
#>    tidyverse      * 2.0.0      2023-02-22 [1] CRAN (R 4.3.0)
#>    timechange       0.3.0      2024-01-18 [1] CRAN (R 4.3.2)
#>    tzdb             0.4.0      2023-05-12 [1] CRAN (R 4.3.1)
#>    utf8             1.2.4      2023-10-22 [1] CRAN (R 4.3.2)
#>    V8               4.4.2      2024-02-15 [1] CRAN (R 4.3.2)
#>    vctrs            0.6.5      2023-12-01 [1] CRAN (R 4.3.2)
#>    withr            3.0.0      2024-01-16 [1] CRAN (R 4.3.2)
#>    xfun             0.41       2023-11-01 [1] CRAN (R 4.3.2)
#>    xml2             1.3.6      2023-12-04 [1] CRAN (R 4.3.2)
#>    xtable           1.8-4      2019-04-21 [1] CRAN (R 4.3.0)
#>    xts              0.13.2     2024-01-21 [1] CRAN (R 4.3.2)
#>    yaml             2.3.8      2023-12-11 [1] CRAN (R 4.3.2)
#>    zoo              1.8-12     2023-04-13 [1] CRAN (R 4.3.0)
#> 
#>  [1] C:/R/library
#>  [2] C:/R/R-4.3.2/library
#> 
#>  D ── DLL MD5 mismatch, broken installation.
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@paul-buerkner
Copy link
Owner

Thank you for opening this issue. Once I have more time, I will take a detailed look.

@wds15
Copy link
Contributor

wds15 commented Aug 12, 2024

Just to note: One real performance issue with the current brms implementation for censoring is to loop line by line through the data set and calling the LPDF/CDF or CCDF function every time. This avoids using the vectorised versions of these functions, which is really slow. One can easily use the vectored versions if one creates first an index vector which contains all indices of data rows which are not censored, right/left censored and then uses the vectorised function calls. This should solve to some extent the performance issue seen with large data sets.

@paul-buerkner paul-buerkner added this to the brms 2.22.0 milestone Aug 13, 2024
@paul-buerkner
Copy link
Owner

Makes sense. How would we combine this with threading though, where we need to obtain the write subsets of these index vectors. We would need some kind of Stan version of Rs "intersect". Do we have something like this?

@wds15
Copy link
Contributor

wds15 commented Sep 14, 2024

I do not think an intersect command exists, but a first version which just loops over the data set to find out which indices are from which observation type in range of the current slice will do. Not really efficient, but since this is a loop over integers it's basically for free. That should still be faster as we gain vectorized statements.

..or you make one reduce sum call per censoring type...then this can be done fast.

@wds15
Copy link
Contributor

wds15 commented Sep 15, 2024

Actually it's simpler: the current approach loops over all rows and adds line by line to the log lik. The new approach should loop again line by line over the data and collect all the indices for each censoring type. Then one adds at the end of that the entire log lik for all the observations in vectorized form. When doing reduce sum then the loop is not over the entire data, but only over the sub slice.

If it helps I can share some dummy code demonstrating the above.

@paul-buerkner
Copy link
Owner

Thanks! Some dummy code would be helpful I think!

@wds15
Copy link
Contributor

wds15 commented Sep 16, 2024

Something along these lines:

/ code pattern now (with reduce_sum):
for (n in 1:N) {
  int nn = n + start - 1;
  // special treatment of censored data
  if (cens[nn] == 0) {
    ptarget += cox_log_lpdf(Y[nn] | mu[n], bhaz[n], cbhaz[n]);
  } else if (cens[nn] == 1) {
    ptarget += cox_log_lccdf(Y[nn] | mu[n], bhaz[n], cbhaz[n]);
  } else if (cens[nn] == -1) {
    ptarget += cox_log_lcdf(Y[nn] | mu[n], bhaz[n], cbhaz[n]);
  }
}

// better code pattern:
{
  int num_event = 0;
  int num_censor_right = 0;
  int num_censor_left = 0;
  array[N] int idx_event;
  array[N] int idx_censor_right;
  array[N] int idx_censor_left;
  for (n in 1:N) {
    int nn = n + start - 1;
    // special treatment of censored data
    if (cens[nn] == 0) {
      num_event += 1;
      idx_event[num_event] = nn;
    } else if (cens[nn] == 1) {
      num_censor_right += 1;
      idx_censor_right[num_censor_right] = nn;
    } else if (cens[nn] == -1) {
      num_censor_left += 1;
      idx_censor_left[num_censor_left] = nn;
    }
  }
  ptarget += cox_log_lpdf(Y[idx_event[1:num_event] - start + 1] | mu[idx_event[1:num_event]], bhaz[idx_event[1:num_event]], cbhaz[idx_event[1:num_event]]);
  ptarget += cox_log_lccdf(Y[idx_censor_right[1:num_censor_right] - start + 1] | mu[idx_censor_right[1:num_censor_right]], bhaz[idx_censor_right[1:num_censor_right]], cbhaz[idx_censor_right[1:num_censor_right]]);
  ptarget += cox_log_lcdf(Y[idx_censor_left[1:num_censor_left] - start + 1] | mu[idx_censor_left[1:num_censor_left]], bhaz[idx_censor_left[1:num_censor_left]], cbhaz[idx_censor_left[1:num_censor_left]]);
}

@paul-buerkner
Copy link
Owner

makes sense! Thank you!

@paul-buerkner
Copy link
Owner

Shouldn't the pattern rather be:(?)

{
  int num_event = 0;
  int num_censor_right = 0;
  int num_censor_left = 0;
  array[N] int idx_event;
  array[N] int idx_censor_right;
  array[N] int idx_censor_left;
  for (n in 1:N) {
    int nn = n + start - 1;
    // special treatment of censored data
    if (cens[nn] == 0) {
      num_event += 1;
      idx_event[num_event] = n;
    } else if (cens[nn] == 1) {
      num_censor_right += 1;
      idx_censor_right[num_censor_right] = n;
    } else if (cens[nn] == -1) {
      num_censor_left += 1;
      idx_censor_left[num_censor_left] = n;
    }
  }
  ptarget += cox_log_lpdf(Y[idx_event[1:num_event] + start - 1] | mu[idx_event[1:num_event]], bhaz[idx_event[1:num_event]], cbhaz[idx_event[1:num_event]]);
  ptarget += cox_log_lccdf(Y[idx_censor_right[1:num_censor_right] + start - 1] | mu[idx_censor_right[1:num_censor_right]], bhaz[idx_censor_right[1:num_censor_right]], cbhaz[idx_censor_right[1:num_censor_right]]);
  ptarget += cox_log_lcdf(Y[idx_censor_left[1:num_censor_left] + start - 1] | mu[idx_censor_left[1:num_censor_left]], bhaz[idx_censor_left[1:num_censor_left]], cbhaz[idx_censor_left[1:num_censor_left]]);
}

This way, idx_event[1:num_event] refers to the local (sliced) indices and idx_event[1:num_event] + start - 1 refers to the global (non-sliced) indices.

@wds15
Copy link
Contributor

wds15 commented Sep 18, 2024

I guess you are right here on the indexing stuff. I did not test my code, but the key idea is to collect the indices during the loop and then doing the appropriate vectored statements when calling the lpdfs.

@paul-buerkner
Copy link
Owner

Okay. The feature is almost complete. However, when I parse it with threading, I get the error Ill-typed arguments supplied to infix operator +. ... Instead supplied arguments of incompatible type: array[] int, int. caused by Jlcens[1:Nlcens] + start - 1. What is the least annoying way to add an integer to an arry of integers?

@wds15
Copy link
Contributor

wds15 commented Sep 18, 2024

no idea... brute force would be to create at the same time two index vectors per case... one for the shifted and one for the not shifted stuff.. not nice, but should work for sure...

@wds15
Copy link
Contributor

wds15 commented Sep 18, 2024

or you create a function which does the offset to the first array in a loop element wise. That's another option.

@paul-buerkner
Copy link
Owner

paul-buerkner commented Sep 18, 2024 via email

paul-buerkner added a commit that referenced this issue Sep 19, 2024
@paul-buerkner
Copy link
Owner

This feature is now implemented. I am pretty sure it is correct, but I would appreciate if you would double check both the non-threaded and the threaded Stan code, for example by running:

make_stancode(time | cens(censored) ~ age * sex + disease + (1|patient),
            data = kidney, family = lognormal())

make_stancode(time | cens(censored) ~ age * sex + disease + (1|patient),
            data = kidney, family = lognormal(), threads = threading(2))

@wds15
Copy link
Contributor

wds15 commented Sep 19, 2024

will do

@wds15
Copy link
Contributor

wds15 commented Sep 19, 2024

The Stan code looks good to me in both cases. I also ran as a sanity check this one:

fit1 <- brm(time | cens(censored) ~ age * sex + disease + (1|patient),
            data = kidney, family = lognormal(), seed=5346747)

fit2 <- brm(time | cens(censored) ~ age * sex + disease + (1|patient),
            data = kidney, family = lognormal(), threads = threading(2, grainsize=10 * nrow(kidney), static=TRUE), seed=5346747)

Doing so makes the second fit run the very same likelihood, since the parallelisation is turned off via the grain size. I am getting matching numerical results.

I have not checked interval censoring, which is not part of this example.... but I guess it's just fine.

Very cool to have this coded now like this. This will be super useful for truncated data as well!

@paul-buerkner
Copy link
Owner

paul-buerkner commented Sep 19, 2024

Thank you for checking!

Truncation does not yet support auto-vectorized code I think. Perhaps you can check and open a new issue with an example if you think there is something to be optimized as well.

@wds15
Copy link
Contributor

wds15 commented Sep 19, 2024

Uhm... this is not enabled for Cox?

I also confirmed that we gain speed here

library(brms)
set.seed(12345)
covs <- data.frame(id  = 1:10000, trt = stats::rbinom(10000, 1L, 0.5))
d1 <- simsurv::simsurv(lambdas = 0.1, gammas  = 1.5, betas = c(trt = -0.5),
                       x = covs, maxt  = 5)
d1 <- merge(d1, covs)
d1$g <- sample(c("a", "b"), nrow(d1), TRUE)


.cache_dir <- here::here("_brms-cache", Sys.info()["machine"])
options(brms.backend="cmdstanr", cmdstanr_write_stan_file_dir=.cache_dir)

# create cache directory if not yet available
dir.create(.cache_dir, FALSE, TRUE)

fit1 <- brm(eventtime | cens(1 - status) ~ 1 + trt,
            data = d1, family = lognormal(), refresh = 250, chains=1, seed=3546546)

## 6s

.cache_dir <- here::here("_brms-cache-dev", Sys.info()["machine"])
options(brms.backend="cmdstanr", cmdstanr_write_stan_file_dir=.cache_dir)

# create cache directory if not yet available
dir.create(.cache_dir, FALSE, TRUE)

pkgload::load_all("/home/rdocker/rwork/brms")

fit1_new <- brm(eventtime | cens(1 - status) ~ 1 + trt,
            data = d1, family = lognormal(), refresh = 250, chains=1, seed=3546546)

## 3.6s

I will check truncation... pretty sure the same pattern will be fruitful there as well...next issue coming once ready..

@paul-buerkner
Copy link
Owner

paul-buerkner commented Sep 19, 2024

Not enabled for cox since we would need vectorized lcdf etc functions for it. Does Stan already support function overloading (same function name with different signatures) for functions defined by the user?

@wds15
Copy link
Contributor

wds15 commented Sep 19, 2024

@paul-buerkner
Copy link
Owner

paul-buerkner commented Sep 20, 2024

Cox models now also have a vectorized version and the speed up seems to be clearly noticeable from my checks.

I am pretty sure the math is correct, but would you mind double checking in https://github.com/paul-buerkner/brms/blob/master/inst/chunks/fun_cox.stan ?

@hansvancalster
Copy link
Author

Thanks for working on this, however, the current approach may have issues. I installed brms 2.22 (current status on main branch) and reran the original reprex (first post in this thread).

I observed the following:

  • the vectorized approach is not recovering the same parameter estimates as I used to obtain with the unvectorized approach from brms 2.21
  • the vectorized approach is only slightly faster in the example
  • for completeness I have also retained the (not vectorized) approach in the reprex using data augmentation which recovers the same parameters as brms 2.21 and is much faster

I attach the brms generated stan files that are needed to run this reprex locally (that was the easiest approach I came up with to deal with multiple brms versions - the reprex uses cmdstanr to fit the models).

censoring_brms.zip

library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.22.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar

## generate data from beta distribution
set.seed(2154)
mu <- 0.5
phi <- 2
true_data <- tibble(
  y = rbeta(100, shape1 = mu * phi, shape2 = (1 - mu) * phi)
)

## Censoring
censored_data <- true_data %>%
  mutate(
    y_left = floor(y * 10) / 10,
    y_right = ceiling(y * 10) / 10,
    y_left = ifelse(y_left == 0, 1e-7, y_left),
    y_right = ifelse(y_right == 1, 1 - 1e-7, y_right),
    censoring = case_when(
      y_right == 0.1 ~ "left",
      y_left == 0.9 ~ "right",
      TRUE ~ "interval"
    ),
    y_cens = case_when(
      censoring == "left" ~ y_right,
      censoring == "right" ~ y_left,
      TRUE ~ y_left
    )
  )

beta_stan_data <- brms::make_standata(
  y_cens | cens(censoring, y_right) ~ 1,
  family = brms::Beta(),
  data = censored_data)

# compile models
integrate_out_not_vectorized <- cmdstanr::cmdstan_model(
  "beta_censored_brms_not_vectorized.stan" # this is brms 2.21
)
integrate_out_vectorized <- cmdstanr::cmdstan_model(
  "beta_censored_brms_vectorized.stan" # this is brms 2.22
)
augmented_not_vectorized <- cmdstanr::cmdstan_model(
  "bda.stan" # this is data augmentation model, not vectorized
)

# fit models
integrated_out_not_vectorized <- integrate_out_not_vectorized$sample(
  data = beta_stan_data,
  seed = 123,
  chains = 4,
  parallel_chains = 1
)
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 20, column 2 to column 41)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 35.3 seconds.
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: beta_lccdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 34, column 8 to column 68)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: beta_lccdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 34, column 8 to column 68)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 2 finished in 23.3 seconds.
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 3 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 20, column 2 to column 41)
#> Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 3
#> Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 finished in 22.8 seconds.
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: beta_lccdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 34, column 8 to column 68)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: beta_lccdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpcF6k12/model-4cecd294f9.stan', line 34, column 8 to column 68)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 finished in 20.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 25.6 seconds.
#> Total execution time: 103.1 seconds.
integrated_out_vectorized <- integrate_out_vectorized$sample(
  data = beta_stan_data,
  seed = 123,
  chains = 4,
  parallel_chains = 1
)
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: gamma_lpdf: Random variable is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp80ePEl/model-1cdc46c86c94.stan', line 47, column 2 to column 41)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: beta_lccdf: Second shape parameter[1] is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp80ePEl/model-1cdc46c86c94.stan', line 58, column 4 to column 109)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 23.6 seconds.
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: beta_lccdf: First shape parameter[1] is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp80ePEl/model-1cdc46c86c94.stan', line 58, column 4 to column 109)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 2 finished in 20.2 seconds.
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 3 Exception: beta_lccdf: Second shape parameter[1] is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp80ePEl/model-1cdc46c86c94.stan', line 58, column 4 to column 109)
#> Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 3
#> Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 finished in 22.4 seconds.
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: beta_lccdf: Second shape parameter[1] is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/Rtmp80ePEl/model-1cdc46c86c94.stan', line 58, column 4 to column 109)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 finished in 22.8 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 22.2 seconds.
#> Total execution time: 89.5 seconds.
augmented_not_vectorized <- augmented_not_vectorized$sample(
  data = beta_stan_data,
  seed = 123,
  chains = 4,
  parallel_chains = 1
)
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: beta_lpdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 113, column 6 to column 78)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 1.2 seconds.
#> Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 2 Exception: beta_lpdf: First shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 113, column 6 to column 78)
#> Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 2
#> Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 2 finished in 1.1 seconds.
#> Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
#> Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 3 finished in 1.3 seconds.
#> Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 4 Exception: beta_lpdf: Second shape parameter is 0, but must be positive finite! (in 'C:/Users/HANS_V~1/AppData/Local/Temp/RtmpKmzDnp/model-3ff8c2d74e9.stan', line 113, column 6 to column 78)
#> Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 4
#> Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
#> Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
#> Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
#> Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
#> Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
#> Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
#> Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
#> Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
#> Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
#> Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
#> Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
#> Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
#> Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
#> Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
#> Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
#> Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
#> Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
#> Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 4 finished in 1.2 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.2 seconds.
#> Total execution time: 5.3 seconds.

# compare model timings
integrated_out_not_vectorized$time()
#> $total
#> [1] 103.0832
#> 
#> $chains
#>   chain_id warmup sampling  total
#> 1        1 14.599   20.725 35.324
#> 2        2 10.685   12.591 23.276
#> 3        3 11.129   11.637 22.766
#> 4        4 10.009   10.877 20.886
integrated_out_vectorized$time()
#> $total
#> [1] 89.54901
#> 
#> $chains
#>   chain_id warmup sampling  total
#> 1        1 11.491   12.068 23.559
#> 2        2 10.855    9.370 20.225
#> 3        3 11.770   10.626 22.396
#> 4        4 11.594   11.174 22.768
augmented_not_vectorized$time()
#> $total
#> [1] 5.272309
#> 
#> $chains
#>   chain_id warmup sampling total
#> 1        1  0.626    0.529 1.155
#> 2        2  0.601    0.544 1.145
#> 3        3  0.659    0.613 1.272
#> 4        4  0.628    0.539 1.167

# compare model fits
integrated_out_not_vectorized$summary(variables = c("b_Intercept", "phi"))
#> # A tibble: 2 × 10
#>   variable     mean median    sd   mad     q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 b_Intercept 0.187  0.185 0.106 0.105 0.0160 0.364  1.00    3398.    2949.
#> 2 phi         2.43   2.41  0.338 0.334 1.90   3.02   1.00    3527.    2585.
integrated_out_vectorized$summary(variables = c("b_Intercept", "phi"))
#> # A tibble: 2 × 10
#>   variable    mean  median     sd    mad       q5    q95  rhat ess_bulk ess_tail
#>   <chr>      <dbl>   <dbl>  <dbl>  <dbl>    <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 b_Inter… -1.83   -1.82   0.291  0.282  -2.32    -1.36   1.01    2370.    1974.
#> 2 phi       0.0387  0.0260 0.0396 0.0271  0.00180  0.118  1.00    1602.     984.
augmented_not_vectorized$summary(variables = c("b_Intercept", "phi"))
#> # A tibble: 2 × 10
#>   variable     mean median    sd   mad      q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 b_Intercept 0.187  0.188 0.110 0.110 0.00514 0.365  1.00    9034.    3182.
#> 2 phi         2.42   2.40  0.329 0.321 1.91    3.00   1.00    6914.    2742.

drw_ionv <- posterior::as_draws_rvars(integrated_out_not_vectorized)
drw_iov <- posterior::as_draws_rvars(integrated_out_vectorized)
drw_anv <- posterior::as_draws_rvars(augmented_not_vectorized)
ionv_mu_phi <- drw_ionv %>%
  as_draws_df() %>%
  select(Intercept, phi) %>%
  mutate(
    mu = plogis(Intercept),
    model = "integrate out (not vectorized)"
  )
#> Warning: Dropping 'draws_df' class as required metadata was removed.
iov_mu_phi <- drw_iov %>%
  as_draws_df() %>%
  select(Intercept, phi) %>%
  mutate(
    mu = plogis(Intercept),
    model = "integrate out (vectorized)"
  )
#> Warning: Dropping 'draws_df' class as required metadata was removed.
anv_mu_phi <- drw_anv %>%
  as_draws_df() %>%
  select(Intercept, phi) %>%
  mutate(
    mu = plogis(Intercept),
    model = "data augmentation"
  )
#> Warning: Dropping 'draws_df' class as required metadata was removed.
bind_rows(ionv_mu_phi, iov_mu_phi, anv_mu_phi) %>%
  pivot_longer(cols = c("phi", "mu")) %>%
  ggplot() +
  geom_density(aes(x = value, colour = model)) +
  geom_vline(
    data = data.frame(
      name = c("mu", "phi"),
      value = c(mu, phi)
    ),
    aes(xintercept = value)) +
  facet_wrap(~name, scales = "free")

Created on 2024-09-22 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Belgium.utf8
#>  ctype    Dutch_Belgium.utf8
#>  tz       Europe/Brussels
#>  date     2024-09-22
#>  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package        * version     date (UTC) lib source
#>    abind            1.4-8       2024-09-12 [1] CRAN (R 4.4.1)
#>    backports        1.5.0       2024-05-23 [1] CRAN (R 4.4.0)
#>    bayesplot        1.11.1.9000 2024-09-01 [1] https://stan-dev.r-universe.dev (R 4.4.1)
#>    bridgesampling   1.1-2       2021-04-16 [1] CRAN (R 4.4.0)
#>    brms           * 2.22.0      2024-09-22 [1] Github (paul-buerkner/brms@9053e9f)
#>    Brobdingnag      1.2-9       2022-10-19 [1] CRAN (R 4.4.0)
#>    checkmate        2.3.2       2024-07-29 [1] CRAN (R 4.4.1)
#>    cli              3.6.3       2024-06-21 [1] CRAN (R 4.4.1)
#>    cmdstanr         0.8.1       2024-08-05 [1] https://stan-dev.r-universe.dev (R 4.4.0)
#>    coda             0.19-4.1    2024-01-31 [1] CRAN (R 4.4.0)
#>    codetools        0.2-20      2024-03-31 [2] CRAN (R 4.4.1)
#>    colorspace       2.1-1       2024-07-26 [1] CRAN (R 4.4.1)
#>    curl             5.2.1       2024-03-01 [1] CRAN (R 4.4.0)
#>    data.table       1.15.4      2024-03-30 [1] CRAN (R 4.4.0)
#>    digest           0.6.37      2024-08-19 [1] CRAN (R 4.4.1)
#>    distributional   0.5.0       2024-09-17 [1] CRAN (R 4.4.1)
#>    dplyr          * 1.1.4       2023-11-17 [1] CRAN (R 4.4.0)
#>    emmeans          1.10.4      2024-08-21 [1] CRAN (R 4.4.1)
#>    estimability     1.5.1       2024-05-12 [1] CRAN (R 4.4.0)
#>    evaluate         0.24.0      2024-06-10 [1] CRAN (R 4.4.1)
#>    fansi            1.0.6       2023-12-08 [1] CRAN (R 4.4.0)
#>    farver           2.1.2       2024-05-13 [1] CRAN (R 4.4.0)
#>    fastmap          1.2.0       2024-05-15 [1] CRAN (R 4.4.0)
#>    forcats        * 1.0.0       2023-01-29 [1] CRAN (R 4.4.0)
#>    fs               1.6.4       2024-04-25 [1] CRAN (R 4.4.0)
#>    generics         0.1.3       2022-07-05 [1] CRAN (R 4.4.0)
#>    ggplot2        * 3.5.1       2024-04-23 [1] CRAN (R 4.4.0)
#>    glue             1.7.0       2024-01-09 [1] CRAN (R 4.4.0)
#>    gtable           0.3.5       2024-04-22 [1] CRAN (R 4.4.0)
#>    highr            0.11        2024-05-26 [1] CRAN (R 4.4.0)
#>    hms              1.1.3       2023-03-21 [1] CRAN (R 4.4.0)
#>    htmltools        0.5.8.1     2024-04-04 [1] CRAN (R 4.4.0)
#>    jsonlite         1.8.8       2023-12-04 [1] CRAN (R 4.4.0)
#>    knitr            1.48        2024-07-07 [1] CRAN (R 4.4.1)
#>    labeling         0.4.3       2023-08-29 [1] CRAN (R 4.4.0)
#>    lattice          0.22-6      2024-03-20 [1] CRAN (R 4.4.0)
#>    lifecycle        1.0.4       2023-11-07 [1] CRAN (R 4.4.0)
#>    loo              2.8.0.9000  2024-08-05 [1] https://stan-dev.r-universe.dev (R 4.4.0)
#>    lubridate      * 1.9.3.9000  2024-05-28 [1] https://inbo.r-universe.dev (R 4.4.0)
#>    magrittr         2.0.3       2022-03-30 [1] CRAN (R 4.4.0)
#>    MASS             7.3-61      2024-06-13 [2] CRAN (R 4.4.1)
#>    Matrix           1.7-0       2024-03-22 [1] CRAN (R 4.4.0)
#>    matrixStats      1.4.1       2024-09-08 [1] CRAN (R 4.4.1)
#>    multcomp         1.4-26      2024-07-18 [1] CRAN (R 4.4.1)
#>    munsell          0.5.1       2024-04-01 [1] CRAN (R 4.4.0)
#>    mvtnorm          1.3-1       2024-09-03 [1] CRAN (R 4.4.1)
#>    nlme             3.1-166     2024-08-14 [2] CRAN (R 4.4.1)
#>    pillar           1.9.0       2023-03-22 [1] CRAN (R 4.4.0)
#>    pkgconfig        2.0.3       2019-09-22 [1] CRAN (R 4.4.0)
#>    posterior        1.6.0       2024-07-03 [1] CRAN (R 4.4.1)
#>    processx         3.8.4       2024-03-16 [1] CRAN (R 4.4.0)
#>    ps               1.8.0       2024-09-12 [1] CRAN (R 4.4.1)
#>    purrr          * 1.0.2       2023-08-10 [1] CRAN (R 4.4.0)
#>    R6               2.5.1       2021-08-19 [1] CRAN (R 4.4.0)
#>    Rcpp           * 1.0.13      2024-07-17 [1] CRAN (R 4.4.1)
#>  D RcppParallel     5.1.9       2024-08-19 [1] CRAN (R 4.4.1)
#>    readr          * 2.1.5       2024-01-10 [1] CRAN (R 4.4.0)
#>    reprex           2.1.1       2024-07-06 [1] CRAN (R 4.4.1)
#>    rlang            1.1.4       2024-06-04 [1] CRAN (R 4.4.1)
#>    rmarkdown        2.28        2024-08-17 [1] CRAN (R 4.4.1)
#>    rstantools       2.4.0.9000  2024-08-18 [1] https://stan-dev.r-universe.dev (R 4.4.0)
#>    rstudioapi       0.16.0      2024-03-24 [1] CRAN (R 4.4.0)
#>    sandwich         3.1-0       2023-12-11 [1] CRAN (R 4.4.0)
#>    scales           1.3.0       2023-11-28 [1] CRAN (R 4.4.0)
#>    sessioninfo      1.2.2       2021-12-06 [1] CRAN (R 4.4.0)
#>    stringi          1.8.4       2024-05-06 [1] CRAN (R 4.4.0)
#>    stringr        * 1.5.1       2023-11-14 [1] CRAN (R 4.4.0)
#>    survival         3.7-0       2024-06-05 [2] CRAN (R 4.4.1)
#>    tensorA          0.36.2.1    2023-12-13 [1] CRAN (R 4.4.0)
#>    TH.data          1.1-2       2023-04-17 [1] CRAN (R 4.4.0)
#>    tibble         * 3.2.1       2023-03-20 [1] CRAN (R 4.4.0)
#>    tidyr          * 1.3.1       2024-01-24 [1] CRAN (R 4.4.0)
#>    tidyselect       1.2.1       2024-03-11 [1] CRAN (R 4.4.0)
#>    tidyverse      * 2.0.0       2023-02-22 [1] CRAN (R 4.4.0)
#>    timechange       0.3.0       2024-01-18 [1] CRAN (R 4.4.0)
#>    tzdb             0.4.0       2023-05-12 [1] CRAN (R 4.4.0)
#>    utf8             1.2.4       2023-10-22 [1] CRAN (R 4.4.0)
#>    vctrs            0.6.5       2023-12-01 [1] CRAN (R 4.4.0)
#>    withr            3.0.1       2024-07-31 [1] CRAN (R 4.4.1)
#>    xfun             0.47        2024-08-17 [1] CRAN (R 4.4.1)
#>    xml2             1.3.6       2023-12-04 [1] CRAN (R 4.4.0)
#>    xtable           1.8-4       2019-04-21 [1] CRAN (R 4.4.0)
#>    yaml             2.3.10      2024-07-26 [1] CRAN (R 4.4.1)
#>    zoo              1.8-12      2023-04-13 [1] CRAN (R 4.4.0)
#> 
#>  [1] C:/R/library
#>  [2] C:/R/R-4.4.1/library
#> 
#>  D ── DLL MD5 mismatch, broken installation.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@paul-buerkner
Copy link
Owner

Thank you for your detailed reprex! I will look into it and fix it!

@paul-buerkner paul-buerkner reopened this Sep 23, 2024
@paul-buerkner
Copy link
Owner

paul-buerkner commented Sep 23, 2024

Got it. The issue was in the interval censored case. There, I treated the PDFs as returning a vector when being passed vectors as inputs, which of course the don't (they just return a scalar). I think the best approach is to deactivate the vectorization for interval censored data.

As for the data augmention, this indeed seems to help a whole lot in terms of speed but may be more cumbersome to implement. I will leave this issue open in case I or someone else wants to tackle it in the future.

Thank you again for your help!

@hansvancalster
Copy link
Author

For future reference, here is a vectorized version of the beta-distribution data augmentation model bda_vectorized.zip. Vectorization is trivial in this case, also for interval censoring. To try it, use the code in #1657 (comment)

@wds15
Copy link
Contributor

wds15 commented Sep 23, 2024

I will certainly try the data augmentation approach. In fact, I‘d expect that it should be simple to implement in brms given that brms supports missing data already. As I understand this approach, it is basically missing data with a lower bound. Talking to colleague he said that this approach is also used by JAGS.

@wds15
Copy link
Contributor

wds15 commented Sep 23, 2024

@paul-buerkner reading through the cox code I don‘t find any glitch. The only thing I noticed is that

Log1m_exp(- xy)

Can obviously be written as

Log1p_exp(xy)

But I would really leave in as a comment the original log1m_exp(-xy) line in a commented way, since it is more direct in turning the ccdf into a cdf…will also test the new approach on an example data set next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants