a more user-friendly api for fitting models #64

jburos · 2017-03-09T18:03:01Z

Right now, the process of fitting a model requires that the user know quite a bit about how survivalstan is built & the details of the various models supported therein.

For example, there are several sets of features which can be combined more or less independently across various models.

These features are:

- choice of baseline hazard (which determines the baseline survival function):
  - parametric: weibull, exp, gamma, etc. (n.b. data typically in "wide" form)
  - semi-parametric: randomwalk, gamma prior, etc. (n.b. data typically in "long" form)
- estimate varying coefficients (yes or no)
  - right now, takes a single column name to group by
- estimate time-varying effects (yes or no)
  - if yes, then all coefficients are treated as time-varying
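
To make the "wide" vs "long" distinction concrete, here is a minimal pure-Python sketch of a wide-to-long conversion. It is only an illustration of the idea, not the implementation of prep_data_long_surv; the function name `wide_to_long` and the keys `id`, `time`, `event`, and `end_time` are hypothetical.

```python
def wide_to_long(rows):
    """Expand one-row-per-subject ("wide") records into one row per
    subject per observed time point ("long").

    `rows` is a list of dicts with hypothetical keys "id", "time", "event".
    """
    # Use every distinct observed time as a cut point.
    cut_points = sorted({r["time"] for r in rows})
    long_rows = []
    for r in rows:
        for t in cut_points:
            if t > r["time"]:
                break  # the subject is no longer at risk after their own time
            long_rows.append({
                "id": r["id"],
                "end_time": t,
                # The event flag is set only on the subject's final interval.
                "event": bool(r["event"]) and t == r["time"],
            })
    return long_rows


wide = [
    {"id": 1, "time": 2, "event": 1},  # event observed at t=2
    {"id": 2, "time": 3, "event": 0},  # censored at t=3
]
long = wide_to_long(wide)
```

Subject 1 contributes one row and subject 2 contributes two, so the long form carries one record per subject per interval at risk. That is why per-interval quantities (such as time-varying effects) are easier to express in it.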

In order to use these, the user has to know (1) which model (among those in survivalstan.models) implements the features they desire, assuming such a model exists, and (2) what data format & which inputs the selected model requires. Both are unreasonable expectations of the user (per discussion with @julia326).

Ideally, we should enable the user to provide:

- their data frame
- a patsy formula
  - currently indicates whether to group by a variable, which signals YES to the varying-coefficient model
  - will likely be modified to use a different syntax per issue #47 (Support specifying which variables to vary by group in fit_stan_survival_model)
- the desired baseline_hazard (with some reasonable default)
- (for now) whether to estimate time-varying effects
  - could also eventually be implemented using patsy timevary(age) + ... syntax
  - (the above is also not yet supported in model code)

In theory, the fit_model function should then prep the data, select the appropriate Stan file, and fit the model. This would be a much cleaner process for fitting a model.
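
As a rough sketch of the "select the appropriate Stan file" step, the lookup could be a small registry keyed by the requested features. Everything here is an assumption for illustration: the `select_model` helper, the `ModelSpec` type, the registry entries and their labels, and the `weibull` default do not come from survivalstan. Only the FeatureNotImplemented name is taken from this issue.

```python
from typing import NamedTuple


class FeatureNotImplemented(NotImplementedError):
    """Raised when no model implements the requested feature combination."""


class ModelSpec(NamedTuple):
    model_name: str   # hypothetical label for a Stan program
    data_format: str  # "wide" or "long"


# Hypothetical registry keyed by (baseline_hazard, varying_coef, time_varying).
MODEL_REGISTRY = {
    ("weibull", False, False): ModelSpec("weibull", "wide"),
    ("exp", False, False): ModelSpec("exp", "wide"),
    ("randomwalk", False, False): ModelSpec("pem_randomwalk", "long"),
    ("randomwalk", True, False): ModelSpec("pem_varying_coefs", "long"),
    ("randomwalk", False, True): ModelSpec("pem_timevarying", "long"),
}


def select_model(baseline_hazard="weibull", varying_coef=False, time_varying=False):
    """Return the ModelSpec for the requested features, or raise."""
    key = (baseline_hazard, bool(varying_coef), bool(time_varying))
    try:
        return MODEL_REGISTRY[key]
    except KeyError:
        raise FeatureNotImplemented(
            "no model supports baseline_hazard=%r, varying_coef=%r, "
            "time_varying=%r" % key
        ) from None
```

With a table like this, the user states what they want and the function reports either the model and the data format it needs, or a clear error when the combination is unsupported.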

However, some details in the implementation need to be worked out:

- Some features (e.g. time-varying effect estimation) are much cleaner to implement in the "long" data form than in the "wide" form.
  - we could (for example) rewrite all models to use the "long" format (see data-format issue, below)
  - or we could throw a FeatureNotImplemented error if the user gives us an unsupported combination of inputs
  - ultimately we will likely want to support varying-coef & time-varying effects (as well as other features) in all models
- The second problem is determining whether the user has provided data in "long" or "wide" format. If they provide wide data while the model requires long, we will want to convert to long using prep_data_long_surv. If they provide long data & the model requires wide, we throw an error (unless all models are coded to take long data).
  - It is useful to allow the user to provide "long" data since they may have time-varying covariate values. The auto-convert utility doesn't accommodate these.
  - However, most of the time the user will likely provide wide data.
  - One option would be to have a longSurv(..) patsy function which, if given, would signal that the data are in long format. Otherwise, we assume they are wide.
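
The format-handling rules above could be sketched as a small decision function. This is a simplification under stated assumptions: it detects the proposed longSurv(..) marker with a regular expression rather than through patsy, and the helper names `data_is_long` and `plan_data_prep`, the returned action strings, and the example formulas are all hypothetical.

```python
import re


def data_is_long(formula):
    """Treat the data as long-format if the formula's left-hand side
    contains a longSurv(...) term (the syntax proposed in this issue)."""
    lhs = formula.split("~", 1)[0]
    return re.search(r"\blongSurv\s*\(", lhs) is not None


def plan_data_prep(formula, model_format):
    """Decide what to do with the user's data for a model that needs
    `model_format` ("wide" or "long")."""
    provided = "long" if data_is_long(formula) else "wide"
    if provided == model_format:
        return "use_as_is"
    if provided == "wide" and model_format == "long":
        # This is where a conversion such as prep_data_long_surv would run.
        return "convert_to_long"
    # Long data for a wide-only model cannot be converted back.
    raise ValueError("model requires wide data but long data were provided")
```

For example, `plan_data_prep("~ age + sex", "long")` asks for a conversion, while `plan_data_prep("longSurv(time, event) ~ age", "wide")` raises an error.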

jburos added the enhancement label Mar 9, 2017

jburos added this to the v0.1.3 release milestone Mar 9, 2017

jburos mentioned this issue Oct 31, 2017

Time and event column types #65

Closed
