a more user-friendly api for fitting models #64

Open
jburos opened this issue Mar 9, 2017 · 0 comments

jburos commented Mar 9, 2017

Right now, the process of fitting a model requires that the user know quite a bit about how survivalstan is built & the details of the various models supported therein.

For example, there are several sets of features which can be combined more or less independently across various models.

These features are:

  1. choice of baseline hazard (which in turn determines the baseline survival function):
    • parametric: weibull, exp, gamma, etc (n.b. data typically in "wide" form)
    • semi-parametric: randomwalk, gamma prior, etc (n.b. data typically in "long" form)
  2. estimate varying-coefficient (yes or no)
    • right now, takes a single column name to group by
  3. estimate time-varying effects (yes or no)
    • if yes, then all coefficients are treated as time-varying

To use these, the user has to know (1) which model among those in survivalstan.models implements the features they want, assuming such a model exists, and (2) what data format and which inputs that model requires. Both are unreasonable expectations of the user (per discussion with @julia326).
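One way to take that burden off the user is to encode the feature-to-model mapping in a lookup table. The following is only a minimal sketch of the idea: the model names, the `ModelSpec` type, and `select_model` are hypothetical placeholders, not confirmed attributes of survivalstan.models, and the set of registered combinations is made up for illustration.

```python
from typing import NamedTuple


class ModelSpec(NamedTuple):
    name: str         # placeholder label, NOT a real survivalstan.models attribute
    data_format: str  # "wide" or "long"


# Key: (baseline_hazard, varying_coefs, timevarying_effects).
# The entries below are illustrative assumptions only.
REGISTRY = {
    ("weibull", False, False): ModelSpec("weibull_model", "wide"),
    ("exp", False, False): ModelSpec("exp_model", "wide"),
    ("randomwalk", False, False): ModelSpec("randomwalk_model", "long"),
    ("randomwalk", True, False): ModelSpec("randomwalk_varying_coefs_model", "long"),
    ("randomwalk", False, True): ModelSpec("randomwalk_timevarying_model", "long"),
}


def select_model(baseline_hazard, varying_coefs=False, timevarying=False):
    """Return the spec for the requested feature combination, if one is registered."""
    key = (baseline_hazard, varying_coefs, timevarying)
    try:
        return REGISTRY[key]
    except KeyError:
        raise NotImplementedError(
            f"no model registered for baseline_hazard={baseline_hazard!r}, "
            f"varying_coefs={varying_coefs}, timevarying={timevarying}"
        ) from None
```

With a table like this, the required data format travels with the model, so the caller never has to look it up separately.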

Ideally, we should enable the user to provide:

  • their data frame
  • a patsy formula
  • the desired baseline_hazard (with some reasonable default)
  • (for now) whether to estimate time-varying effects
    • could also eventually be implemented using patsy timevary(age) + ... syntax
    • (the above also not yet supported in model code)

In theory, the fit_model function should then prep the data, select the appropriate Stan file, and fit the model. This would be a much cleaner process for fitting a model.
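As a rough sketch of the input-handling step only, the inputs listed above could be normalised into a single request object before any data prep or model selection happens. Everything here is an assumption for illustration: `FitRequest`, `build_fit_request`, the `"weibull"` default, and the handling of `timevary(...)` (which, as noted, is not yet supported in the model code). The wrapper is detected with a plain regular expression rather than through patsy itself.

```python
import re
from dataclasses import dataclass, field

# Matches a hypothetical timevary(<column>) wrapper in a formula string.
TIMEVARY_TERM = re.compile(r"timevary\(\s*(\w+)\s*\)")


@dataclass
class FitRequest:
    formula: str                      # formula with timevary() wrappers stripped
    baseline_hazard: str = "weibull"  # assumed default, not decided in this issue
    timevarying_terms: list = field(default_factory=list)


def build_fit_request(formula, baseline_hazard="weibull"):
    """Collect the user's choices; no data prep or fitting happens here."""
    terms = TIMEVARY_TERM.findall(formula)
    # Replace each timevary(x) with x so the remainder is an ordinary formula.
    plain_formula = TIMEVARY_TERM.sub(r"\1", formula)
    return FitRequest(plain_formula, baseline_hazard, terms)
```

For example, `build_fit_request("~ timevary(age) + sex")` would record `age` as a time-varying term and leave `~ age + sex` as the formula to hand on to the data-prep step.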

However, some details in the implementation need to be worked out:

  1. Some features (e.g. time-varying effect estimation) are much cleaner to implement in the "long" data form than in the "wide" form.
    • we could (for example) rewrite all models to use the "long" format (see data-format issue, below)
    • Or, we could throw a FeatureNotImplemented error if the user gives us an invalid combination of inputs
    • ultimately we will likely want to support varying-coef & time-varying effects (as well as other features) in all models
  2. The second problem is determining whether the user has provided data in "long" or "wide" format. If they provide wide data while the model requires long, we will want to convert to long using prep_data_long_surv. If they provide long data and the model requires wide, we throw an error (unless all models are coded to take long data).
    • It is useful to allow the user to provide "long" data since they may have time-varying covariate values. The auto-convert utility doesn't accommodate these.
    • However, most of the time the user will likely provide wide data.
    • One option would be to have a longSurv(..) patsy function which, if given, would signal that the data are in long-format. Otherwise, we assume they are wide.
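The format handling described in item 2 might look roughly like the sketch below. It is a simplified stand-in, not the library's prep_data_long_surv: rows are plain dicts rather than a data frame, the `end_time` / `end_failure` column names and the `time` / `event` defaults are assumptions, and the long form is built over distinct event times only. The `longSurv(` check is a plain substring test on the formula, standing in for a real patsy function.

```python
def is_long_format(formula):
    """Assume data are long only if the formula uses the proposed longSurv(..) marker."""
    return "longSurv(" in formula


def wide_to_long(rows, time_col="time", event_col="event"):
    """Emit one row per subject per distinct event time at which the subject is still at risk."""
    event_times = sorted({r[time_col] for r in rows if r[event_col]})
    long_rows = []
    for r in rows:
        for t in event_times:
            if t > r[time_col]:
                break  # subject is no longer at risk after their own time
            long_rows.append({
                **r,
                "end_time": t,
                # The failure indicator is set only on the subject's final interval.
                "end_failure": bool(r[event_col]) and t == r[time_col],
            })
    return long_rows


def prepare_data(rows, formula, model_format):
    """Convert wide to long when needed; refuse long data for a wide-only model."""
    user_long = is_long_format(formula)
    if model_format == "long" and not user_long:
        return wide_to_long(rows)
    if model_format == "wide" and user_long:
        raise ValueError("model requires wide data but long data were supplied")
    return rows
```

Note that this conversion copies each subject's covariates unchanged onto every interval, which is exactly why user-supplied long data with time-varying covariate values has to bypass it.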

jburos added this to the v0.1.3 release milestone Mar 9, 2017