
autoencoder step #35

Closed
topepo opened this issue Feb 14, 2017 · 16 comments
@topepo
Member

topepo commented Feb 14, 2017

dimRed might solve this before I do

@gdkrmr
Collaborator

gdkrmr commented Mar 29, 2017

There is an autoencoder in pcaMethods (on Bioconductor). It is implemented in pure R and, if I remember right, super slow; that was the reason I did not include it in the first release. The method is quite fancy: it can deal with missing values and sorts the axes according to variance. Can a CRAN package depend on a package from Bioconductor?

@topepo
Member Author

topepo commented Mar 29, 2017

[CRAN](https://cran.r-project.org/web/packages/policies.html#Source-packages) has:

> Packages on which a CRAN package depends should be available from a mainstream repository: if any mentioned in ‘Suggests’ or ‘Enhances’ fields are not from such a repository, where to obtain them at a repository should be specified in an ‘Additional_repositories’ field of the DESCRIPTION file (as a comma-separated list of repository URLs) or for other means of access, described in the ‘Description’ field.

There are a few that are also on CRAN; most are really slow and not actively developed. mxnet is probably what I would use to implement it. I looked at it last year and it didn't have a way to serialize the model for saving (and reuse) in R. I should take another look.

@terrytangyuan
Contributor

There might be a lot of work to make canned autoencoders, though, e.g. DNNAutoencoder, LSTMAutoencoder, etc. I would think that users will do this themselves, since it requires a lot of customization.

@gdkrmr
Collaborator

gdkrmr commented Mar 30, 2017

When I used autoencoders, I always found it really hard to get them to converge without over-fitting; it takes a lot of tinkering with the internals.

mxnet is probably the way to go. Conceptually I really like the one in pcaMethods, because it has some really nice features that are not straightforward to implement.

http://mxnet.io/tutorials/r/classifyRealImageWithPretrainedModel.html
There is an mx.model.load function
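
As a rough illustration of that save/load round trip, here is a hedged sketch using MXNet's R interface; `model` (a trained feed-forward model), the `"ae_model"` file prefix, and `new_data` are all made up for this example:

```r
# Sketch: persisting an MXNet model to disk and restoring it in a later
# session. Assumes the mxnet R package is installed and `model` is a
# trained feed-forward model; prefix and data names are hypothetical.
library(mxnet)

# Writes ae_model-symbol.json (architecture) and ae_model-0050.params (weights)
mx.model.save(model, prefix = "ae_model", iteration = 50)

# Later, possibly in a fresh R session:
restored <- mx.model.load(prefix = "ae_model", iteration = 50)
preds <- predict(restored, new_data)
```

This writes plain files rather than an R object, so it sidesteps the RDS-serialization gap discussed above.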

@topepo
Member Author

topepo commented Apr 14, 2017

@gdkrmr Which function in pcaMethods do you use? I see some cool imputation via PCA/SVD but not nonlinear autoencoders.

@terrytangyuan This issue lists

> Make the model object can be saved in RDS format

as a feature. Do you know if there is any progress on this?

@terrytangyuan
Contributor

I am not sure. It's been there for a while. @thirdwing might have a better idea.

@gdkrmr
Collaborator

gdkrmr commented Apr 14, 2017

@topepo pcaMethods::nlpca; there is a reference on its help page that is very interesting to read.

@topepo
Member Author

topepo commented Apr 14, 2017

nlpca doesn't appear to be able to project onto a new data set with a different number of rows:

```r
> library(pcaMethods)
> 
> set.seed(1)
> in_train <- sample(1:150, 100)
> tr <- iris[ in_train, -5]
> te <- iris[-in_train, -5]
> 
> nlpca_obj <- pca(tr, nPcs=2, method="nlpca", maxSteps=500, verbose = FALSE)
> 
> head(fitted(nlpca_obj, tr))
         [,1]     [,2]     [,3]      [,4]
[1,] 5.050568 3.467380 1.425588 0.2393514
[2,] 5.795947 2.718434 4.372717 1.4699193
[3,] 5.588153 2.669189 4.398656 1.5534895
[4,] 6.368556 2.895570 4.956616 1.6964933
[5,] 4.718083 3.082611 1.499397 0.2413291
[6,] 7.356033 3.224832 5.930248 2.0810760
> 
> fitted(nlpca_obj, te)
Error in .Method(..., deparse.level = deparse.level) : 
  number of columns of matrices must match (see arg 2)
```

I'll submit an issue to that package's repo.

@thirdwing

@terrytangyuan There is a blog post (in Japanese) on building an autoencoder in R using MXNet. I have invited the author to write a vignette.

@gdkrmr
Collaborator

gdkrmr commented Oct 11, 2017

I wrote a (still very simple) autoencoder in dimRed; you can find it in the develop branch. It uses tensorflow as a backend. It would be great if I could get some feedback on this.

@topepo
Member Author

topepo commented Oct 11, 2017

I looked at it and I'll run some examples later today. A few things that I thought of for the code:

- Try using the keras API; I think that will be cleaner. I started writing a step for recipes, and my code (using dropout rather than weight decay) looks like this:

```r
model <- keras_model_sequential()
model %>% 
  layer_dense(
    units = x$hidden, 
    activation = x$act_function, 
    input_shape = ncol(pred_data)
  ) %>%
  layer_dropout(rate = x$dropout,
                seed = sample.int(1000, 1)) %>%
  layer_dense(
    units = ncol(pred_data),
    activation = 'sigmoid'
  ) %>%
  keras::compile(
    loss = "mean_squared_error",
    optimizer = optimizer_rmsprop(),
    metrics = "mean_squared_error"
  )
```

- The function might get called more than once in an R session, so you may want some code to reset the session when it is called (otherwise there can be memory-accumulation issues). You might start the code with:

```r
K <- keras::backend()
K$clear_session()
# or the `tf` equivalent
```

- When writing the RStudio tensorflow packages, @jjallaire avoids having examples in the man files, since that requires CRAN to have the appropriate tensorflow installation. In his packages, he uses pkgdown/vignettes to show the examples, since CRAN won't execute that code.

@topepo
Member Author

topepo commented Oct 11, 2017

You will also have to serialize the resulting model so that it can be used in a new R/tf session. See rstudio/keras3#86.

Some example code is here and here
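
One hedged sketch of such serialization, assuming the keras R package's `serialize_model()`/`unserialize_model()` helpers and a hypothetical fitted model object `model`:

```r
# Sketch: making a keras model survive saveRDS()/readRDS() across R/tf
# sessions. A keras model is normally a pointer to a Python object, so
# it must first be converted to a raw vector of HDF5 bytes.
library(keras)

bytes <- serialize_model(model, include_optimizer = TRUE)
saveRDS(bytes, "autoencoder.rds")

# In a new R/tensorflow session:
model2 <- unserialize_model(readRDS("autoencoder.rds"))
```

The raw-vector step is what lets the model be stored inside an ordinary R object (e.g. in a recipes step).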

@gdkrmr
Collaborator

gdkrmr commented Oct 12, 2017

Super, thanks for the advice, I will look into it.

@gdkrmr
Collaborator

gdkrmr commented Oct 12, 2017

further discussion here: gdkrmr/dimRed#12

@gdkrmr
Collaborator

gdkrmr commented Apr 17, 2018

Autoencoders are in master now -> closing.

@gdkrmr gdkrmr closed this as completed Apr 17, 2018
@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 26, 2021