projecting nlpca to new data sets #2

Closed
topepo opened this issue Apr 14, 2017 · 3 comments


topepo commented Apr 14, 2017

I'd like to estimate an autoencoder from one data set and apply it to another with the same number of variables but with a different number of rows.

> library(pcaMethods)
> 
> set.seed(1)
> in_train <- sample(1:150, 100)
> tr <- iris[ in_train, -5]
> te <- iris[-in_train, -5]
> 
> nlpca_obj <- pca(tr, nPcs=2, method="nlpca", maxSteps=500, verbose = FALSE)
> 
> head(fitted(nlpca_obj, tr))
         [,1]     [,2]     [,3]      [,4]
[1,] 5.050568 3.467380 1.425588 0.2393514
[2,] 5.795947 2.718434 4.372717 1.4699193
[3,] 5.588153 2.669189 4.398656 1.5534895
[4,] 6.368556 2.895570 4.956616 1.6964933
[5,] 4.718083 3.082611 1.499397 0.2413291
[6,] 7.356033 3.224832 5.930248 2.0810760
> 
> fitted(nlpca_obj, te)
Error in .Method(..., deparse.level = deparse.level) : 
  number of columns of matrices must match (see arg 2)
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pcaMethods_1.66.0   Biobase_2.34.0      BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] tools_3.3.3  Rcpp_0.12.10

I can't think of an analytical reason that this wouldn't work.

Thanks

(related to tidymodels/recipes#35)

@hredestig
Owner

It has been quite some time since I worked on this, but I agree: I don't see a reason why this shouldn't be possible. The implementation doesn't allow for it because the fitted function is meant for exactly one thing, returning the fitted values for the training data, and not for new data. There is also a predict function for new data, but it is not implemented for nonlinear PCA. This could probably be implemented, but as you also note in the thread you reference, nlpca is extremely slow, so I wonder whether this is really the way to go or whether a more complete overhaul would be better. Pull requests are welcome :)
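For comparison, here is a minimal sketch of what projecting held-out rows looks like in the linear case. It uses base R's `prcomp` and its `predict` method rather than pcaMethods, purely so that it runs without extra packages; the train/test split is the one from the original post, and `pca_fit`, `te_scores` and `te_manual` are illustrative names.

```r
# Same split as in the original post.
set.seed(1)
in_train <- sample(1:150, 100)
tr <- iris[ in_train, -5]
te <- iris[-in_train, -5]

# Linear PCA estimated on the training rows only (base R, not pcaMethods).
pca_fit <- prcomp(tr, center = TRUE, scale. = FALSE)

# Scores for the held-out rows: centre with the training means, then rotate.
te_scores <- predict(pca_fit, newdata = te)[, 1:2]

# The same projection written out by hand.
te_manual <- scale(te, center = pca_fit$center, scale = FALSE) %*%
  pca_fit$rotation[, 1:2]
```

In the linear case the projection is a single matrix product, which is why new data are cheap to handle there.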

gdkrmr commented Oct 16, 2017

I just re-read the corresponding paper and there is a catch: I think nlPCA in pcaMethods implements only the decoder part of an autoencoder and optimizes the low-dimensional representation directly. There is therefore no easy mapping from data space to nlPCA space, and the scores for new points have to be found by gradient descent or a similar optimization method.
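To make the idea concrete, here is a rough sketch of projecting a new point when only a decoder is available. The `decode` function is a hypothetical stand-in for a trained decoder network (it is not pcaMethods code), and `project_point` is an illustrative helper: it searches for the latent score whose decoded point is closest to the new observation.

```r
# Hypothetical decoder: maps a 1-D latent score z to a 2-D data point.
# It stands in for a trained decoder network; it is not pcaMethods code.
decode <- function(z) c(z, z^2)

# Project one new observation by minimising its squared reconstruction
# error over the latent score, since there is no encoder to call.
project_point <- function(x_new, z_start = 0) {
  loss <- function(z) sum((decode(z) - x_new)^2)
  fit <- optim(par = z_start, fn = loss, method = "BFGS")
  list(score = fit$par, fitted = decode(fit$par), error = fit$value)
}

# A new point that lies exactly on the decoder's curve, at z = 1.5.
x_new <- decode(1.5)
res <- project_point(x_new, z_start = 1)
```

Every new row needs its own optimisation run, and the result can depend on the starting value, which is part of why this is less straightforward than the linear case.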

@hredestig
Owner

Indeed, not straightforward. Closing this one.
