projecting nlpca to new data sets #2

topepo · 2017-04-14T13:58:50Z

I'd like to estimate an autoencoder from one data set and apply it to another with the same number of variables but with a different number of rows.

> library(pcaMethods)
> 
> set.seed(1)
> in_train <- sample(1:150, 100)
> tr <- iris[ in_train, -5]
> te <- iris[-in_train, -5]
> 
> nlpca_obj <- pca(tr, nPcs=2, method="nlpca", maxSteps=500, verbose = FALSE)
> 
> head(fitted(nlpca_obj, tr))
         [,1]     [,2]     [,3]      [,4]
[1,] 5.050568 3.467380 1.425588 0.2393514
[2,] 5.795947 2.718434 4.372717 1.4699193
[3,] 5.588153 2.669189 4.398656 1.5534895
[4,] 6.368556 2.895570 4.956616 1.6964933
[5,] 4.718083 3.082611 1.499397 0.2413291
[6,] 7.356033 3.224832 5.930248 2.0810760
> 
> fitted(nlpca_obj, te)
Error in .Method(..., deparse.level = deparse.level) : 
  number of columns of matrices must match (see arg 2)
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pcaMethods_1.66.0   Biobase_2.34.0      BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] tools_3.3.3  Rcpp_0.12.10

I can't think of an analytical reason that this wouldn't work.

Thanks

(related to tidymodels/recipes#35)

hredestig · 2017-04-16T15:21:22Z

It has been quite some time since I worked with this, but I agree: I don't see a reason why this shouldn't be possible. The implementation doesn't allow for it because the fitted function is meant for exactly that, getting the fitted values for the training data, not for new data. There is also a predict function for new data, but it is not implemented for nonlinear PCA. This could probably be implemented, but as you also note in the thread you reference, nlpca is extremely slow, so I wonder whether this is really the way to go or whether a more complete overhaul would be better. Pull requests are welcome :)

gdkrmr · 2017-10-16T12:38:12Z

I just re-read the corresponding paper and there is a catch: I think nlPCA in pcaMethods only implements the decoder part of an autoencoder and optimizes the representation in the reduced dimensions directly. There is therefore no easy way to map from data space to nlPCA space, and new points have to be optimized via gradient descent or a similar method.
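To make the idea of optimizing new points concrete, here is a minimal base-R sketch. It does not use pcaMethods internals: `decode`, `W1`, `W2`, `project_one`, and `project_new` are hypothetical names, and the decoder is a small random tanh network standing in for a trained one. Each new row gets its scores by minimizing the squared reconstruction error with `optim`. The problem is non-convex, so the result can depend on the starting values.

```r
# Hypothetical stand-in for a trained decoder: maps a 2-d score vector z
# to a 4-d point in data space. This is NOT the pcaMethods network.
set.seed(1)
W1 <- matrix(rnorm(5 * 2), nrow = 5)  # 5 hidden units x 2 scores
W2 <- matrix(rnorm(4 * 5), nrow = 4)  # 4 output variables x 5 hidden units
decode <- function(z) as.vector(W2 %*% tanh(W1 %*% z))

# Find scores for one new observation x by minimizing the squared
# reconstruction error over z (quasi-Newton with numerical gradients).
project_one <- function(x, n_pcs = 2) {
  loss <- function(z) sum((x - decode(z))^2)
  fit <- optim(par = rep(0, n_pcs), fn = loss, method = "BFGS")
  fit$par
}

# Apply the per-row optimization to a whole matrix of new observations.
project_new <- function(x_new) t(apply(x_new, 1, project_one))

# Toy "new data": points generated from known scores through the decoder.
z_true <- matrix(c(0.5, -0.3,
                   -1.0, 0.8), ncol = 2, byrow = TRUE)
x_new <- t(apply(z_true, 1, decode))

z_hat <- project_new(x_new)            # estimated scores, one row per observation
x_hat <- t(apply(z_hat, 1, decode))    # reconstruction from the estimated scores
recon_error <- rowSums((x_new - x_hat)^2)
```

With a real model the decoder would be the fitted network, and each new row would need its own optimization run, which adds to the speed concerns raised above.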

hredestig · 2018-11-24T21:39:51Z

Indeed, not straightforward. Closing this one.

hredestig closed this as completed Nov 24, 2018
