todo
my backprop performs better than this guy's: https://github.com/bsmithgall/coursera-ml/blob/master/ex4-003/mlclass-ex4/nnCostFunction.m
and I don't know why. my guess is I'm including/excluding a bias node where I shouldn't be. ----- could it be random initialization? I'm getting exactly the same result every time, though...
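One way to settle whose backprop is right is the gradient checking technique from week 5: approximate each partial derivative numerically and compare it against what backprop computes. Below is a minimal sketch (Python/NumPy rather than the course's Octave; `J` here is a hypothetical stand-in for any cost function taking an unrolled parameter vector):

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Approximate dJ/dtheta by central differences (wk5 gradient checking)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        plus = theta.copy()
        minus = theta.copy()
        plus[i] += eps
        minus[i] -= eps
        grad[i] = (J(plus) - J(minus)) / (2 * eps)
    return grad

# Toy check: J(theta) = sum(theta^2) has exact gradient 2*theta.
theta = np.array([1.0, -2.0, 3.0])
approx = numerical_gradient(lambda t: np.sum(t ** 2), theta)
```

If backprop's gradient agrees with the numerical one to several decimal places, the bias-node handling is probably fine; a fixed random seed would also explain getting identical results every run.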
try ex3 without regularization. what does overfitting look like?
need to figure this out. by "this" I just mean where that partial derivative comes from: https://www.reddit.com/r/learnmath/comments/5pt5qd/how_is_this_derivative_calculated_involved/
*see wk5 video 3, backpropagation intuition. ...that's the function the derivative comes from:
"derivative of the activation function"
but the activation function is sigmoid(a^n), which means you might be able to work it out like a regular function:
derivative 1/(1+(1/e^(x))) NO https://www.wolframalpha.com/input/?i=derivative+1%2F(1%2B(1%2Fe%5E(x)))
derivative 1/(1+(1/e^(X1 + X2 + X3))) NO https://www.wolframalpha.com/input/?i=derivative+1%2F(1%2B(1%2Fe%5E(X1+%2B+X2+%2B+X3)))
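Those Wolfram results may actually be right: e^x/(1+e^x)^2 is algebraically the same function as sigmoid(x)*(1 - sigmoid(x)) — the g'(z) = g(z)(1 - g(z)) identity from the lectures — just written in a different form. A quick numerical check:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # the lecture identity: g'(z) = g(z) * (1 - g(z))
    g = sigmoid(x)
    return g * (1.0 - g)

x = np.linspace(-5, 5, 11)
wolfram_form = np.exp(x) / (1.0 + np.exp(x)) ** 2  # Wolfram's answer, rewritten
assert np.allclose(sigmoid_grad(x), wolfram_form)
```

So "derivative of the activation function" and the Wolfram derivative agree; they just don't look alike on paper.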
SERIOUS
Why don't you regularize the bias theta again??? - answer - because you can't overfit the bias parameter; by convention it's left out of the regularization sum.
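In code, "don't regularize the bias" just means skipping the bias weights when computing the penalty. A minimal sketch, assuming the course's Theta layout where the bias weights sit in column 0 (NumPy here instead of Octave):

```python
import numpy as np

def reg_term(Theta, lam, m):
    """Regularization penalty for one weight matrix, skipping the bias column.

    Assumes bias weights are in column 0, as in the course's Theta layout.
    """
    return (lam / (2 * m)) * np.sum(Theta[:, 1:] ** 2)

Theta = np.array([[10.0, 1.0, 2.0],
                  [20.0, 3.0, 4.0]])
# Only 1, 2, 3, 4 are penalized; the bias column (10, 20) is ignored:
# (2 / (2*5)) * (1 + 4 + 9 + 16) = 6.0
penalty = reg_term(Theta, lam=2.0, m=5)
```

Getting this slicing wrong (including column 0, or dropping a non-bias column) is exactly the kind of bias-node bug suspected above.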
MAKE SURE YOU KNOW WHAT FEATURE SCALING IS. - answer - (x - mean(x)) / standardDeviation(x) ------ do that to all your features so they play on the same scale as far as your learning algorithm is concerned. ----- https://www.coursera.org/learn/machine-learning/lecture/xx3Da/gradient-descent-in-practice-i-feature-scaling
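That formula applied per column looks like this (a sketch with made-up housing-style numbers; the course calls this mean normalization + scaling):

```python
import numpy as np

def feature_scale(X):
    """Standardize each feature column: (x - mean) / std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Hypothetical data: house size (sq ft) and number of bedrooms.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
Xs = feature_scale(X)
# After scaling, every column has mean ~0 and std ~1,
# so gradient descent isn't dominated by the large sq-ft feature.
```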
recommender systems question:
********What if they like everything? How is linear regression going to work? It's kind of assuming they like one thing as much as they don't like another, isn't it?