Derivation: Error Backpropagation & Gradient Descent for Neural Networks

The material in this post has been migrated, with Python implementations, to my GitHub Pages website.

About dustinstansbury

I recently received my PhD from UC Berkeley where I studied computational neuroscience and machine learning.

Posted on September 6, 2014, in Algorithms, Classification, Derivations, Gradient Descent, Machine Learning, Neural Networks, Optimization, Regression, Theory.

  1. Hi, this is the first write-up on backpropagation I actually understand. Thanks.

    A few possible bugs:
    1. The last part of Eq. 8 should, I think, sum over a_i and not z_i.
    2. Between Eq. 3 and Eq. 4 it should, I think, be z_k = b_k + … and not z_k = b_j …
    3. The last section says “Output layer bias” while the derivation is for the hidden layer bias. Also,
    b_i seems to be used as the notation for the hidden layer bias when it should be b_j.

    All in all, a very helpful post.

  2. Reblogged this on DaFeda's Blog and commented:
    The easiest to follow derivation of backpropagation I’ve come across.

  3. Probably the best derivation of BackProp I’ve ever seen on the internet 🙂

  4. Thanks. Nice clean explanation.

  5. Arnab Kanti Kar

    Thank you!
    This is the second time I’ve benefited from your blog.

  6. Best introduction to backprop ever!
    Thank you so much.

  7. Really useful, though there are a few typos, as DaFeda has mentioned.

  8. Really helpful, man.

    I just have one small question I’m hoping somebody can answer.

    I understand this algebraically, and I understand the iterative pattern the deltas create when calculating the weight updates for the different layers, working backwards.

    But WHY does multiplying (a_k – t_k) by the derivative mean that the ERROR (which is equal to a_k – t_k) is being “BACK PROPAGATED”? What’s the intuition behind multiplying by the derivative that lets us say this? (See the sketch after these comments.)

  9. This really cleared up all the confusion I had about backpropagation. Thanks a bunch!

  10. Thank you, dustinstansbury. Finally, I understood backpropagation.

  11. Twenty-five years ago I had these formulae in my PhD, but I couldn’t retrieve a copy. Luckily, I found your blog (true story), and your very clear exposition refreshed my memory.

  12. A very neat and simple derivation. Great job!!

  13. Thanks, very interesting and helpful article. A great introduction to neural networks for beginners like me.
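
To make the question in comment 8 concrete: multiplying the raw output error (a_k – t_k) by the activation derivative g′(z_k) turns “error in the output” into the delta δ_k, the error attributed to the output unit’s net input; multiplying δ_k by the weights w_jk then routes that error back to the hidden units that contributed to it, which is the sense in which the error is “back propagated”. Below is a minimal sketch of these two steps, assuming a single hidden layer, sigmoid activations, and a squared-error loss; the variable names are illustrative, not taken from the post’s code.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy network: 3 inputs, 4 hidden units, 2 outputs (sizes are arbitrary).
    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 3, 4, 2
    W1, b1 = rng.normal(size=(n_in, n_hid)), np.zeros(n_hid)
    W2, b2 = rng.normal(size=(n_hid, n_out)), np.zeros(n_out)
    x, t = rng.normal(size=n_in), np.array([0.0, 1.0])

    # Forward pass: pre-activation of each unit is z_k = b_k + sum_j a_j w_jk.
    z_hid = x @ W1 + b1          # hidden pre-activations z_j
    a_hid = sigmoid(z_hid)       # hidden activations a_j
    z_out = a_hid @ W2 + b2      # output pre-activations z_k
    a_out = sigmoid(z_out)       # output activations a_k

    # Output delta: the raw error (a_k - t_k), scaled by g'(z_k) = a_k (1 - a_k).
    # The derivative converts error in the output into error in z_k: how much
    # a small change in the unit's net input would have changed the loss.
    delta_out = (a_out - t) * a_out * (1.0 - a_out)

    # Hidden delta: each hidden unit collects the output deltas through its
    # outgoing weights, then scales by its own activation derivative. This is
    # the step where the error literally flows backwards through W2.
    delta_hid = (W2 @ delta_out) * a_hid * (1.0 - a_hid)

    # Gradient-descent updates follow directly from the deltas.
    grad_W2, grad_b2 = np.outer(a_hid, delta_out), delta_out
    grad_W1, grad_b1 = np.outer(x, delta_hid), delta_hid

Each delta is the gradient of the loss with respect to that layer’s pre-activations, which is why the weight gradients fall out as outer products of activations and deltas.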

  1. Pingback: Derivation: Derivatives for Common Neural Network Activation Functions | The Clever Machine

  2. Pingback: A Gentle Introduction to Artificial Neural Networks | The Clever Machine

  3. Pingback: Some sites I found helpful in reviewing backprop – Into DL and Beyond

  4. Pingback: Derivation: Error Backpropagation & Gradient Descent for Neural Networks – collection of dev articles

  5. Pingback: Derivation: Error Backpropagation and Gradient Descent for Neural Networks - Sem Seo 4 You
