Derivation: Error Backpropagation & Gradient Descent for Neural Networks

The material in this post has been migrated, along with Python implementations, to my GitHub Pages website.

About dustinstansbury

I recently received my PhD from UC Berkeley where I studied computational neuroscience and machine learning.

Posted on September 6, 2014, in Algorithms, Classification, Derivations, Gradient Descent, Machine Learning, Neural Networks, Optimization, Regression, Theory and tagged backprop derivation, backpropagation algorithm, backpropagation derivation, Derivation, Machine Learning, Neural Networks.

Comments 13

daFeda | March 31, 2015 at 1:18 am

Hi, this is the first write-up on backpropagation I actually understand. Thanks.

A few possible bugs:
1. The last part of Eq. 8 should, I think, sum over a_i and not z_i.
2. Between Eq. 3 and Eq. 4 it should, I think, be z_k=b_k + … and not z_k=b_j …
3. The last section says "Output layer bias" while the derivation is for the hidden-layer bias. Also, b_i seems to be used as the notation for the hidden-layer bias, while it should be b_j.

All in all, a very helpful post.

daFeda | March 31, 2015 at 1:19 am

Reblogged this on DaFeda's Blog and commented:
The easiest to follow derivation of backpropagation I’ve come across.

Ayan Das | July 4, 2015 at 9:46 am

Probably the best derivation of BackProp I’ve ever seen on the internet 🙂

Devin | August 12, 2015 at 12:08 pm

Thanks. Nice clean explanation.

Arnab Kanti Kar | August 28, 2015 at 10:33 am

Thank you!
This is the second time I've benefited from your blog.

Donghao Liu | February 17, 2016 at 5:45 pm

Best introduction to backprop ever!
Thank you so much.

mysticprince93 | January 27, 2017 at 6:51 am

Really useful! Though there are a few typos, as daFeda has mentioned.

unpracticalconsiderations | June 22, 2017 at 8:32 pm

Really helpful, man.

I just have one small question I'm hoping somebody can answer.

I understand this algebraically, and I understand the iterative patterns created with the deltas when calculating the weights from different layers starting backwards,

but WHY does (a_k – t_k) * the derivative mean that the ERROR (which is equal to a_k – t_k) is being “BACK PROPAGATED”? What’s the intuition behind multiplying by the derivative that makes us say this?

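One way to see the intuition asked about above is to compute the deltas explicitly. The following is a minimal sketch, not the post's migrated implementation, assuming sigmoid activations g(z), a squared-error cost E = ½ Σ_k (a_k − t_k)², and forward equations z_j = b_j + Σ_i x_i w_ij and z_k = b_k + Σ_j a_j w_jk (the convention daFeda's corrections point to). The output delta δ_k = (a_k − t_k) g'(z_k) is the sensitivity of E to z_k: the error, scaled by how strongly z_k moves a_k. Each hidden delta δ_j = g'(z_j) Σ_k w_jk δ_k is built by sending the output deltas backward through the same weights that carried the signal forward, which is what "back-propagating the error" describes. All function names are hypothetical, and a finite-difference check confirms that the deltas give the true gradients.

```python
import math
import random


def sigmoid(z):
    """Logistic activation g(z); its derivative is g(z) * (1 - g(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass. W1[i][j] links input i to hidden j; W2[j][k] links hidden j to output k."""
    z_hid = [b1[j] + sum(x[i] * W1[i][j] for i in range(len(x))) for j in range(len(b1))]
    a_hid = [sigmoid(z) for z in z_hid]
    z_out = [b2[k] + sum(a_hid[j] * W2[j][k] for j in range(len(a_hid))) for k in range(len(b2))]
    a_out = [sigmoid(z) for z in z_out]
    return a_hid, a_out


def loss(x, t, W1, b1, W2, b2):
    """Squared-error cost E = 1/2 * sum_k (a_k - t_k)^2."""
    _, a_out = forward(x, W1, b1, W2, b2)
    return 0.5 * sum((a - tk) ** 2 for a, tk in zip(a_out, t))


def backprop(x, t, W1, b1, W2, b2):
    """Return (dE/dW1, dE/db1, dE/dW2, dE/db2) computed from output and hidden deltas."""
    a_hid, a_out = forward(x, W1, b1, W2, b2)
    # Output delta: the error (a_k - t_k) scaled by the local slope g'(z_k) = a_k (1 - a_k).
    d_out = [(a - tk) * a * (1.0 - a) for a, tk in zip(a_out, t)]
    # Hidden delta: output deltas passed backward through the same weights w_jk
    # that carried the signal forward, then scaled by the hidden slope g'(z_j).
    d_hid = [a_hid[j] * (1.0 - a_hid[j]) * sum(W2[j][k] * d_out[k] for k in range(len(d_out)))
             for j in range(len(a_hid))]
    # Each weight gradient is (delta of receiving unit) * (activation of sending unit).
    gW2 = [[a_hid[j] * d_out[k] for k in range(len(d_out))] for j in range(len(a_hid))]
    gW1 = [[x[i] * d_hid[j] for j in range(len(d_hid))] for i in range(len(x))]
    # Biases see a constant input of 1, so their gradients are the deltas themselves.
    return gW1, d_hid, gW2, d_out


def numeric_grad(x, t, params, container, idx, eps=1e-6):
    """Central-difference estimate of dE/d(container[idx]); the entry is restored afterwards."""
    old = container[idx]
    container[idx] = old + eps
    up = loss(x, t, *params)
    container[idx] = old - eps
    down = loss(x, t, *params)
    container[idx] = old
    return (up - down) / (2.0 * eps)


if __name__ == "__main__":
    random.seed(0)
    x, t = [0.5, -0.3], [1.0, 0.0]  # made-up input and target values
    W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # 2 inputs -> 3 hidden
    b1 = [random.uniform(-1, 1) for _ in range(3)]
    W2 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # 3 hidden -> 2 outputs
    b2 = [random.uniform(-1, 1) for _ in range(2)]
    params = (W1, b1, W2, b2)
    gW1, gb1, gW2, gb2 = backprop(x, t, *params)
    gap = max(abs(gW1[i][j] - numeric_grad(x, t, params, W1[i], j))
              for i in range(2) for j in range(3))
    print("max |analytic - numeric| over W1:", gap)
```

Running the script prints a very small gap for the first-layer weights, which is the point: the hidden-layer gradients come entirely from output deltas routed backward through w_jk, with no separate error signal defined for the hidden units.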
Ugenteraan | July 31, 2017 at 12:34 pm

This really cleared up all the confusions that I had in backpropagation. Thanks a bunch !

Pradip Nichite | August 19, 2017 at 2:58 am

Thank you, dustinstansbury. Finally, I understood backpropagation.

edmund | December 3, 2017 at 6:20 pm

Twenty-five years ago I had these formulae in my PhD, but I couldn’t retrieve a copy. Luckily I found your blog (true story), and your very clear exposition refreshed my memory.

Anurag Reddy | December 31, 2019 at 9:03 pm

A very neat and simple derivation. Great job!!

rostys | June 30, 2020 at 5:34 pm

Thanks, very interesting and helpful article. A great introduction to neural networks for beginners like me.


The OG Clever Machine

Topics in Computational Neuroscience & Machine Learning
