# A Gentle Introduction to Artificial Neural Networks

The material in this post has been migrated to a post by the same name on my github pages website.

I recently received my PhD from UC Berkeley where I studied computational neuroscience and machine learning.

Posted on September 11, 2014, in Classification, Gradient Descent, Machine Learning, Neural Networks, Neuroscience, Regression. 12 Comments.

1. Dustin, this intro to ANNs is outstanding! I’ve seen recent interest in rectified linear activation functions. Do you have any thoughts on these?

• Thanks Ben, hopefully you find the post helpful.

Yeah, ReLUs are pretty sweet! They exhibit some nice invariance properties that are useful for pattern recognition (for details, see work from Bengio’s group on rectifier nets). I often use the softrect/softplus function, an analytic approximation to the ReLU (in MATLAB-ish syntax):

g(z) = 1/k.*log(1 + exp(k*z))
g'(z) = 1./(1 + exp(-k*z)),

where k is a hyperparameter that controls the smoothness of the rectification around zero.
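The MATLAB-ish formulas above translate directly to NumPy. A minimal sketch (the function names and the log1p/max rearrangement for numerical stability are my additions, not from the comment):

```python
import numpy as np

def softplus(z, k=1.0):
    """Smooth approximation to the ReLU: g(z) = (1/k) * log(1 + exp(k*z)).

    k controls the smoothness of the bend at zero; as k grows, g(z)
    approaches max(0, z). Computed as max(kz, 0) + log1p(exp(-|kz|))
    to avoid overflow in exp() for large k*z.
    """
    kz = k * np.asarray(z, dtype=float)
    return (np.maximum(kz, 0.0) + np.log1p(np.exp(-np.abs(kz)))) / k

def softplus_grad(z, k=1.0):
    """Derivative g'(z) = 1 / (1 + exp(-k*z)): a logistic sigmoid in k*z."""
    return 1.0 / (1.0 + np.exp(-k * np.asarray(z, dtype=float)))
```

The naive form `log(1 + exp(k*z))` overflows once `k*z` exceeds roughly 700 in double precision; the `max`/`log1p` rearrangement is algebraically identical but safe.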

2. In-depth, informative, yet simple! Very helpful, thank you.

3. Reblogged this on Robotic Run and commented:
Fantastic Introduction to ANN

4. Sam

This is excellent; good work sir.

5. Thank you for this excellently written post. In literally one sentence: “using linear activations for the output unit activation function (in conjunction with nonlinear activations for the hidden units) allows the network to perform nonlinear regression”, you’ve clarified an idea I’d been grasping at but couldn’t quite pin down before.

6. George Gillams

This has greatly simplified this topic for me. Thank you so much

7. Riva

Hi Dustin, may I please have your permission to use Figure 1 (“Diagram of a single-layered artificial neural network”) in my research paper?

8. Igor

Short and simple. Love it!

9. Hi, is the “weight update” step in the backpropagation algorithm figure actually gradient descent? That is, the first three steps belong to the backpropagation algorithm itself, but the weight update is performed with gradient descent, right? So if I am using the Adam optimizer or something similar, the fourth step would not be the one in the figure, but Adam’s own parameter update formulas?

Thank you for this great introduction.
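The distinction this question draws — backpropagation computes the gradient, the optimizer decides how to apply it — can be sketched as two interchangeable update rules. A minimal illustration (variable names are mine; the Adam constants are the standard defaults):

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Vanilla gradient-descent update: the 'weight update' step in the figure."""
    return w - lr * grad

def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam update: consumes the same backprop gradient, different update rule.

    `state` carries the running first/second moment estimates (m, v) and
    the step count t across iterations.
    """
    t = state["t"] + 1
    m = beta1 * state["m"] + (1 - beta1) * grad       # first moment (mean)
    v = beta2 * state["v"] + (1 - beta2) * grad**2    # second moment (uncentered variance)
    m_hat = m / (1 - beta1**t)                        # bias correction
    v_hat = v / (1 - beta2**t)
    state.update(m=m, v=v, t=t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```

Either function slots in after the gradient computation, which is exactly why the first three steps in the figure are unchanged when you swap optimizers.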
