A Gentle Introduction to Artificial Neural Networks

The material in this post has been migrated to a post by the same name on my github pages website.

About dustinstansbury

I recently received my PhD from UC Berkeley where I studied computational neuroscience and machine learning.

Posted on September 11, 2014, in Classification, Gradient Descent, Machine Learning, Neural Networks, Neuroscience, Regression. Bookmark the permalink. 12 Comments.

  1. Dustin, this intro to ANNs is outstanding! I’ve seen recent interest in rectified linear activation functions. Do you have any thoughts on these?

    • Thanks Ben, hopefully you find the post helpful.

      Yeah, ReLUs are pretty sweet! They exhibit some nice invariance properties that are useful for pattern recognition (for details, see work from Bengio’s group on rectifier nets). I often use the softrect/softplus function, an analytic approximation to the ReLU (in MATLAB-ish syntax):

      g(z) = 1/k.*log(1 + exp(k*z))
      g'(z) = 1./(1 + exp(-k*z)),

      where k is a hyperparameter that controls the smoothness of the rectification around zero.
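      The softplus pair above translates directly from the MATLAB-ish notation. A minimal NumPy sketch (Python and NumPy are my choice here, not from the original reply):

      ```python
      import numpy as np

      def softplus(z, k=1.0):
          # g(z) = (1/k) * log(1 + exp(k*z)); smooth approximation to the ReLU.
          # Larger k sharpens the bend at zero. Note: exp(k*z) can overflow
          # for large k*z; a numerically stable variant would clip the exponent.
          return np.log1p(np.exp(k * z)) / k

      def softplus_grad(z, k=1.0):
          # g'(z) = 1 / (1 + exp(-k*z)): the logistic sigmoid with slope k,
          # which smoothly approximates the ReLU's step-function derivative.
          return 1.0 / (1.0 + np.exp(-k * z))
      ```

      As k grows, softplus(z, k) approaches max(0, z), and its gradient approaches the 0/1 step of the ReLU.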

  2. In-depth, informative, yet simple! Very helpful, thank you.

  3. Reblogged this on Robotic Run and commented:
    Fantastic Introduction to ANN

  4. This is excellent; good work sir.

  5. Thank you for this excellently written post. In literally one sentence: “using linear activations for the output unit activation function (in conjunction with nonlinear activations for the hidden units) allows the network to perform nonlinear regression”, you’ve clarified an idea I’ve been grasping at but couldn’t get to earlier.

  6. This has greatly simplified this topic for me. Thank you so much

  7. Hi Dustin, can you please give me permission to use Figure 1 (“Diagram of a single-layered artificial neural network.”) for my research paper?

  8. Short and simple. Love it!

  9. Hi, is the “weight update” in the backpropagation algorithm figure “gradient descent”? That is, the first three steps belong to the backpropagation algorithm itself, but the weight update is performed with gradient descent, right? So if I am using the Adam optimizer or something similar, the fourth step would not be the one in the figure, but Adam’s own parameter-update rule?

    Thank you for this great introduction.
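    The separation raised in this question can be illustrated with a toy sketch (my own construction, not from the post): backpropagation produces a gradient, and the choice of update rule applied to that gradient — plain gradient descent versus a simplified, first-step Adam update — is independent of how the gradient was computed.

    ```python
    import numpy as np

    # Toy single linear unit with squared-error loss L = 0.5*(w.x - y)^2.
    x = np.array([1.0, 2.0, 0.5])
    y = 1.0
    w = np.zeros(3)

    def gradient(w):
        # What backprop would return for this one-unit "network":
        # dL/dw = (w.x - y) * x
        return (w @ x - y) * x

    # Step 4, option A: plain gradient descent.
    lr = 0.1
    w_gd = w - lr * gradient(w)

    # Step 4, option B: an Adam-style update of the SAME gradient
    # (simplified to a single step, with zero-initialized moments).
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    g = gradient(w)
    m = (1 - beta1) * g                 # first-moment estimate
    v = (1 - beta2) * g**2              # second-moment estimate
    m_hat = m / (1 - beta1)             # bias correction
    v_hat = v / (1 - beta2)
    w_adam = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    ```

    Both updates consume the same backpropagated gradient; only the fourth step differs, so yes — swapping in Adam replaces the update formula, not the gradient computation.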

  1. Pingback: Distilled News | Data Analytics & R

  2. Pingback: A Gentle Introduction to Artificial Neural Networks – Robotic Run
