Hi! My name is Dustin Stansbury. I recently received my PhD from UC Berkeley in Vision Science under the supervision of Jack Gallant. My academic research interests include, but are not limited to:

• Neural Computation
• Machine Learning (particularly neural networks)
• Natural Scene Statistics
• Visual Perception
• Computer Vision
• Signal Processing
• Neuroimaging (e.g. Functional Magnetic Resonance Imaging (fMRI))

1. mark leeds

Hi Dustin: Your explanation of MCMC using Hamiltonian dynamics is the nicest, kindest, gentlest explanation I have found. I looked hard and was extremely confused until I read yours. I'll have to read it a few times for sure, but I have one question, if you don't mind: could you explain how you get the original relations for the partial derivative of x with respect to t and of p with respect to t? You write both of them as functions of partial derivatives of H, and I don't understand that step. Thank you very much for any enlightenment, and also for your beautiful Hamiltonian explanation.

Mark

• Glad to be of help. As for the expressions dx/dt = ∂H/∂p and dp/dt = −∂H/∂x, those are simply Hamilton's equations. I didn't come up with them; they are commonly used in physics to describe the dynamics of a closed system. Hamilton's equations can be derived from the differential of the Lagrangian for a closed system, but the derivation is somewhat involved. There is a nice example on Wikipedia, if you're interested: http://en.wikipedia.org/wiki/Hamiltonian_mechanics
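The two equations can also be simulated directly. A minimal sketch, assuming the standard toy setup from the HMC literature: a one-dimensional unit Gaussian, so H(x, p) = U(x) + K(p) with potential U(x) = x²/2 and kinetic energy K(p) = p²/2, giving dx/dt = ∂H/∂p = p and dp/dt = −∂H/∂x = −x. The step sizes and starting point are arbitrary illustration values:

```python
def grad_U(x):
    return x  # dU/dx for U(x) = x^2 / 2

def leapfrog(x, p, n_steps=100, eps=0.01):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p = p - 0.5 * eps * grad_U(x)      # initial half step for momentum
    for _ in range(n_steps - 1):
        x = x + eps * p                # full step for position
        p = p - eps * grad_U(x)        # full step for momentum
    x = x + eps * p
    p = p - 0.5 * eps * grad_U(x)      # final half step for momentum
    return x, p

x0, p0 = 1.0, 0.5
x1, p1 = leapfrog(x0, p0)
H0 = 0.5 * x0**2 + 0.5 * p0**2
H1 = 0.5 * x1**2 + 0.5 * p1**2
print(abs(H1 - H0))  # the total energy H is (approximately) conserved
```

The near-zero change in H is the property that makes these dynamics useful for MCMC proposals.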

2. pooja

Hi, I really liked your post on Markov chains. However, I would also like to understand continuous-time Markov chains. If you can suggest any good resources on their implementation in MATLAB, that would be really great.
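• I don't have a MATLAB reference handy, but the core simulation idea is small enough to sketch. Assuming a hypothetical 3-state generator (rate) matrix Q, a continuous-time Markov chain can be simulated Gillespie-style: hold in the current state for an exponential time, then jump according to the off-diagonal rates. The matrix and seed below are illustration values only:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical generator matrix: rows sum to zero, off-diagonals are jump rates
Q = np.array([[-1.0,  1.0,  0.0],
              [ 0.5, -1.5,  1.0],
              [ 0.0,  2.0, -2.0]])

def simulate_ctmc(Q, state=0, t_max=10.0):
    """Gillespie-style simulation: exponential holding times, then a jump."""
    t, path = 0.0, [(0.0, state)]
    while True:
        rate = -Q[state, state]              # total exit rate of current state
        t += rng.exponential(1.0 / rate)     # holding time ~ Exponential(rate)
        if t >= t_max:
            break
        probs = Q[state].copy()
        probs[state] = 0.0
        probs /= probs.sum()                 # jump probabilities off the diagonal
        state = rng.choice(len(Q), p=probs)
        path.append((t, state))
    return path

path = simulate_ctmc(Q)
```

The same logic translates line for line into MATLAB using exprnd and randsample.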

3. Shane

Your site is a fantastic resource — it’s good for me to remember that, in addition to the vicious crap out there, there is also treasure like this. Thanks so much for contributing to the signal.

4. Hi Dustin,
I would like to thank you and congratulate you on this amazing website. I think you are doing a very nice thing here.
Best

5. Patrick Gourdet

This is truly amazing work.

8. Luhar

Hi Dustin,

Your explanation of error back-propagation and gradient descent for neural networks was really nice. Could you also explain the same in the presence of a convolution layer with a max-pooling stage?
The overall architecture looks like this:

Input – Convolution – MaxPool – Hidden – Output

Best,
Luhar
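• A full derivation deserves its own post, but the forward pass of the architecture in the question can be sketched in a few lines. This is only an illustration, not code from any of the posts; all sizes, the 1-D input, and the random weights are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical sizes: 28-sample 1-D input, length-5 kernel
x = rng.normal(size=28)           # Input
w_conv = rng.normal(size=5)       # convolution kernel
W_hid = rng.normal(size=(10, 12)) # Hidden layer weights
W_out = rng.normal(size=(3, 10))  # output layer weights

# Convolution layer ('valid' mode): 28 - 5 + 1 = 24 feature-map units
conv = sigmoid(np.convolve(x, w_conv, mode="valid"))

# MaxPool with pool size 2: keep the max of each pair -> 12 units
pooled = conv.reshape(-1, 2).max(axis=1)

# fully connected Hidden and Output layers, as in the backprop post
hidden = sigmoid(W_hid @ pooled)
output = sigmoid(W_out @ hidden)
```

For the backward pass, the key differences from a plain fully connected network are that max-pooling routes each upstream error only to the unit that won the max (the others receive zero gradient), and the convolution-kernel gradient is itself a correlation between the layer input and the upstream deltas.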

9. Martin

Hi Dustin,

Good descriptions of complicated things here. Nice work. Regarding backprop, would you have any insight into how the attached approach (page 9) can be correct? I am particularly curious about the error term for the top layer, and why there is no gradient g'(z3) multiplying the top layer errors (a(3)-y). Using numerical derivatives, the approach seems correct and the method converges. But I’m lost on why.

https://github.com/benoitvallon/coursera-machine-learning/blob/master/machine-learning-ex4/ex4.pdf

Thanks in advance for any insight you may offer.

Best,
Martin
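• The identity behind this question can be checked numerically. The usual explanation is that with a sigmoid (or softmax) output unit paired with a cross-entropy loss, the factor g′(z3) cancels exactly against the derivative of the loss, leaving the top-layer error as just a(3) − y; the g′ term only appears when a squared-error loss is used. A minimal check with hypothetical scalar values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, y = 0.7, 1.0   # arbitrary pre-activation and target
a = sigmoid(z)

def loss(z):
    # cross-entropy loss L = -[y log a + (1 - y) log(1 - a)]
    a = sigmoid(z)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

# claimed analytic gradient dL/dz: the sigmoid derivative a(1 - a) cancels
analytic = a - y

# central-difference numerical derivative of L with respect to z
eps = 1e-6
numeric = (loss(z + eps) - loss(z - eps)) / (2 * eps)
print(abs(analytic - numeric))  # agreement to roughly machine precision
```

Writing it out: dL/da = (a − y) / (a(1 − a)) and da/dz = a(1 − a), so their product is a − y with no leftover g′ factor, which is why the numerical derivatives in the exercise come out correct.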