# About the Author

Hi! My name is Dustin Stansbury. I am a Staff Data Scientist at Quizlet, an online learning platform. There I solve problems that require statistics, machine learning, NLP, software architecture and engineering, knowledge bases, and whatever else it takes to help students learn.

Note: The Clever Machine has moved! All new material will be posted to the new site, hosted on GitHub Pages, where I will continue to provide similar content.

Hi Dustin: Your explanation of MCMC using Hamiltonian dynamics is the nicest, kindest, gentlest explanation that I have found. I looked hard and was extremely confused until I read it. I will have to read it a few times for sure, but I have just one question, if you don’t mind. Could you explain how you get the original relations for the partial of x with respect to t and the partial of p with respect to t? You write both of them as functions of partial derivatives of H, and I don’t understand that step. Thank you very much for any enlightenment, and also for your beautiful Hamiltonian explanation.

Mark

Glad to be of help. As for the expressions dx/dt = ∂H/∂p and dp/dt = -∂H/∂x, those are simply Hamilton’s equations. I didn’t come up with them; they are commonly used in physics to describe the dynamics of a closed system. Hamilton’s equations can be derived from the differential of the Lagrangian for a closed system, but the derivation is somewhat involved. There is a nice example on Wikipedia, if you’re interested: http://en.wikipedia.org/wiki/Hamiltonian_mechanics
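To make the two relations concrete, here is a minimal Python sketch. It is not the code from the original post. It assumes a one-dimensional system with H(x, p) = x²/2 + p²/2 (a unit-mass harmonic oscillator, chosen only for illustration) and integrates dx/dt = ∂H/∂p and dp/dt = -∂H/∂x with a leapfrog scheme. All function names are hypothetical.

```python
import math


def dH_dx(x):
    # Assumed potential energy U(x) = x**2 / 2, so dH/dx = x.
    return x


def dH_dp(p):
    # Assumed kinetic energy K(p) = p**2 / 2, so dH/dp = p.
    return p


def hamiltonian(x, p):
    # Total energy H(x, p) = U(x) + K(p) for the assumed system.
    return 0.5 * x**2 + 0.5 * p**2


def leapfrog(x, p, step_size, n_steps):
    """Integrate Hamilton's equations with the leapfrog scheme."""
    for _ in range(n_steps):
        p -= 0.5 * step_size * dH_dx(x)  # half step: dp/dt = -dH/dx
        x += step_size * dH_dp(p)        # full step: dx/dt = dH/dp
        p -= 0.5 * step_size * dH_dx(x)  # second half step for p
    return x, p


# Start at rest at x = 1 and simulate up to time t = 10.
x0, p0 = 1.0, 0.0
x1, p1 = leapfrog(x0, p0, step_size=0.01, n_steps=1000)

print("H at start:", hamiltonian(x0, p0))
print("H at end:  ", hamiltonian(x1, p1))
print("x at end:  ", x1, "exact:", math.cos(10.0))
```

With these settings the trajectory should stay close to the exact solution x(t) = cos(t), and H should remain nearly constant along the path. That near-conservation of H is the property that makes these dynamics attractive for generating MCMC proposals.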

Hi, I really liked your post on Markov chains. However, I would also like to understand continuous-time Markov chains. If you can suggest any good resources on implementing them in MATLAB, that would be really great.

Your site is a fantastic resource — it’s good for me to remember that in addition to the vicious crap out there there is also treasure like this. Thanks so much for contributing to the signal.

Hi Dustin,

I would like to thank you and congratulate you on this amazing website. I think you are doing a very nice thing here.

best

This is truly amazing work.

Your blog post is the best resource I have ever read. Your explanations are so in-depth, covering the topic completely.

Your blog post is the best resource that I have ever read. I liked your in-depth explanations of the topics. Thanks for sharing your knowledge.

Hi Dustin,

Your explanation of error back-propagation and gradient descent for neural networks was really nice. Could you also explain the same in the presence of a convolution layer with a max-pooling stage?

So the overall architecture looks like this:

Input – Convolution – MaxPool – Hidden – Output

Best,

Luhar

Hi Dustin,

Good descriptions of complicated things here. Nice work. Regarding backprop, would you have any insight into how the attached approach (page 9) can be correct? I am particularly curious about the error term for the top layer, and why there is no gradient g'(z3) multiplying the top layer errors (a(3)-y). Using numerical derivatives, the approach seems correct and the method converges. But I’m lost on why.

Thanks in advance for any insight you may offer.

Best,

Martin