About the Author

Hi! My name is Dustin Stansbury. I am a Staff Data Scientist at Quizlet, an online learning platform. There I work to solve problems that require statistics, machine learning, NLP, software architecture and engineering, knowledge bases, and whatever else it takes to help students learn.

Note: The Clever Machine has moved! All new material will be posted to the new site, hosted on GitHub Pages, where I will continue to provide similar content.

  1. Hi Dustin: Your explanation of MCMC using Hamiltonian dynamics is the nicest, kindest, gentlest explanation that I have found. I looked hard and was extremely confused until I read it. I will have to read it a few more times for sure, but I have one question, if you don’t mind: could you explain how you get the original relations for the partial derivatives of x and p with respect to t? You write both of them as functions of partial derivatives of H, and I don’t understand that step. Thank you very much for any enlightenment, and also for your beautiful Hamiltonian explanation.

    • Glad to be of help. The expressions dx/dt = ∂H/∂p and dp/dt = -∂H/∂x are simply Hamilton’s equations. I didn’t come up with them; they are commonly used in physics to describe the dynamics of a closed system. Hamilton’s equations can be derived from the differential of the Lagrangian for a closed system, but the derivation is somewhat involved. There is a nice example on Wikipedia, if you’re interested: http://en.wikipedia.org/wiki/Hamiltonian_mechanics
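
To make the two equations in the reply above concrete, here is a minimal sketch that integrates them numerically. It is only an illustration, not the code from the original post: the quadratic Hamiltonian H = x²/2 + p²/2 (a unit-mass harmonic oscillator), the leapfrog integrator, and all function names are assumptions chosen for the example.

```python
# Illustrative assumption: H(x, p) = x**2 / 2 + p**2 / 2 (unit-mass harmonic oscillator).

def hamiltonian(x, p):
    """Total energy: potential x**2/2 plus kinetic p**2/2."""
    return 0.5 * x**2 + 0.5 * p**2

def dH_dx(x, p):
    """Partial derivative of H with respect to position x."""
    return x

def dH_dp(x, p):
    """Partial derivative of H with respect to momentum p."""
    return p

def leapfrog(x, p, step_size, n_steps):
    """Integrate dx/dt = dH/dp and dp/dt = -dH/dx with the leapfrog scheme."""
    for _ in range(n_steps):
        p = p - 0.5 * step_size * dH_dx(x, p)  # half step in momentum
        x = x + step_size * dH_dp(x, p)        # full step in position
        p = p - 0.5 * step_size * dH_dx(x, p)  # second half step in momentum
    return x, p

x0, p0 = 1.0, 0.0
x1, p1 = leapfrog(x0, p0, step_size=0.01, n_steps=1000)
energy_drift = abs(hamiltonian(x1, p1) - hamiltonian(x0, p0))
print(x1, p1, energy_drift)
```

Because the system is closed, the total energy H should stay nearly constant along the simulated trajectory; the printed drift gives a quick check of that for this step size.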

  2. Hi, I really liked your post on Markov chains. However, I would also like to understand continuous-time Markov chains. If you can suggest any good resources on implementing them in MATLAB, that would be really great.

  3. Your site is a fantastic resource — it’s good for me to remember that in addition to the vicious crap out there there is also treasure like this. Thanks so much for contributing to the signal.

  4. Hi Dustin,
    I would like to thank you and congratulate you on this amazing website. I think you are doing a very nice thing here.

  5. Patrick Gourdet

    This is truly amazing work.

  6. Your blog post is the best resource I have ever read. Your explanations are so in-depth, covering the topic completely.

  7. Your blog post is the best resource that I have ever read. I liked your in-depth explanations of the topics. Thanks for sharing your knowledge.

  8. Hi Dustin,

    Your explanation of error back-propagation and gradient descent for neural networks was really nice. Could you also explain the same in the presence of a convolution layer with a max-pooling stage? The overall architecture would look like this:

    Input – Convolution – MaxPool – Hidden – Output


  9. Hi Dustin,

    Good descriptions of complicated things here. Nice work. Regarding backprop, would you have any insight into how the attached approach (page 9) can be correct? I am particularly curious about the error term for the top layer, and why there is no gradient g'(z3) multiplying the top layer errors (a(3)-y). Using numerical derivatives, the approach seems correct and the method converges. But I’m lost on why.

    Attachment: ex4.pdf

    Thanks in advance for any insight you may offer.

