Thank you. Maybe this is immodest, but I don't consider myself a "general beginner". I majored in math and CS, so I'm thinking about getting something more hard-core and serious...
I second this. Given your background (Lisper), maybe do the Little Learner book once it is out, plus Karpathy's video series. Follow that up by building a slightly complicated application in your favourite domain (text, images, videos, time series).
Also, a word of advice from my experience (I'm not an expert in DL either): think of the DL field as a game of lego blocks. The ideas in this book / Karpathy's videos are the basic lego blocks: parameterised linear functions, non-linearities, auto-grad, cross-entropy / KL divergence loss, and gradient descent. Then there is an entire body of more complex legos discovered simply by practice (alchemy!): transformer blocks, layer norm, max-pooling, etc. It is impossible to understand from first principles how the second kind were obtained. The trick is not to beat yourself up too much about the advanced blocks; just play around with them and read up on them in papers. Focus on the fundamental blocks.
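To make the "basic blocks" concrete, here is a rough sketch of how they snap together, in plain Python with no libraries. It is not taken from the book or the videos. The names (`Value`, `model`, `cross_entropy`, `train`) and the toy task (learning logical OR) are all made up for illustration, and the auto-grad is a bare-bones scalar version, not how a real framework does it.

```python
import math

class Value:
    """Scalar that remembers its parents and local derivatives (toy auto-grad)."""
    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))
    __rmul__ = __mul__

    def sigmoid(self):  # the non-linearity block
        s = 1.0 / (1.0 + math.exp(-self.data))
        return Value(s, (self,), (s * (1.0 - s),))

    def log(self):
        return Value(math.log(self.data), (self,), (1.0 / self.data,))

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, local in zip(v._parents, v._local_grads):
                p.grad += local * v.grad

def model(params, x):
    # Parameterised linear function followed by a non-linearity.
    w1, w2, b = params
    return (w1 * x[0] + w2 * x[1] + b).sigmoid()

def cross_entropy(p, y):
    # Binary cross-entropy: -(y*log(p) + (1-y)*log(1-p)).
    return (p.log() * y + (p * -1.0 + 1.0).log() * (1.0 - y)) * -1.0

# Hypothetical toy data: the logical OR of two inputs.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

def train(steps=500, lr=1.0):
    params = [Value(0.0), Value(0.0), Value(0.0)]
    losses = []
    for _ in range(steps):
        loss = Value(0.0)
        for x, y in DATA:
            loss = loss + cross_entropy(model(params, x), y)
        loss = loss * (1.0 / len(DATA))   # mean loss over the dataset
        for p in params:
            p.grad = 0.0                  # reset gradients from the last step
        loss.backward()                   # auto-grad fills in p.grad
        for p in params:
            p.data -= lr * p.grad         # plain gradient descent
        losses.append(loss.data)
    return params, losses

if __name__ == "__main__":
    params, losses = train()
    print("first loss:", losses[0], "last loss:", losses[-1])
    for x, y in DATA:
        print(x, y, round(model(params, x).data, 3))
```

Every piece from the list shows up once: a linear function with parameters, a sigmoid, a cross-entropy loss, a tiny auto-grad, and a gradient descent loop. The "advanced" blocks are, mechanically, more elaborate arrangements of the same kinds of operations, which is why playing with them is usually more productive than trying to derive them.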