Just started a run of play_char on my 8GB 2070. Had to drop the batch size to 128. Getting ~2.2 iterations per second, so it looks like training will take about two hours to finish. I don't expect my run to differ from karpathy's, but I'm curious to play around with the trained model a bit. Already ran the math one (play_math) and got roughly the same results.
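For reference, the batch size was the only thing I changed in the trainer config. A rough sketch of the setup (minGPT's API from memory, so exact names/defaults may be off; CharDataset is defined in the play_char notebook itself):

    from mingpt.model import GPT, GPTConfig
    from mingpt.trainer import Trainer, TrainerConfig

    # CharDataset comes from the play_char notebook; input.txt is tinyshakespeare
    text = open('input.txt').read()
    train_dataset = CharDataset(text, block_size=128)

    mconf = GPTConfig(train_dataset.vocab_size, train_dataset.block_size,
                      n_layer=8, n_head=8, n_embd=512)
    model = GPT(mconf)

    tconf = TrainerConfig(max_epochs=2,
                          batch_size=128,  # down from the notebook's 512 to fit in 8GB
                          learning_rate=6e-4, lr_decay=True,
                          warmup_tokens=512*20,
                          final_tokens=2*len(train_dataset)*128,
                          num_workers=4)
    trainer = Trainer(model, train_dataset, None, tconf)
    trainer.train()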



In the committed notebook the final training loss for play_char was 0.30588, yet my training got down to 0.02638. Odd. Either way, the resulting model seems just as good/bad. Like char-rnn, it's amazing to see it consistently spell words from scratch. It has a good grasp of structure and even a passable grasp of grammar. But, also like char-rnn, it lacks any ability to form coherent sentences.
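Playing with the trained model is just the sampling cell at the end of the notebook. A sketch (the mingpt.utils.sample signature is from memory):

    import torch
    from mingpt.utils import sample

    # Encode a prompt with the dataset's char->index map, sample 500
    # characters, and decode back to text.
    context = "O God, O God!"
    x = torch.tensor([train_dataset.stoi[s] for s in context],
                     dtype=torch.long)[None, ...].to(trainer.device)
    y = sample(model, x, 500, temperature=1.0, sample=True, top_k=10)[0]
    print(''.join([train_dataset.itos[int(i)] for i in y]))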

EDIT: I'm running it on the IMDB dataset now ... just to see.


Running for two epochs on the IMDB dataset (a 133MB corpus), it only got down to a loss of 1.1. Likely the regularization is too high (I didn't tweak the hyperparameters at all, and I assume regularization was set quite high for the limited tinyshakespeare corpus; a sketch of the tweak I'd try is below the sample). Either way, it at least started to learn more grammar:

Prompt: This is my review of Lord of the Rings.

> I can't tell why the movie is a story with a lot of potential the main reason I want to see a movie that is Compared to the Baseball movie 10 Both and I can say it was not just a bad movie.
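
If the regularization hunch is right, the knobs to turn down would be the dropout probabilities on the model config and the weight decay on the trainer config. A hypothetical tweak (I believe these all default to 0.1 in minGPT, but treat the exact parameter names as an assumption):

    # Less regularization for the much larger IMDB corpus (hypothetical values)
    mconf = GPTConfig(train_dataset.vocab_size, train_dataset.block_size,
                      n_layer=8, n_head=8, n_embd=512,
                      embd_pdrop=0.0, resid_pdrop=0.0, attn_pdrop=0.0)
    model = GPT(mconf)
    tconf = TrainerConfig(max_epochs=2, batch_size=128,
                          learning_rate=6e-4,
                          weight_decay=0.01)  # down from the 0.1 default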



